ebi-gene-expression-group / monocle-scripts

Command line scripts wrapping functions in monocle3
8 stars 1 forks source link

account for possible lack of header in gene/cell annotations #8

Closed ktpolanski closed 4 years ago

ktpolanski commented 4 years ago

new syntax successfully imported 10x mtx+tsvs locally

nh3 commented 4 years ago

Could you write a test for importing mtx please? Sufficient to generate tiny test files using eval/printf. Thanks!

pinin4fjords commented 4 years ago

In the process of tweaking https://github.com/ebi-gene-expression-group/monocle-scripts/pull/7

a-solovyev12 commented 4 years ago

I'm afraid I'm still getting errors when trying monocle3 create with the following data: https://www.ebi.ac.uk/~a_solovyev/E-MTAB-6386.expression_tpm.mtx.gz as expression matrix https://www.ebi.ac.uk/~a_solovyev/E-MTAB-6386.expression_tpm.mtx_cols.gz as barcodes https://www.ebi.ac.uk/~a_solovyev/E-MTAB-6386.expression_tpm.mtx_rows.gz as genes

could you please address this?

ktpolanski commented 4 years ago

Decompress the files and add .tsv extensions to the rows/columns matrices.

pinin4fjords commented 4 years ago

I downloaded and decompressed the files, renamed to 10X conventions to be sure. Providing cell and gene labels works- thank you, that addresses our principal use case.

monocle3 create --expression-matrix=matrix.mtx --cell-metadata=barcodes.tsv --gene-annotation=genes.tsv foo.cds3

But FYI omitting cell labels produces an error:

> monocle3 create --expression-matrix=matrix.mtx --gene-annotation=genes.tsv foo.cds3

Error in validObject(.Object) : 
  invalid class “SummarizedExperiment” object: 
    nb of cols in 'assay' (117) must equal nb of rows in 'colData' (0)
Calls: do.call ... new_SummarizedExperiment -> new -> initialize -> initialize -> validObject
Execution halted

Likewise supplying the matrix alone:

> monocle3 create --expression-matrix=matrix.mtx foo.cds3

Error in validObject(.Object) : 
  invalid class “SummarizedExperiment” object: 
    nb of cols in 'assay' (117) must equal nb of rows in 'colData' (0)
Calls: do.call ... new_SummarizedExperiment -> new -> initialize -> initialize -> validObject
In addition: Warning message:
In new_cell_data_set(expression_matrix, cell_metadata = cell_metadata,  :
  Warning: gene_metadata must contain a column verbatim named 'gene_short_name' for certain functions.
Execution halted
a-solovyev12 commented 4 years ago

@ktpolanski, of course, that was implied. I was just pointing to a dataset with which I got an error.

ktpolanski commented 4 years ago

@pinin4fjords noted, reproduced. Leaning towards adding an explanation to the docstring that MTX needs to be supplemented with cell/gene metadata on account of natively lacking that information. Could alternately generate dummy cell/gene names, but this feels rather useless, especially in the gene case. Gene 397 is differentially expressed, now what? I'll wait for nh3's confirmation before proceeding.

pinin4fjords commented 4 years ago

@ktpolanski Seems fair to require the labels for MTX, and I never say no to extra documentation. But I would also suggest actually checking for the labels (e.g. at the point where the MTX extension is detected), and dying gracefully at that point with an informative error message.

ktpolanski commented 4 years ago

Good idea. Added input check and extended docstring. Think that's it.

nh3 commented 4 years ago

I agree with making it fail gracefully behaviour, but I'd argue that command line option dependency is best handled at the stage of command parsing, which is out of scope of this PR. Let's keep the logic within monocle_create() simple and focused and take care of command line option dependency in another PR. In this spirit, I'd suggest partly revert 9135a97 (keep the change to usage messages).

pinin4fjords commented 4 years ago

Hey guys- we kind of need this as a pipeline component. I'm not going to fight you on the nitty gritty of where bad inputs are caught- but could you decide what to do and merge, so we can make a release? Thank you!

nh3 commented 4 years ago

Ok, I've reverted to 03aed54. Given that this addresses our principle use case and bad inputs can be caught even earlier at the galaxy layer, I suggest we merge it and make a quick release, and save checking for option dependency in a future PR.

pinin4fjords commented 4 years ago

Thanks @nh3 . I don't think we should rely on the Galaxy layer (that's not how we're using the package in our current use case). But a later PR to deal with the issues at argument parsing sounds good.

nh3 commented 4 years ago

Thanks!