Closed ktpolanski closed 4 years ago
Could you write a test for importing mtx please? Sufficient to generate tiny test files using eval/printf. Thanks!
In the process of tweaking https://github.com/ebi-gene-expression-group/monocle-scripts/pull/7
I'm afraid I'm still getting errors when trying monocle3 create
with the following data:
https://www.ebi.ac.uk/~a_solovyev/E-MTAB-6386.expression_tpm.mtx.gz as expression matrix
https://www.ebi.ac.uk/~a_solovyev/E-MTAB-6386.expression_tpm.mtx_cols.gz as barcodes
https://www.ebi.ac.uk/~a_solovyev/E-MTAB-6386.expression_tpm.mtx_rows.gz as genes
could you please address this?
Decompress the files and add .tsv
extensions to the rows/columns matrices.
I downloaded and decompressed the files, renamed to 10X conventions to be sure. Providing cell and gene labels works- thank you, that addresses our principal use case.
monocle3 create --expression-matrix=matrix.mtx --cell-metadata=barcodes.tsv --gene-annotation=genes.tsv foo.cds3
But FYI omitting cell labels produces an error:
> monocle3 create --expression-matrix=matrix.mtx --gene-annotation=genes.tsv foo.cds3
Error in validObject(.Object) :
invalid class “SummarizedExperiment” object:
nb of cols in 'assay' (117) must equal nb of rows in 'colData' (0)
Calls: do.call ... new_SummarizedExperiment -> new -> initialize -> initialize -> validObject
Execution halted
Likewise supplying the matrix alone:
> monocle3 create --expression-matrix=matrix.mtx foo.cds3
Error in validObject(.Object) :
invalid class “SummarizedExperiment” object:
nb of cols in 'assay' (117) must equal nb of rows in 'colData' (0)
Calls: do.call ... new_SummarizedExperiment -> new -> initialize -> initialize -> validObject
In addition: Warning message:
In new_cell_data_set(expression_matrix, cell_metadata = cell_metadata, :
Warning: gene_metadata must contain a column verbatim named 'gene_short_name' for certain functions.
Execution halted
@ktpolanski, of course, that was implied. I was just pointing to a dataset with which I got an error.
@pinin4fjords noted, reproduced. Leaning towards adding an explanation to the docstring that MTX needs to be supplemented with cell/gene metadata on account of natively lacking that information. Could alternately generate dummy cell/gene names, but this feels rather useless, especially in the gene case. Gene 397 is differentially expressed, now what? I'll wait for nh3's confirmation before proceeding.
@ktpolanski Seems fair to require the labels for MTX, and I never say no to extra documentation. But I would also suggest actually checking for the labels (e.g. at the point where the MTX extension is detected), and dying gracefully at that point with an informative error message.
Good idea. Added input check and extended docstring. Think that's it.
I agree with making it fail gracefully behaviour, but I'd argue that command line option dependency is best handled at the stage of command parsing, which is out of scope of this PR. Let's keep the logic within monocle_create() simple and focused and take care of command line option dependency in another PR. In this spirit, I'd suggest partly revert 9135a97 (keep the change to usage messages).
Hey guys- we kind of need this as a pipeline component. I'm not going to fight you on the nitty gritty of where bad inputs are caught- but could you decide what to do and merge, so we can make a release? Thank you!
Ok, I've reverted to 03aed54. Given that this addresses our principle use case and bad inputs can be caught even earlier at the galaxy layer, I suggest we merge it and make a quick release, and save checking for option dependency in a future PR.
Thanks @nh3 . I don't think we should rely on the Galaxy layer (that's not how we're using the package in our current use case). But a later PR to deal with the issues at argument parsing sounds good.
Thanks!
new syntax successfully imported 10x mtx+tsvs locally