broadinstitute / CellBender

CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
https://cellbender.rtfd.io
BSD 3-Clause "New" or "Revised" License

Error handling MTX format inputs in WDL #245

Closed sjfleming closed 6 months ago

sjfleming commented 1 year ago

As @jamesnemesh pointed out to me, if you input MTX data to the WDL, together with genes_file and barcodes_file, you get this error:

ValueError: Failed to determine input file type for /raw_matrix/matrix.mtx.gz
This must either be: a directory that contains CellRanger-format MTX outputs; a single CellRanger ".h5" file; a DropSeq-format DGE ".txt.gz" file; a BD-Rhapsody-format ".csv" file; a ".h5ad" file produced by anndata (include all barcodes); a ".loom" file (include all barcodes); or a ".npz" sparse matrix file

While I guess technically this error message is correct... the real problem is that the WDL is not properly constructing a directory as the input.

(There is a strange asymmetry present in how I am handling MTX and NPZ files: both need auxiliary genes and barcodes files, but for the NPZ, I want --input to be the NPZ file, while for an MTX, I want --input to be the directory. This is a bit strange.)

For now, the quickest way to fix this is to table the issue of handling MTX and NPZ differently, and just fix the WDL so that it gives the directory as --input to remove-background when the WDL's input_file_unfiltered is an MTX file.
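
To make the proposed fix concrete, here is a rough Python sketch of the staging logic the WDL task would need to do before calling remove-background. It is not the actual WDL change. The function name `stage_mtx_input`, the `staging_dir` default, and the target file names (`matrix.mtx[.gz]`, `barcodes.tsv[.gz]`, and `features.tsv.gz` or `genes.tsv`) are assumptions based on the usual CellRanger directory layout, so they should be checked against what CellBender's loader actually looks for.

```python
import shutil
from pathlib import Path
from typing import Optional


def _gz(path: Path) -> str:
    """Return '.gz' if the file name ends in .gz, else an empty string."""
    return ".gz" if path.name.endswith(".gz") else ""


def stage_mtx_input(
    input_file: str,
    genes_file: Optional[str] = None,
    barcodes_file: Optional[str] = None,
    staging_dir: str = "mtx_input",  # hypothetical directory name
) -> Path:
    """Return the path that should be passed as --input.

    For an MTX file, copy the matrix, genes, and barcodes files into one
    directory under CellRanger-style names (assumed layout) and return
    that directory. Any other input type is returned unchanged.
    """
    src = Path(input_file)
    if not src.name.endswith((".mtx", ".mtx.gz")):
        # NPZ, H5, etc.: --input stays the file itself.
        return src

    if genes_file is None or barcodes_file is None:
        raise ValueError("MTX input needs both genes_file and barcodes_file")

    genes = Path(genes_file)
    barcodes = Path(barcodes_file)
    out = Path(staging_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Assumed naming: gzipped gene lists use the CellRanger v3 name
    # (features.tsv.gz), uncompressed ones the v2 name (genes.tsv).
    genes_name = "features.tsv.gz" if _gz(genes) else "genes.tsv"

    shutil.copy(src, out / f"matrix.mtx{_gz(src)}")
    shutil.copy(genes, out / genes_name)
    shutil.copy(barcodes, out / f"barcodes.tsv{_gz(barcodes)}")
    return out
```

In the WDL this would amount to a few `mkdir` / `cp` (or `ln -s`) lines in the task's command block, with the resulting directory substituted for the raw matrix path in the remove-background call. NPZ inputs would keep going through unchanged, which leaves the MTX/NPZ asymmetry in place for now.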