AlexsLemonade / scpca-nf

scpca-nf is the Nextflow workflow for processing Single-cell Pediatric Cancer Atlas Portal data
BSD 3-Clause "New" or "Revised" License

[BUG] Account for grabbing estimated demux cell counts for libraries with no genetic demultiplexing #740

Closed allyhawkins closed 6 months ago

allyhawkins commented 6 months ago

Describe the bug

When looking into our test data more carefully, I noticed that sample_cell_estimates is an empty array in the final metadata.json for all multiplexed test libraries. Instead, it should be an array with a cell count for each sample in the multiplexed library.

These counts are estimated in sce_qc_report.R by grabbing the cell counts for one of the demux methods:
https://github.com/AlexsLemonade/scpca-nf/blob/edc3c2680235cb92d15ebd751abe021e8fbe5574/bin/sce_qc_report.R#L235-L250

However, the default demux_method is set to vireo in this script, and we don't actually provide that argument in the module that runs the script. So for every library, regardless of whether genetic demultiplexing was run, it grabs the demux counts from the column output by vireo. This isn't an issue for any of our current samples on the Portal, but it will be if we ever have multiplexed samples that don't have matching bulk, or if a user runs the workflow without genetic demultiplexing. It also affects the test data, since we don't run genetic demultiplexing on those samples, so we probably want to fix it.
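
To make the failure mode concrete, here is a minimal sketch in Python (not the actual R code in sce_qc_report.R). It assumes each method writes its sample calls to a per-cell column named `<method>_sampleid`; that column naming and the helper `sample_cell_estimates` are assumptions for illustration only.

```python
from collections import Counter


def sample_cell_estimates(cells, demux_method="vireo"):
    """Count cells per sample using the calls from one demux method.

    `cells` is a list of dicts, one per cell. The column name pattern
    `<method>_sampleid` is a hypothetical stand-in for the real columns.
    """
    column = f"{demux_method}_sampleid"
    calls = [cell.get(column) for cell in cells]
    # Cells with no call for this method (None) contribute nothing.
    return dict(Counter(call for call in calls if call is not None))


# A library demultiplexed with cell hashing only: no vireo calls exist.
cells = [
    {"HashedDrops_sampleid": "sampleA"},
    {"HashedDrops_sampleid": "sampleA"},
    {"HashedDrops_sampleid": "sampleB"},
]

# Relying on the default method finds no calls, so the estimates are empty.
default_counts = sample_cell_estimates(cells)
# Naming the method that was actually run recovers the per-sample counts.
hashed_counts = sample_cell_estimates(cells, demux_method="HashedDrops")
```

With the default method the result is empty, which matches the empty sample_cell_estimates array seen in the test metadata.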

Proposed solution

I think we should add a key to the meta object indicating whether genetic demux was used. If it wasn't, we want to provide a different demux_method (either HashedDrops or HTODemux) to the script. Otherwise, vireo is always used.
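
A rough sketch of that selection logic, written in Python rather than Nextflow for brevity. The meta key name `has_genetic_demux`, the helper names, and the `--demux_method` flag spelling are all assumptions for illustration, not the names used in the eventual fix.

```python
CELLHASH_METHODS = ("HashedDrops", "HTODemux")


def choose_demux_method(meta, cellhash_method="HashedDrops"):
    """Pick which demux method's calls should feed the cell estimates.

    `meta` is a dict standing in for the workflow's meta object;
    `has_genetic_demux` is a hypothetical key name.
    """
    if cellhash_method not in CELLHASH_METHODS:
        raise ValueError(f"Unknown cell hashing method: {cellhash_method}")
    # Use vireo only when genetic demultiplexing was actually run.
    if meta.get("has_genetic_demux", False):
        return "vireo"
    return cellhash_method


def qc_report_args(meta):
    """Build the extra arguments passed to the QC report script.

    The flag spelling `--demux_method` is assumed, not confirmed.
    """
    return ["--demux_method", choose_demux_method(meta)]


with_genetic = qc_report_args({"has_genetic_demux": True})
without_genetic = qc_report_args({"has_genetic_demux": False})
```

Passing the method explicitly means the script no longer depends on its vireo default, so libraries without genetic demultiplexing get counts from the method that was actually run.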

allyhawkins commented 6 months ago

Closed by #742