UCSF-DSCOLAB / data_processing_pipelines

A repository to store the existing pipelines to process the various CoLabs datasets

Improve test cases for single cell #48

Open amadeovezz opened 10 months ago

amadeovezz commented 10 months ago

Currently the test cases are quite rudimentary: they only check that the quantiles_pre.tsv file exists for each library. Surprisingly, this is enough of a test harness to catch many channel manipulation bugs and to assert that the pipeline runs end to end.

However, it would be even better if we could check specific quantifiable outputs of the pipeline, such as the number of droplets.

amadeovezz commented 10 months ago

@erflynn it would be great to have your input on what exactly would be useful to assert!

erflynn commented 10 months ago

A couple of ideas:

- [Cellranger: produces a raw_feature_bc_matrix/barcodes.tsv.gz of invariant length. I don't think we need to test this?]
- Freemuxlet, Demuxlet: number of doublets, singlets, and ambiguous droplets in *samples.tsv. For freemuxlet, a .vcf.gz exists.
- DoubletFinder: number of doublets estimated; doublets added to the metadata of the output Seurat object.
- Seurat_QC: number of cells remaining; figures produced; appropriate columns added to the Seurat object.
- Freemuxlet/demuxlet merge: ".plp" output files present in the appropriate directory.
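
To make the ideas above concrete, here is a minimal sketch of two helpers such tests could use: one counts droplet types in a samples file, the other reports which expected output patterns matched nothing. It assumes the samples file is tab-separated with a `DROPLET.TYPE` column holding values such as `SNG`, `DBL`, and `AMB`; that column name, the function names, and the glob patterns are illustrative assumptions, not taken from the pipeline.

```python
import csv
import gzip
from collections import Counter
from pathlib import Path


def count_droplet_types(samples_path, column="DROPLET.TYPE"):
    """Count rows per droplet type in a tab-separated samples file.

    The column name is an assumption about the file layout; pass a
    different one if the pipeline's output uses another header.
    """
    path = Path(samples_path)
    # Transparently handle gzip-compressed files.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise ValueError(f"{path} has no {column!r} column")
        return Counter(row[column] for row in reader)


def missing_outputs(directory, patterns):
    """Return the glob patterns that match no file under `directory`."""
    root = Path(directory)
    return [pattern for pattern in patterns if not any(root.glob(pattern))]
```

A test could then compare `count_droplet_types(...)` against the counts expected for a fixed test library, and assert that `missing_outputs(outdir, ["*.vcf.gz", "*.plp"])` returns an empty list. The expected counts would have to come from a known-good run on the test data.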