CGATOxford / UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets
MIT License

comparison with alevin. #504

Closed mortunco closed 2 years ago

mortunco commented 2 years ago

I asked the same question in COMBINE-lab Gitter chat. Sorry for the duplication.

Hello,

UMI-tools recommends alevin for processing Chromium data. I have V3-based single-cell data, and I am trying to use both tools for cell barcode (CB) correction only. I observed 379 CBs when I ran alevin with --noQuant, compared to 169 CBs from umi_tools (default settings, V3 barcode pattern). I know this is a general question, but how should I go about working out the correct set of CBs?

### Alevin 
./salmon-1.6.0_linux_x86_64/bin/salmon alevin -l ISR --index capture/ --chromiumV3 -p 16 --tgMap tx2gene  --dumpfq --noQuant -1 R1.fastq.gz -2 R2.fastq.gz -o default-settings > default-settings.fastq
### umitools whitelist 
umi_tools whitelist --stdin ../NL2_CKDL210021281-1a-SI_GA_A2_HMHFJDSX2_S3_L004_R1_001.fastq.gz --bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNNNN --log2stderr > umitools/default-whitelist

Thank you very much for your help,

Best regards,

Tunc.

TomSmithCGAT commented 2 years ago

Hi Tunc.

This is a tricky question, and frankly, one that others may be better placed to answer. You'll likely also get different answers from cellranger, sircel or any other available tool.

The two tools take different approaches to determine the best set of CBs.
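As a rough illustration of the kind of thresholding involved, here is a minimal Python sketch of a "knee" heuristic on per-barcode read counts. It is not the implementation used by either tool: the function name `knee_by_distance` and the distance-to-chord rule are assumptions chosen for illustration only.

```python
def knee_by_distance(counts):
    """Return how many top-ranked barcodes to keep.

    Hypothetical helper: ranks barcodes by count, builds the cumulative
    fraction of reads, and picks the rank that lies farthest from the
    straight line joining the first and last points of that curve.
    """
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    n = len(ordered)
    if n < 2 or total == 0:
        return n

    # Cumulative fraction of reads captured by the top i+1 barcodes.
    cumulative = []
    running = 0
    for count in ordered:
        running += count
        cumulative.append(running / total)

    # Chord from the first point (0, y0) to the last point (1, 1),
    # with ranks rescaled to the interval [0, 1].
    y0 = cumulative[0]
    slope = 1.0 - y0

    best_index = 0
    best_distance = -1.0
    for i, y in enumerate(cumulative):
        x = i / (n - 1)
        # Proportional to the perpendicular distance from the chord;
        # the constant denominator is dropped because only the argmax matters.
        distance = abs(slope * x - y + y0)
        if distance > best_distance:
            best_distance = distance
            best_index = i

    return best_index + 1


# Toy data: four well-covered barcodes followed by 200 low-count ones.
toy_counts = [1000, 950, 900, 870] + [5] * 200
print(knee_by_distance(toy_counts))
```

Small changes to how the knee is located, or to what is counted (reads or UMIs), can move the cut-off a lot, which is one way two tools can disagree on the number of CBs for the same data.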

Personally, having implemented both approaches, I favour the alevin method. But I would want to do further QC on the CBs regardless. For example, you could use EmptyDrops to check that you don't have CBs that are overly contaminated with background ambient RNA.
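Before any further QC, it may also help to see how much the two whitelists actually overlap. The following is a minimal Python sketch. It assumes each whitelist is a text file with one barcode per line in the first tab-separated column; the function names and the file paths in the usage comment are hypothetical, so check them against your own output.

```python
def read_barcodes(path):
    """Read the first tab-separated column of each non-empty line as a barcode.

    Assumption: the barcode is in the first column of both files.
    """
    barcodes = set()
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            barcodes.add(line.split("\t")[0])
    return barcodes


def compare_whitelists(first, second):
    """Summarise the overlap between two sets of cell barcodes."""
    shared = first & second
    union = first | second
    return {
        "shared": len(shared),
        "only_first": len(first - second),
        "only_second": len(second - first),
        # Jaccard index: shared barcodes as a fraction of all barcodes seen.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical usage; replace the paths with your own whitelist files:
#   alevin_cbs = read_barcodes("path/to/alevin_whitelist.txt")
#   umitools_cbs = read_barcodes("umitools/default-whitelist")
#   print(compare_whitelists(alevin_cbs, umitools_cbs))
print(compare_whitelists({"AAA", "CCC", "GGG"}, {"AAA", "TTT"}))
```

If the smaller whitelist is almost entirely contained in the larger one, the disagreement is mostly about where the threshold sits, and the extra CBs are the ones worth inspecting more closely.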

mortunco commented 2 years ago

Thanks, Tom, for explaining both methods clearly. I think I need some time to digest this information and add EmptyDrops to our QC. We are working on a single-cell implementation of massively parallel reporter assays, so we will need to tweak our data for EmptyDrops.

I am closing the issue now. If anyone has suggestions, please feel free to re-open and comment. I would appreciate any ideas!

Best,

Tunc.