v1.5.0 resulting in much lower umi counts than v1.4.4 with same settings..?

Erinke89 commented 3 years ago

Dear @Hoohm, I'm sorry for opening yet another issue... I know v1.5.0 is still under development, so hopefully my feedback is useful, although this problem may sound a bit vague. I ran a single dataset through both v1.4.4 and v1.5.0 with the same settings, and the results are very different. After the v1.5.0 run, the large majority of cells have a raw UMI count of zero for all 6 CITE-seq tags, whereas in the v1.4.4 run the raw UMI counts are much higher overall and definitely not zero. Notably, the number of corrected UMIs differs greatly between the two run reports (below): 372793 in v1.4.4 versus 72369 in v1.5.0. I don't understand what causes this difference. The v1.4.4 results fit much better with what I would expect from this experiment.

    CITE-seq-Count Version: 1.4.4
    Reads processed: 79926157
    Percentage mapped: 96
    Percentage unmapped: 4
    Uncorrected cells: 0
    Correction:
        Cell barcodes collapsing threshold: 1
        Cell barcodes corrected: 149555
        UMI collapsing threshold: 1
        UMIs corrected: 372793
    Run parameters:
        Read1_paths: data.dir/816820_AF10_S2_L001_R1_001.fastq.gz,data.dir/804111_AF10_S2_L002_R1_001.fastq.gz
        Read2_paths: data.dir/816820_AF10_S2_L001_R2_001.fastq.gz,data.dir/804111_AF10_S2_L002_R2_001.fastq.gz
        Cell barcode:
            First position: 1
            Last position: 16
        UMI barcode:
            First position: 17
            Last position: 28
        Expected cells: 10000
        Tags max errors: 2
        Start trim: 0

    CITE-seq-Count Version: 1.5.0
    Reads processed: 79926157
    Percentage mapped: 96
    Percentage unmapped: 4
    Percentage too short: 0
    R1_too_short: 0
    R2_too_short: 0
    Uncorrected cells: 0
    Correction:
        Cell barcodes collapsing threshold: 1
        Cell barcodes corrected: 145408
        UMI collapsing threshold: 1
        UMIs corrected: 72369
    Run parameters:
        Read1_paths: data.dir/816820_AF10_S2_L001_R1_001.fastq.gz,data.dir/804111_AF10_S2_L002_R1_001.fastq.gz
        Read2_paths: data.dir/816820_AF10_S2_L001_R2_001.fastq.gz,data.dir/804111_AF10_S2_L002_R2_001.fastq.gz
        Cell barcode:
            First position: 1
            Last position: 16
        UMI barcode:
            First position: 17
            Last position: 28
        Expected cells: 10000
        Tags max errors: 2
        Start trim: 0
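For anyone comparing two run reports like these, a small diff of the numeric fields makes the changed values stand out. The following is only a rough sketch: `parse_report` and `diff_reports` are hypothetical helper names, not part of CITE-seq-Count, and the embedded text is an excerpt of the top-level and "Correction" fields quoted above (nested keys such as "First position" repeat in the full report, so a flat parser like this one would need extending to handle them).

```python
def parse_report(text):
    """Collect 'key: value' lines into a dict, keeping only integer values."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        value = value.strip()
        if sep and value.isdigit():
            fields[key.strip()] = int(value)
    return fields


def diff_reports(old, new):
    """Return {key: (old_value, new_value)} for keys that differ or are missing."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


# Excerpts of the two reports quoted in this thread.
REPORT_144 = """\
Reads processed: 79926157
Percentage mapped: 96
Cell barcodes corrected: 149555
UMIs corrected: 372793
"""

REPORT_150 = """\
Reads processed: 79926157
Percentage mapped: 96
Percentage too short: 0
Cell barcodes corrected: 145408
UMIs corrected: 72369
"""

changes = diff_reports(parse_report(REPORT_144), parse_report(REPORT_150))
for key, (old, new) in changes.items():
    print(f"{key}: {old} -> {new}")
```

Fields that are identical in both runs (reads processed, percentage mapped) drop out, leaving the two correction counts and the field that only v1.5.0 reports.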

Hoohm commented 3 years ago

Hello @Erinke89 ! Thank you very much for all these tests.

A lower number of corrected UMIs is expected, as 1.4.* was "correcting" UMIs for the unmapped feature, which doesn't really make sense.

Lower UMI counts are not expected and are a bit worrisome. Are you using the same whitelist/top_cells arguments?
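To illustrate why leaving out the unmapped feature lowers the "UMIs corrected" total, here is a toy sketch. It is not CITE-seq-Count's actual correction code, which is not shown in this thread: it swaps in a simple greedy Hamming-distance collapse, counts each merged UMI once, and uses made-up UMIs and a hypothetical tag name (`CD3`).

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(umi_counts, threshold=1):
    """Greedy collapse: visit UMIs from most to least abundant and merge each
    one into an already kept UMI within `threshold` mismatches.
    Returns (collapsed counts, number of UMIs merged away)."""
    kept = Counter()
    corrected = 0
    for umi, count in sorted(umi_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        parent = next((k for k in kept if hamming(k, umi) <= threshold), None)
        if parent is None:
            kept[umi] = count
        else:
            kept[parent] += count
            corrected += 1
    return kept, corrected


def total_corrected(per_feature, skip=()):
    """Sum corrected UMIs over all features except those listed in `skip`."""
    return sum(
        collapse_umis(umis)[1]
        for feature, umis in per_feature.items()
        if feature not in skip
    )


# One hypothetical cell: a real tag plus reads that matched no tag.
cell = {
    "CD3": Counter({"AAAA": 10, "AAAT": 1, "CCCC": 5}),
    "unmapped": Counter({"GGGG": 4, "GGGT": 1, "GGTG": 1, "TTTT": 2}),
}

with_unmapped = total_corrected(cell)                        # 1.4.*-like behavior
without_unmapped = total_corrected(cell, skip={"unmapped"})  # 1.5.0-like behavior
print(with_unmapped, without_unmapped)
```

In this toy cell, most of the corrections come from the unmapped bucket, so skipping it reduces the total without touching the counts of the real tag.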

Hoohm commented 3 years ago

And more specifically, are you using a 10xV3 chemistry?

Erinke89 commented 3 years ago

Thank you for the quick reply! Sorry, this problem actually seems to occur only with a slightly older version of 1.5.0. After your latest commit, which also fixed my previous issue with the "translation" column in the whitelist being required instead of optional, the UMI counts are very similar to what v1.4.4 gives me. Sorry that I muddled this up!!

I'm not sure what caused the difference I observed (I was using the same number of expected cells and the same whitelist file), but it does seem linked to the difference in the number of UMIs corrected: after the latest 1.5.0 updates, this number also went up to almost match the number from the v1.4.4 run.

Hoohm commented 3 years ago

I'm closing this since it seems to be fixed.

Hoohm / CITE-seq-Count

v1.5.0 resulting in much lower umi counts than v1.4.4 with same settings..? #147