Hoohm / CITE-seq-Count

A tool for obtaining UMI counts from a single-cell protein assay
https://hoohm.github.io/CITE-seq-Count/
MIT License

Different "UMIs corrected" results between v1.4.3 and v1.4.5 #169

Closed: rosaxma closed this issue 2 years ago

rosaxma commented 2 years ago

Hello,

I'm running v1.4.5 and v1.4.3 on the exact same input data with the same settings but I'm getting much higher UMI counts from v1.4.5. Is this behavior expected, and would you please explain what has changed since v1.4.3 that leads to more UMI counts? Thank you.

Below are the run reports:

Date: 2022-06-15
Running time: 1.0 hour, 36.0 minutes, 14.87 seconds
CITE-seq-Count Version: 1.4.5
Reads processed: 98087628
Percentage mapped: 31
Percentage unmapped: 69
Uncorrected cells: 62
Correction:
  Cell barcodes collapsing threshold: 1
  Cell barcodes corrected: 544352
  UMI collapsing threshold: 2
  UMIs corrected: 6314648
Run parameters:
  Read1_paths: sample_S1_L001_R1_001.fastq.gz
  Read2_paths: sample_S1_L001_R2_001.fastq.gz
  Cell barcode:
    First position: 1
    Last position: 16
  UMI barcode:
    First position: 17
    Last position: 28
  Expected cells: 18233
  Tags max errors: 2
  Start trim: 0

Date: 2022-06-16
Running time: 2.0 hours, 10.0 minutes, 44.4 seconds
CITE-seq-Count Version: 1.4.3
Reads processed: 98087628
Percentage mapped: 31
Percentage unmapped: 69
Uncorrected cells: 62
Correction:
  Cell barcodes collapsing threshold: 1
  Cell barcodes corrected: 544352
  UMI collapsing threshold: 2
  UMIs corrected: 6301779
Run parameters:
  Read1_paths: sample_S1_L001_R1_001.fastq.gz
  Read2_paths: sample_S1_L001_R2_001.fastq.gz
  Cell barcode:
    First position: 1
    Last position: 16
  UMI barcode:
    First position: 17
    Last position: 28
  Expected cells: 18233
  Tags max errors: 2
  Start trim: 0

Hoohm commented 2 years ago

Hello @rosaxma.

If I recall correctly, this is mostly linked to a bug that miscounted some single- or two-UMI corrections. The code counted them as corrections but didn't actually correct any UMIs.

More relevant to your question would be to compare the UMI count output of the two runs, which is a more accurate check of consistency. I would guess that the count matrices are the same. The number shown in the 1.4.3 report is probably inflated, whereas 1.4.5 properly reports the number of UMIs corrected.

A scatter plot of the two count matrices, with a correlation value, should show you whether they agree as expected.
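As a rough illustration of that comparison, here is a minimal Python sketch that pairs up the counts from two runs and computes a Pearson correlation. It assumes each run's UMI counts have already been loaded into a plain dictionary keyed by `(tag, cell_barcode)`; that layout, the function names `paired_counts`, `pearson` and `compare_runs`, and the toy numbers are all made up for the example and are not part of CITE-seq-Count. Reading the actual output files is left out.

```python
from math import sqrt


def paired_counts(counts_a, counts_b):
    """Align two {(tag, barcode): umi_count} dicts on the union of their keys.

    Entries missing from one run are treated as a count of 0.
    """
    keys = sorted(set(counts_a) | set(counts_b))
    xs = [counts_a.get(k, 0) for k in keys]
    ys = [counts_b.get(k, 0) for k in keys]
    return keys, xs, ys


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def compare_runs(counts_a, counts_b):
    """Return (correlation, list of keys whose counts differ between runs)."""
    keys, xs, ys = paired_counts(counts_a, counts_b)
    differing = [k for k, x, y in zip(keys, xs, ys) if x != y]
    return pearson(xs, ys), differing


if __name__ == "__main__":
    # Toy data only: hypothetical tags, barcodes and counts.
    run_143 = {("CD3", "AAAC"): 10, ("CD4", "AAAC"): 3, ("CD3", "TTTG"): 7}
    run_145 = {("CD3", "AAAC"): 10, ("CD4", "AAAC"): 4, ("CD3", "TTTG"): 7}
    r, differing = compare_runs(run_143, run_145)
    print(f"Pearson r = {r:.4f}; entries that differ: {differing}")
```

The paired lists returned by `paired_counts` can be passed directly to a plotting library to draw the scatter plot. A correlation very close to 1 with few or no differing entries would suggest that only the reported "UMIs corrected" number changed between versions, not the counts themselves.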

rosaxma commented 2 years ago

Thank you!