gaolabtools / scNanoGPS

Single cell Nanopore sequencing data for Genotype and Phenotype

Difference between the sum of UMI counts in CB_counting.tsv and the "Result counting" numbers in scanner.log.txt #33

Closed myanofgintmd closed 5 months ago

myanofgintmd commented 6 months ago

Hi Cheng-kai,

I have a question about the difference between the "Result counting" numbers in scanner.log.txt and the sum of all UMI counts in CB_counting.tsv.

In one of my projects, the "Result counting" section of scanner.log.txt reads as follows:

Number of 3'-adaptor located on the read head region: 1,850,644
Number of 3'-adaptor + polyT on the read head region: 1,815,982
Number of 3'-adaptor located on the read tail region: 1,757,142
Number of 3'-adaptor + polyT on the read tail region: 1,732,253

I take the sum of the two "3'-adaptor + polyT" counts (head and tail regions) to be the number of reads passed to the next step, Assigner. This number is 3,548,235 (= 1,815,982 + 1,732,253).

The sum of all UMI counts in CB_counting.tsv is 2,824,755.

Why are the two numbers, 2,824,755 and 3,548,235, different? And what kind of structural difference is there between the reads that are counted and those that are not?

shiauck commented 6 months ago

Hi Minoru,

CB_counting.tsv is generated by Assigner. Assigner is designed to detect true cell barcodes (CBs), excluding those that come from ambient RNA or debris.

In your data, 3,548,235 reads contain the 3'-adaptor (TruSeq R1) + polyT; these come from both true CBs and ambient RNA. Of these, 2,824,755 reads are from the detected true CBs.

Hope this helps.

Regards, Cheng-Kai
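The numbers quoted above can be reconciled with a little arithmetic. The sketch below uses only the counts given in this thread and follows the maintainer's reading that the CB_counting.tsv total corresponds to reads from detected true CBs; the variable names are illustrative and are not part of scNanoGPS.

```python
# Counts copied from the "Result counting" section quoted in this thread.
polyt_head = 1_815_982  # 3'-adaptor + polyT on the read head region
polyt_tail = 1_732_253  # 3'-adaptor + polyT on the read tail region

# Sum of all UMI counts in CB_counting.tsv, as reported in this thread.
true_cb_total = 2_824_755

# Reads that Scanner found with a 3'-adaptor + polyT on either end.
scanner_total = polyt_head + polyt_tail

# Reads not attributed to a detected true cell barcode
# (ambient RNA, debris, or error-derived barcodes per the reply above).
not_assigned = scanner_total - true_cb_total

# Share of Scanner-passing reads that end up under a true cell barcode.
retained_fraction = true_cb_total / scanner_total

print(f"{scanner_total:,}")         # 3,548,235
print(f"{not_assigned:,}")          # 723,480
print(f"{retained_fraction:.1%}")   # 79.6%
```

Under that reading, roughly one in five reads carrying the adaptor and polyT is not assigned to a detected true cell barcode.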

myanofgintmd commented 6 months ago

Hi Cheng-Kai, Thank you for the prompt response to my question. I am sorry, but I have further questions about your comments.

Could you explain what each of the terms "true CB", "ambient", and "debris" means?
Especially regarding "true CB", I would like to know the definition of "true" given that scNanoGPS does not use a CB whitelist.

If I understand the meanings of these terms, I should be able to understand why the number of reads decreases from the Scanner result to the Assigner result.

Best regards, Minoru

shiauck commented 6 months ago

Hi Minoru,

Please check the post "Introduction to Ambient RNA Correction" from 10X Genomics. It explains ambient RNA and debris.

The "true cell barcodes" mentioned in our paper are the cell barcodes supported by a sufficient number of reads / UMIs. When a cell barcode has enough supporting reads, it is statistically very unlikely that it arose purely from an accumulation of various kinds of errors.

Regards, Cheng-Kai
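The idea of "supported by a sufficient number of reads" can be illustrated with a toy example. This is only a minimal sketch: the fixed `min_reads` cutoff, the function name `split_by_support`, and the example barcodes are all assumptions made for illustration, and it is not how Assigner actually decides which barcodes are true.

```python
from collections import Counter


def split_by_support(barcode_per_read, min_reads):
    """Split barcodes into well-supported and poorly supported groups.

    barcode_per_read: one barcode string per read (hypothetical input).
    min_reads: hypothetical fixed cutoff; a real tool would derive its
               threshold from the data rather than hard-code it.
    """
    counts = Counter(barcode_per_read)
    kept = {cb: n for cb, n in counts.items() if n >= min_reads}
    dropped = {cb: n for cb, n in counts.items() if n < min_reads}
    return kept, dropped


# Toy data: two barcodes with many reads, three seen only once
# (standing in for ambient RNA, debris, or sequencing errors).
reads = ["AAAC"] * 6 + ["GGTT"] * 5 + ["AAAT", "CGTA", "TTGA"]

kept, dropped = split_by_support(reads, min_reads=3)

# Reads under kept barcodes vs. reads under dropped barcodes.
print(sum(kept.values()), sum(dropped.values()))  # 11 3
```

In this toy case all 14 reads would pass a Scanner-like stage, but only 11 would be counted under retained barcodes, which mirrors the kind of drop discussed in this thread.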

myanofgintmd commented 6 months ago

Thanks, Cheng-Kai! I will read the post from 10X Genomics and the description of true cell barcodes in your paper, and I will get back here soon. Minoru