MGI-tech-bioinformatics / DNBelab_C_Series_HT_scRNA-analysis-software

An open-source and flexible pipeline to analyze high-throughput DNBelab C Series single-cell RNA datasets
MIT License

How Does the Software Determine Whether Beads Originate from the Same Cell? #113

Open lgchen-git opened 2 months ago

lgchen-git commented 2 months ago

Hello,

After running your software, I noticed that some beads were classified as originating from the same cell. Could you please explain how this determination is made? Specifically, I'm curious about the algorithms, software methods, and criteria used to assess whether beads come from the same cell.

[Screenshot: QQ20240904-102833]

Understanding the underlying process would help me better interpret the results and ensure accuracy in my analysis.

Thank you for your help!

lishuangshuang0616 commented 2 months ago

In C4 RNA sequencing, there are two libraries: cDNA and oligo. The cDNA library captures mRNA, while the oligo library records the oligos that bind to the cDNA primers within the same droplet. By analyzing the oligo library data, we can identify different cDNA cell barcodes that are associated with the same oligo cell barcodes. Typically, a single droplet contains a few cDNA beads and multiple oligo beads. Beads in the same droplet tend to bind similar oligos (in both type and abundance, following a Poisson distribution). Using this principle, we calculate the similarity between the oligo profiles of different cDNA beads, identify the beads that share a droplet, and merge the UMI information of those beads.
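A minimal Python sketch of the merging idea described above, under stated assumptions: the reply does not say which similarity measure or cutoff the pipeline uses, so cosine similarity between per-bead oligo-count profiles and a threshold of 0.5 are stand-ins, and `group_beads`, `merge_umi_counts`, and the toy barcodes are hypothetical. Beads linked by a high similarity are grouped with union-find, and their per-gene UMI counts are summed into one cell.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse {oligo_barcode: count} profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_beads(oligo_counts: dict, threshold: float = 0.5) -> list:
    """Group cDNA beads whose oligo profiles are similar (union-find).

    oligo_counts: {cdna_barcode: {oligo_barcode: count}}
    threshold: hypothetical cutoff, not taken from the pipeline.
    """
    parent = {bead: bead for bead in oligo_counts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(sorted(oligo_counts), 2):
        if cosine_similarity(oligo_counts[a], oligo_counts[b]) >= threshold:
            parent[find(b)] = find(a)  # link the two beads' groups

    groups = defaultdict(list)
    for bead in oligo_counts:
        groups[find(bead)].append(bead)
    return sorted(sorted(g) for g in groups.values())


def merge_umi_counts(groups: list, umi_counts: dict) -> dict:
    """Sum per-gene UMI counts of all beads assigned to the same droplet."""
    merged = {}
    for group in groups:
        totals = defaultdict(int)
        for bead in group:
            for gene, n in umi_counts.get(bead, {}).items():
                totals[gene] += n
        merged[tuple(group)] = dict(totals)
    return merged


# Toy data (hypothetical): beads A and B see the same oligos, C does not.
oligo_counts = {
    "cDNA_A": {"O1": 10, "O2": 8, "O3": 5},
    "cDNA_B": {"O1": 9, "O2": 7, "O3": 6},
    "cDNA_C": {"O7": 12, "O8": 4},
}
umi_counts = {
    "cDNA_A": {"GeneX": 2},
    "cDNA_B": {"GeneX": 2, "GeneY": 1},
    "cDNA_C": {"GeneX": 1},
}

groups = group_beads(oligo_counts)
cells = merge_umi_counts(groups, umi_counts)
print(groups)  # [['cDNA_A', 'cDNA_B'], ['cDNA_C']]
print(cells)   # {('cDNA_A', 'cDNA_B'): {'GeneX': 4, 'GeneY': 1}, ('cDNA_C',): {'GeneX': 1}}
```

Counts are summed rather than UMI sequences pooled, because a molecule is identified by its bead barcode together with its UMI, so an identical UMI sequence on two different beads does not mark the same molecule.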

lgchen-git commented 2 months ago

Thank you for your previous explanation!

I now have a question regarding scATAC-seq. Since the ATAC library doesn't use oligo beads, how are barcodes merged to represent a single cell? Could you please clarify the method used to group barcodes and determine individual cells in scATAC-seq?

lishuangshuang0616 commented 2 months ago

The merging of beads in scATAC is completely different from RNA processing. After the open chromatin regions of each cell's DNA are cut, amplified, and captured, we assume that beads in the same droplet should capture identical DNA fragments (i.e., fragments whose start and end positions are exactly the same). Therefore, we calculate the proportion of identical fragments captured by each pair of beads to assess the likelihood that they come from the same droplet: the higher the proportion, the more likely the two beads share a droplet.

The calculation method is as follows: for each pair of beads, we calculate the ratio of the intersection to the union of the fragments they captured (the Jaccard index), sort these ratios in descending order, and identify the inflection point (knee) of the resulting curve. Pairs whose values lie above the inflection point are more likely to have captured the same fragments, so we consider those beads to come from the same droplet.
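The steps above can be sketched in Python as follows. This is an illustration under assumptions, not the pipeline's code: the reply does not say how the inflection point is located, so `knee_index` uses one simple heuristic (the point farthest from the straight line joining the first and last points of the sorted curve), and all function names and the toy fragments are hypothetical. Fragments are represented as `(chrom, start, end)` tuples so that "identical" means the same start and end positions.

```python
import math
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Intersection over union of two fragment sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knee_index(values: list) -> int:
    """Index of the point farthest from the chord joining the first and last
    points of a descending curve (an assumed knee heuristic)."""
    n = len(values)
    if n < 3:
        return n - 1
    dx, dy = n - 1, values[-1] - values[0]
    norm = math.hypot(dx, dy)
    dist = [abs(dy * i - dx * (y - values[0])) / norm for i, y in enumerate(values)]
    return max(range(n), key=dist.__getitem__)


def call_same_droplet_pairs(fragments: dict):
    """fragments: {bead_barcode: set of (chrom, start, end)}.
    Returns bead pairs scoring strictly above the knee, plus the knee value."""
    scores = sorted(
        ((jaccard(fragments[a], fragments[b]), a, b)
         for a, b in combinations(sorted(fragments), 2)),
        reverse=True,
    )
    if not scores:
        return [], None
    values = [s for s, _, _ in scores]
    cutoff = values[knee_index(values)]
    return [(a, b) for s, a, b in scores if s > cutoff], cutoff


def merge_pairs(beads, pairs) -> list:
    """Union-find: beads linked by any called pair share a droplet."""
    parent = {b: b for b in beads}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(b)] = find(a)
    groups = {}
    for b in beads:
        groups.setdefault(find(b), []).append(b)
    return sorted(sorted(g) for g in groups.values())


def frags(starts):
    """Toy helper: 100-bp fragments on chr1 starting at the given positions."""
    return {("chr1", s, s + 100) for s in starts}


fragments = {
    "A1": frags(range(0, 1000, 10)),
    "A2": frags(range(0, 900, 10)),
    "B1": frags(range(5000, 6000, 10)),
    "B2": frags(range(5000, 5800, 10)),
    "C": frags(range(9000, 9500, 10)) | {("chr1", 0, 100)},  # one chance overlap
    "D": frags(range(12000, 12500, 10)),
}
pairs, cutoff = call_same_droplet_pairs(fragments)
print(pairs)                                   # [('A1', 'A2'), ('B1', 'B2')]
print(merge_pairs(sorted(fragments), pairs))  # [['A1', 'A2'], ['B1', 'B2'], ['C'], ['D']]
```

Here the chance overlap between bead C and beads A1/A2 gives small nonzero scores that fall at or below the knee, so only the two true droplet pairs are merged. Scoring every pair is quadratic in the number of beads; a practical implementation would more likely score only pairs that share at least one fragment, but the thread does not say how the pipeline handles this.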