PacificBiosciences / pbbioconda

PacBio Secondary Analysis Tools on Bioconda. Contains list of PacBio packages available via conda.
BSD 3-Clause Clear License

isoseq for scRNA: duplicate molecules are kept #691

Closed JiangyanYu closed 1 month ago

JiangyanYu commented 2 months ago

Operating system: Linux

Package name: SMRT Link v13

Conda environment: not used (SMRT Link installation)

Describe the bug: Molecules with the same UMI and the same cell barcode (CB) appear in the final sample.annotated.info.csv. The PB isoforms formed by these molecules are present in the genes.tsv file in the isoform_seurat folder, which indicates that they are treated as real isoforms.

Error message: None (not a program error)

To Reproduce: For example, this isoform contains four molecules (from sample.annotated.info.csv) that all share the same CB and UMI.

molecule/29890232    PB.203740.724    1497    ENST00000320746.6    ADAM20P1    full-splice_match    NA    NA    TCCCCTATATTC    GAATATAGGGGA    TCTGTATACGAATTGA    TCAATTCGTATACAGA    PASS
molecule/29890233    PB.203740.724    1497    ENST00000320746.6    ADAM20P1    full-splice_match    NA    NA    TCCCCTATATTC    GAATATAGGGGA    TCTGTATACGAATTGA    TCAATTCGTATACAGA    PASS
molecule/29890234    PB.203740.724    1497    ENST00000320746.6    ADAM20P1    full-splice_match    NA    NA    TCCCCTATATTC    GAATATAGGGGA    TCTGTATACGAATTGA    TCAATTCGTATACAGA    PASS
molecule/29890242    PB.203740.724    1497    ENST00000320746.6    ADAM20P1    full-splice_match    NA    NA    TCCCCTATATTC    GAATATAGGGGA    TCTGTATACGAATTGA    TCAATTCGTATACAGA    PASS
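To check how widespread this is, the rows can be grouped by isoform, CB, and UMI. The following is a minimal sketch, not part of any PacBio tool. It assumes that field 2 is the PB isoform ID, field 9 is the UMI, and field 11 is the cell barcode. Those positions are inferred from the 12 bp and 16 bp sequences in the rows above, not taken from documentation, and the function name `find_duplicates` is hypothetical.

```python
from collections import defaultdict

# Rows copied from the report above, separated by whitespace for illustration.
ROWS = """\
molecule/29890232 PB.203740.724 1497 ENST00000320746.6 ADAM20P1 full-splice_match NA NA TCCCCTATATTC GAATATAGGGGA TCTGTATACGAATTGA TCAATTCGTATACAGA PASS
molecule/29890233 PB.203740.724 1497 ENST00000320746.6 ADAM20P1 full-splice_match NA NA TCCCCTATATTC GAATATAGGGGA TCTGTATACGAATTGA TCAATTCGTATACAGA PASS
molecule/29890234 PB.203740.724 1497 ENST00000320746.6 ADAM20P1 full-splice_match NA NA TCCCCTATATTC GAATATAGGGGA TCTGTATACGAATTGA TCAATTCGTATACAGA PASS
molecule/29890242 PB.203740.724 1497 ENST00000320746.6 ADAM20P1 full-splice_match NA NA TCCCCTATATTC GAATATAGGGGA TCTGTATACGAATTGA TCAATTCGTATACAGA PASS
"""

# ASSUMPTION: zero-based field positions inferred from the rows above.
PBID_COL = 1   # PB isoform ID
UMI_COL = 8    # 12 bp sequence, assumed to be the UMI
CB_COL = 10    # 16 bp sequence, assumed to be the cell barcode


def find_duplicates(text):
    """Return {(pbid, cb, umi): [molecule ids]} for groups with more than one molecule."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) <= CB_COL:
            continue  # skip blank or truncated lines
        key = (fields[PBID_COL], fields[CB_COL], fields[UMI_COL])
        groups[key].append(fields[0])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


duplicates = find_duplicates(ROWS)
for (pbid, cb, umi), molecules in duplicates.items():
    print(pbid, cb, umi, len(molecules), ",".join(molecules))
```

For a real file, the text could be read from disk instead of `ROWS`. If the file is comma-separated, `line.split()` would need to be replaced with `line.split(",")`, and any header line would need to be skipped.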

Expected behavior: Should these duplicate molecules be removed? I am not sure whether this is a bug. If it is not, could you please explain the reason? Thanks a lot in advance!

armintoepfer commented 1 month ago

This is not a mechanical software problem. Please contact support@pacb.com.