Error with exon end earlier than exon start

GenomeRIK / tama

Transcriptome Annotation by Modular Algorithms (for long read RNA sequencing data)

GNU General Public License v3.0

125 stars 24 forks source link

Error with exon end earlier than exon start #113

Closed ESWRKJ closed 10 months ago

ESWRKJ commented 10 months ago

hello, RIK when I use tama_collapse.py I found Error with exon end earlier than exon start 54827318 54827318 ['m64041_201001_163935/150406967/ccs', 'm64041_201001_163935/99746185/ccs'] It is Pacbio Iso-seq data , please tell me what causes this?

GenomeRIK commented 10 months ago

Hello,

This typically happens when you get a lot of reads from a messy gene where you start having wobble bridging creating a collapse exon start which occurs before the collapse exon end. However, to be double sure it would help to see the SAM lines from those two reads. Could you post those?

Thank you, Richard

ESWRKJ commented 10 months ago

hello, Richard those are two reads information , Could you please assist me in identifying the issue? 150406967.txt 99746185.txt

GenomeRIK commented 10 months ago

What parameters are you using to run TAMA Collapse?

ESWRKJ commented 10 months ago

python /public/home/tama/tama_collapse.py -d merge_dup -x no_cap -b BAM -f /public/home/genome.fasta -s /public/home/Gb.bam -p Gb

GenomeRIK commented 10 months ago

This is quite strange. What mapper did you use and what parameters?

ESWRKJ commented 10 months ago

pbmm2 align --preset ISOSEQ --unmapped --sort -j 6 --log-level INFO --log-file pbmm2.log /public/home/genome.fasta --sample Gb ./Gb.flnc.bam Gb.bam

GenomeRIK commented 10 months ago

The problem seems to be coming from the alignment. You can see from the CIGAR string for read "m64041_201001_163935/99746185/ccs":

2194=2I1172=2767N183=1I132=1D2=510N326I850N254=1D518=202N914=

The issue is here: Exon: 2194=2I1172=
Intron: 2767N Exon: 183=1I132=1D2= Intron: 510N Exon: 326I <-- This exon is just one big insertion which makes no sense. So the mapper has a bug here for some reason. You may want to contact PacBio about this as this is their aligner version. Intron: 850N Exon: 254=1D518= Intron: 202N Exon: 914=

ESWRKJ commented 10 months ago

ok, So if I extract the reads, can the problem be solved?

GenomeRIK commented 10 months ago

In theory yes but if the mapper has a bug then there are likely issues with more mappings so I would not trust it until the bug is resolved. Until they know how this bug works then you cannot know if it is causing silent errors with other transcript models.

ESWRKJ commented 10 months ago

ok I hava a try. Thank you so much!

GenomeRIK commented 10 months ago

Just to clarify are you trying to contact Pacbio about the bug or trying to remove the read info from the file to make it run through TAMA Collapse?

ESWRKJ commented 10 months ago

All. two methods are being carried out simultaneously

GenomeRIK commented 10 months ago

Nice! Very efficient.

ESWRKJ commented 10 months ago

Hi @GenomeRIK when I remove the two reads , the error is disappeared ; Pacbio suggests that I change the mapper to minimap2 ，Using minimap2 is normal. But I see tama_report.txt is difference.

Details

minimap2 results Total Gene Count: 59385 Total Transcript Count: 874323 Total Accepted Reads: 3944979 Total Discarded Reads: 4086923

Details

pbmm2 Total Gene Count: 59485 Total Transcript Count: 876767 Total Accepted Reads: 3965819 Total Discarded Reads: 63775

Discarded Reads Huge difference

GenomeRIK commented 10 months ago

The larger number of discarded reads from minimap is probably because you did not turn off secondary mappings when running minimap. You can check the read.txt file to see what is the reason for read discards.

ESWRKJ commented 10 months ago

yes minimap2 have some secondary mappings

In addition, The gene count and transcript count of the two are slightly different. Does this have any impact?

GenomeRIK commented 10 months ago

There is some impact but it may not make much of a difference to you depending on your objectives. Either way the Pacbio mapper is not working for you.

ESWRKJ commented 10 months ago

OK thank you for your assistance！