Closed ESWRKJ closed 10 months ago
Hello,
This typically happens when you get a lot of reads from a messy gene where you start having wobble bridging creating a collapse exon start which occurs before the collapse exon end. However, to be double sure it would help to see the SAM lines from those two reads. Could you post those?
Thank you, Richard
hello, Richard those are two reads information , Could you please assist me in identifying the issue? 150406967.txt 99746185.txt
What parameters are you using to run TAMA Collapse?
python /public/home/tama/tama_collapse.py -d merge_dup -x no_cap -b BAM -f /public/home/genome.fasta -s /public/home/Gb.bam -p Gb
This is quite strange. What mapper did you use and what parameters?
pbmm2 align --preset ISOSEQ --unmapped --sort -j 6 --log-level INFO --log-file pbmm2.log /public/home/genome.fasta --sample Gb ./Gb.flnc.bam Gb.bam
The problem seems to be coming from the alignment. You can see from the CIGAR string for read "m64041_201001_163935/99746185/ccs":
2194=2I1172=2767N183=1I132=1D2=510N326I850N254=1D518=202N914=
The issue is here:
Exon: 2194=2I1172=
Intron: 2767N
Exon: 183=1I132=1D2=
Intron: 510N
Exon: 326I <-- This exon is just one big insertion which makes no sense. So the mapper has a bug here for some reason. You may want to contact PacBio about this as this is their aligner version.
Intron: 850N
Exon: 254=1D518=
Intron: 202N
Exon: 914=
ok, So if I extract the reads, can the problem be solved?
In theory yes but if the mapper has a bug then there are likely issues with more mappings so I would not trust it until the bug is resolved. Until they know how this bug works then you cannot know if it is causing silent errors with other transcript models.
ok I hava a try. Thank you so much!
Just to clarify are you trying to contact Pacbio about the bug or trying to remove the read info from the file to make it run through TAMA Collapse?
All. two methods are being carried out simultaneously
Nice! Very efficient.
Hi @GenomeRIK when I remove the two reads , the error is disappeared ; Pacbio suggests that I change the mapper to minimap2 ,Using minimap2 is normal. But I see tama_report.txt is difference.
minimap2 results Total Gene Count: 59385 Total Transcript Count: 874323 Total Accepted Reads: 3944979 Total Discarded Reads: 4086923
pbmm2 Total Gene Count: 59485 Total Transcript Count: 876767 Total Accepted Reads: 3965819 Total Discarded Reads: 63775
Discarded Reads Huge difference
The larger number of discarded reads from minimap is probably because you did not turn off secondary mappings when running minimap. You can check the read.txt file to see what is the reason for read discards.
yes minimap2 have some secondary mappings
In addition, The gene count and transcript count of the two are slightly different. Does this have any impact?
There is some impact but it may not make much of a difference to you depending on your objectives. Either way the Pacbio mapper is not working for you.
OK thank you for your assistance!
hello, RIK when I use tama_collapse.py I found Error with exon end earlier than exon start 54827318 54827318 ['m64041_201001_163935/150406967/ccs', 'm64041_201001_163935/99746185/ccs'] It is Pacbio Iso-seq data , please tell me what causes this?