Suppose a transcript contains a novel exon that consists entirely of inserted sequence. Such an exon has 0 bp aligned to the reference genome.
In that case, matchAnnot.py fails with the following error:
Traceback (most recent call last):
  File "~/MatchAnnot/matchAnnot.py", line 546, in <module>
    main()
  File "~/MatchAnnot/matchAnnot.py", line 179, in main
    bestTran, bestScore = matchTranscripts (exons, curGene) # match this cluster to all transcripts of gene
  File "~/MatchAnnot/matchAnnot.py", line 290, in matchTranscripts
    showCoords (readExons, tran)
  File "~/MatchAnnot/matchAnnot.py", line 460, in showCoords
    printMatchingExons (ixR, ixT, readExons[ixR], tranExons[ixT])
  File "~/MatchAnnot/matchAnnot.py", line 491, in printMatchingExons
    print ' sub: %2d Q: %4.1f' % (exonR.substs, exonR.QScore()), # comma: line continued in printStartStop
  File "~/MatchAnnot/CigarString.py", line 61, in QScore
ZeroDivisionError: float division by zero
If you look at matchAnnot.txt, it looks like this:
This is wrong: the second exon of the PacBio read does not lie within the second exon of the GENCODE transcript. Because the alignment of read exon 2 has length 0, we cannot say that it matches exon 2 of the GENCODE transcript.
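To illustrate how a read exon with 0 aligned bp can still be reported as a match, here is a minimal sketch. It assumes inclusive coordinates (consistent with the `end - start + 1` length used in matchAnnot.py) and a conventional interval-overlap test; `Interval`, `aligned_length`, and `overlaps` are hypothetical names, and the coordinates are made up. This is not MatchAnnot's actual code.

```python
from collections import namedtuple

# Hypothetical stand-in for an exon; not the class MatchAnnot uses.
Interval = namedtuple("Interval", ["start", "end"])

def aligned_length(iv):
    """Number of reference bases covered, assuming inclusive coordinates."""
    return iv.end - iv.start + 1

def overlaps(a, b):
    """Conventional inclusive-interval overlap test (an assumption)."""
    return a.start <= b.end and b.start <= a.end

annot_exon = Interval(start=1000, end=1200)      # made-up annotation exon
inserted_exon = Interval(start=1100, end=1099)   # made-up read exon, 0 aligned bp

print(aligned_length(inserted_exon))             # length of the inserted exon
print(overlaps(inserted_exon, annot_exon))       # naive test still reports overlap
```

Under these assumptions the inserted exon has length 0 yet passes the overlap test, which is why an explicit length check is needed before calling it a match.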
I modified two functions to fix the problem:
1) matchAnnot.py
def findOverlaps (list1, list2):
    ...
    for pos1 in xrange(len(list1)):    # for each list1 entry, find all list2 which overlap it
        nLen_readexon = list1[pos1].end - list1[pos1].start + 1    # added line: aligned length of this read exon
        if nLen_readexon < 1:    # added line: read exon has no aligned bases
            continue             # added line: skip it (the for loop advances pos1 by itself)
2) CigarString.py
def QScore (self):
    ...
    if self.substs is None:
        Q = 0.0    # no MD string was supplied, can't compute Q score
    elif (self.end - self.start + 1) < 1:    # added line: zero-length alignment
        Q = 0.0                              # added line: avoid dividing by zero
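For reference, the guard can be sketched as a standalone example. The rest of `QScore` is elided above, so this sketch assumes a Phred-style score (`-10 * log10(errors / aligned bases)`); `ExonStats`, `q_score`, and `Q_CAP` are hypothetical names chosen for illustration, not MatchAnnot's API.

```python
import math

Q_CAP = 60.0    # hypothetical cap for an error-free alignment (arbitrary value)

class ExonStats(object):
    """Hypothetical stand-in for the object that owns QScore."""

    def __init__(self, start, end, substs):
        self.start = start      # inclusive start on the reference
        self.end = end          # inclusive end on the reference
        self.substs = substs    # None when no MD string was supplied

    def q_score(self):
        aligned = self.end - self.start + 1
        if self.substs is None or aligned < 1:
            return 0.0          # no MD string, or no aligned bases: nothing to score
        if self.substs == 0:
            return Q_CAP        # avoid log10(0) for an error-free alignment
        # Assumed Phred-style formula; the real QScore body is not shown in this post.
        return -10.0 * math.log10(float(self.substs) / aligned)

print(ExonStats(1100, 1099, 0).q_score())   # zero-length exon, no exception raised
print(ExonStats(1000, 1099, 1).q_score())   # 1 substitution in 100 aligned bases
```

Returning 0.0 for the zero-length case mirrors the choice made in the patch above; whether 0.0 is the right value to print for such an exon is a judgment call.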
Please let me know if any of the changes I have made are wrong.
The matchAnnot.txt output should instead look like this: