MonashBioinformaticsPlatform / RSeQC

fork of RSeQC python RNAseq metrics suit of tools
Other
46 stars 18 forks source link

junction_saturation not suit for bam/sam file generated by minimap or pbmm2 #4

Closed wuzengding closed 2 years ago

wuzengding commented 2 years ago

because the CIGAR in bam/sam file generated by minimap2 contain "=" , represent right match with reference, and "X", represent wrong match with reference. while the bam_cigar.py in ./lib/qcmodule/bam_cigar.py only suit for bam/sam generated such as BWA/bowtie, which CIGAR contain only "M" ,represent mis/match. So i modified the bam_cigar.py

77 def fetch_intron(chrom, st, cigar): 78 ''' fetch intron regions defined by cigar. st must be zero based 79 return list of tuple of (chrom,st, end) 80 ''' 81 #match = re.compile(r'(\d+)(\D)') 82 chrom_st = st 83 intron_bound =[] 84 for c,s in cigar: #code and size 85 if c==0: #match 86 chrom_st += s 87 elif c==1: #insertion to ref 88 continue 89 elif c==2: #deletion to ref 90 chrom_st += s 91 elif c==3: #gap or intron 92 intron_bound.append((chrom, chrom_st,chrom_st+s)) 93 chrom_st += s 94 elif c==4: #soft clipping. We do NOT include soft clip as part of exon 95 #chrom_st += s 96 continue 97 elif c==7: 98 chrom_st += s 99 elif c==8: 100 chrom_st += s 101 else: 102 continue 103 return intron_bound

wuzengding commented 2 years ago

image