Open cafelton opened 2 months ago
Current protocol for getting FLAIR correct to best correct splice sites after aligning with minimap2 -un:

1. Parse the reference GTF into reference splice sites.
2. Go through the BED file of aligned long reads, get the intron positions, and make a dict mapping each intron to its read support.
3. Check the proximity of the intron splice sites (SSs) to the reference SSs. If any position in a read intron is <= 3 bp from a reference SS, correct it to the reference position.
4. Optional: separate the introns into those with >= 3 reads of support and those with < 3. For each low-support intron, if it is within 3 bp of a good intron, add its reads to the support of the good intron.
5. Output a splice junction file and feed it into the correct step as short-read splice junctions.
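For anyone who wants to try this, here is a minimal Python sketch of the protocol above. It is not the actual script and not part of FLAIR. All function names are hypothetical, and it makes several assumptions: the reads are in BED12, donor and acceptor sites are pooled into one list per chromosome, strand is kept as part of the intron key, ties between nearby good introns go to the one with the most support, and low-support introns with no good neighbour are kept as-is. Writing the final junction file is left out because the exact format expected by the correct step is not described here.

```python
import bisect
import re
from collections import Counter, defaultdict

MAX_DIST = 3      # bp window from the protocol above
MIN_SUPPORT = 3   # read-support threshold from the protocol above


def reference_sites_from_gtf(gtf_lines):
    """Return {chrom: sorted splice-site positions} in 0-based BED coordinates."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "exon":
            continue
        m = re.search(r'transcript_id "([^"]+)"', f[8])
        if m:
            # GTF is 1-based inclusive; convert to 0-based half-open
            exons[(f[0], m.group(1))].append((int(f[3]) - 1, int(f[4])))
    sites = defaultdict(set)
    for (chrom, _), ex in exons.items():
        ex.sort()
        for (_, prev_end), (next_start, _) in zip(ex, ex[1:]):
            sites[chrom].update((prev_end, next_start))
    return {chrom: sorted(s) for chrom, s in sites.items()}


def introns_from_bed12(line):
    """Return (chrom, strand, [(intron_start, intron_end), ...]) for one BED12 line."""
    f = line.rstrip("\n").split("\t")
    chrom, start, strand = f[0], int(f[1]), f[5]
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    starts = [int(x) for x in f[11].rstrip(",").split(",")]
    introns = [(start + starts[i] + sizes[i], start + starts[i + 1])
               for i in range(len(sizes) - 1)]
    return chrom, strand, introns


def count_introns(bed_lines):
    """Build the intron -> read support dict."""
    support = Counter()
    for line in bed_lines:
        chrom, strand, introns = introns_from_bed12(line)
        for s, e in introns:
            support[(chrom, strand, s, e)] += 1
    return support


def snap(pos, sorted_ref):
    """Move pos to the nearest reference site if it is within MAX_DIST."""
    i = bisect.bisect_left(sorted_ref, pos)
    candidates = sorted_ref[max(i - 1, 0):i + 1]
    best = min(candidates, key=lambda r: abs(r - pos), default=None)
    if best is not None and abs(best - pos) <= MAX_DIST:
        return best
    return pos


def correct_introns(support, ref_sites):
    """Snap both ends of every intron to the reference and re-pool the support."""
    corrected = Counter()
    for (chrom, strand, s, e), n in support.items():
        ref = ref_sites.get(chrom, [])
        corrected[(chrom, strand, snap(s, ref), snap(e, ref))] += n
    return corrected


def merge_low_support(support):
    """Optional step: fold low-support introns into a nearby good intron."""
    good = {k: n for k, n in support.items() if n >= MIN_SUPPORT}
    merged = Counter(good)
    for key, n in support.items():
        if n >= MIN_SUPPORT:
            continue
        chrom, strand, s, e = key
        near = [g for g in good
                if g[0] == chrom and g[1] == strand
                and abs(g[2] - s) <= MAX_DIST and abs(g[3] - e) <= MAX_DIST]
        if near:
            merged[max(near, key=good.get)] += n  # assumption: best-supported wins
        else:
            merged[key] += n  # assumption: keep unmatched low-support introns
    return merged
```

The steps would be chained as `merge_low_support(correct_introns(count_introns(bed_lines), reference_sites_from_gtf(gtf_lines)))`, with the resulting intron-to-support dict written out in whatever junction format the correct step accepts.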
Warning: I have only tested this with high-quality long reads (PacBio or R2C2 + Nanopore). I don't know whether it would work with less accurate reads.
I have read the paper (https://doi.org/10.1038/s41467-020-15171-6)
and the manual (https://flair.readthedocs.io/en/latest/) and I still have a question about
FLAIR throws out reads with noncanonical splice sites in the collapse step. This is due to minimap2 misaligning near the splice site in the align step, where it introduces a deletion. FLAIR collapse (with --check_splice) then throws out the reads spanning those splice sites because their alignments are no longer good enough.
I have solved this by aligning with minimap2 -un, although this introduces errors at other splice sites that are sometimes not resolved by FLAIR correct. It would be good to adapt FLAIR align + correct to better account for noncanonical splice sites.