Closed yp closed 12 years ago
We propose to post-process the alignment as follows. If a prefix of the transcript sequence longer than the minimum exon length has not been aligned, then add the longest common factor of the discarded prefix and the prefix of the genomic sequence that precedes the first aligned exon, if and only if it induces a canonical intron with the first aligned exon and does not increase the edit distance of a prefix of the first exon.
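To make the proposal concrete, here is a minimal Python sketch of the post-processing step. It is not PIntron's actual code: the function names, the parameters (`unaligned_len`, `first_exon_genomic_start`, `min_exon_len`), and the choice of GT...AG as the only canonical intron pattern are assumptions made for illustration. The edit-distance condition on the first exon is left out of the sketch.

```python
from typing import Optional, Tuple


def longest_common_factor(a: str, b: str) -> Tuple[int, int, int]:
    """Return (length, start_in_a, start_in_b) of the longest common substring.

    Plain O(len(a) * len(b)) dynamic programming; ties keep the first hit.
    """
    best_len, best_a, best_b = 0, 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_a = i - best_len
                    best_b = j - best_len
        prev = curr
    return best_len, best_a, best_b


def is_canonical_intron(genomic: str, start: int, end: int) -> bool:
    """Assumption: 'canonical' means the intron starts with GT and ends with AG."""
    intron = genomic[start:end]
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def recover_first_exon(
    transcript: str,
    genomic: str,
    unaligned_len: int,             # hypothetical: length of the unaligned transcript prefix
    first_exon_genomic_start: int,  # hypothetical: genomic start of the first aligned exon
    min_exon_len: int,              # hypothetical: minimum exon length threshold
) -> Optional[Tuple[int, int]]:
    """Return genomic (start, end) of a recovered leading exon, or None."""
    if unaligned_len <= min_exon_len:
        return None
    discarded = transcript[:unaligned_len]
    upstream = genomic[:first_exon_genomic_start]
    length, _, g_start = longest_common_factor(discarded, upstream)
    if length == 0:
        return None
    g_end = g_start + length
    # Accept the factor only if the gap up to the first aligned exon
    # looks like a canonical intron.
    if not is_canonical_intron(genomic, g_end, first_exon_genomic_start):
        return None
    # NOTE: the edit-distance check from the proposal is not implemented here.
    return g_start, g_end
```

Note that the quadratic search over the whole upstream region gets expensive when the first aligned exon lies far into a long genomic sequence; a suffix-based index would be the usual alternative.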
The released fix (PIntron version v1.2.45) considerably slows down execution when a long genomic sequence is analyzed (especially if the transcripts map near the end of the genomic sequence).
The alignment of RefSeq NM_014440 (gene IL1F6) does not report the first exon. This is probably due to the fact that the first exon is shorter (10nt) than the minimum factor length (15nt).
PIntron version: v1.2.43
Input files: https://gist.github.com/gists/3760654/download