AlgoLab / PIntron

A novel pipeline for gene-structure prediction based on spliced alignment of transcript sequences (ESTs and mRNAs) against a genomic sequence
http://www.algolab.eu/PIntron

Small external exons are not detected #28

Closed yp closed 12 years ago

yp commented 12 years ago

The alignment of RefSeq NM_014440 (gene IL1F6) does not report the first exon. This is probably because the first exon (10 nt) is shorter than the minimum factor length (15 nt).

PIntron version: v1.2.43
Input files: https://gist.github.com/gists/3760654/download

yp commented 12 years ago

We propose to post-process the alignment as follows. If a prefix of the transcript sequence longer than the minimum exon length has not been aligned, consider the longest common factor of the discarded transcript prefix and the genomic prefix that precedes the first aligned exon. Add this factor as a new first exon if and only if it induces a canonical intron with the first aligned exon and does not increase the edit distance of a prefix of the first exon.
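A rough Python sketch of the idea follows. It is an illustration, not PIntron's actual code: the function name `recover_leading_exon`, its parameters, and the default `min_exon_len=3` are hypothetical. It also simplifies the proposal in three ways: only suffixes of the unaligned transcript prefix are tried (so the first aligned exon is left untouched and its edit distance cannot change), only exact matches are accepted, and "canonical intron" is taken to mean GT...AG only.

```python
def recover_leading_exon(transcript, genomic, exon_t_start, exon_g_start,
                         min_exon_len=3):
    """Try to recover a short exon upstream of the first aligned exon.

    exon_t_start / exon_g_start are the 0-based start positions of the first
    aligned exon on the transcript and on the genomic sequence (hypothetical
    interface). Returns (g_start, g_end, t_start) of the recovered exon, with
    half-open coordinates, or None if nothing suitable is found.
    """
    prefix = transcript[:exon_t_start]  # unaligned transcript prefix
    if len(prefix) <= min_exon_len:
        return None

    # Acceptor site: the intron must end with AG right before the first exon.
    if genomic[max(exon_g_start - 2, 0):exon_g_start] != "AG":
        return None

    # Region where the new exon plus its GT donor site may lie.
    upstream = genomic[:exon_g_start - 2]

    # Try the longest suffix of the unaligned prefix first.
    # Assumption: the recovered exon must itself exceed min_exon_len.
    for length in range(len(prefix), min_exon_len, -1):
        candidate = prefix[-length:]
        # Rightmost exact occurrence immediately followed by a GT donor.
        pos = upstream.rfind(candidate + "GT")
        if pos != -1:
            return (pos, pos + length, exon_t_start - length)
    return None


# Toy data: a 10 nt first exon, a GT...AG intron, then the aligned exon.
first = "ACGTACGTAC"
second = "TTTTTCCCCCGGGGGAAAAA"
genomic = "NNNNN" + first + "GT" + "CCCCCCCC" + "AG" + second
transcript = first + second
hit = recover_leading_exon(transcript, genomic,
                           exon_t_start=len(first),
                           exon_g_start=genomic.index(second))
print(hit)
```

Note that the loop scans the whole upstream region once per candidate length, so the cost grows with the distance between the start of the genomic sequence and the first aligned exon.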

yp commented 12 years ago

The released fix (PIntron version: v1.2.45) considerably slows down execution when a long genomic sequence is analyzed, especially if the transcripts map near the end of the genomic sequence.