Xinglab / espresso

how to define full-length transcript #58

Open junjiemama opened 3 weeks ago

junjiemama commented 3 weeks ago

In the algorithm, there is a step that distinguishes full-length transcripts from non-full-length transcripts. May I ask what criteria the pipeline uses for this step? Thank you!

EricKutschera commented 3 weeks ago

The sequence of splice junctions in each aligned read is checked against the splice junctions of annotated isoforms. If the read's junctions are the same as the junctions in an annotated isoform, then the read is a full-length transcript for that annotated isoform. If an alignment has a sequence of junctions that would match an annotated isoform, but the alignment is missing one or more junctions at the beginning or end, then that alignment could be a non-full-length transcript for an annotated isoform.
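
As a rough illustration of the check described above (not ESPRESSO's actual code), here is a minimal Python sketch. It assumes each junction is represented as a `(start, end)` tuple and each isoform as an ordered list of junctions. The function names and return labels are hypothetical, and the sketch ignores strand, chromosome, alignment end points, and single-exon reads. It also assumes that a read missing junctions at both ends counts as non-full-length.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a junction is (intron_start, intron_end).
Junction = Tuple[int, int]


def is_contiguous_subchain(read: Sequence[Junction], isoform: Sequence[Junction]) -> bool:
    """True if `read` equals `isoform` with one or more junctions
    trimmed from the beginning and/or the end."""
    n, m = len(read), len(isoform)
    if n == 0 or n >= m:
        return False
    # Slide a window of the read's length across the isoform chain.
    return any(tuple(isoform[i:i + n]) == tuple(read) for i in range(m - n + 1))


def classify_against_annotation(
    read_junctions: Sequence[Junction],
    annotated_isoforms: List[Sequence[Junction]],
) -> str:
    """Return 'full-length', 'non-full-length', or 'no-match'."""
    read = tuple(read_junctions)
    if not read:
        # Assumption: reads without junctions are out of scope for this sketch.
        return "no-match"
    partial = False
    for isoform in annotated_isoforms:
        if read == tuple(isoform):
            return "full-length"
        if is_contiguous_subchain(read, isoform):
            partial = True
    return "non-full-length" if partial else "no-match"
```

A read that skips an internal junction of the isoform is not a contiguous sub-chain, so this sketch reports it as `no-match` rather than as a truncated copy of that isoform.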

For alignments with novel junctions, there is a similar check to see whether the sequence of junctions would match another novel alignment but with one or more junctions missing at the beginning or end. The novel alignments that do not match any longer splice junction sequence are considered full-length transcripts. The alignments that are missing some junctions are non-full-length.
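
The novel case can be sketched in the same spirit. This is again only an illustration under assumptions, not the pipeline's implementation: junction chains are lists of `(start, end)` tuples, the helper names are hypothetical, and a chain is treated as non-full-length when it equals a longer novel chain with junctions trimmed from either or both ends.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical representation: a junction is (intron_start, intron_end).
Junction = Tuple[int, int]
Chain = Tuple[Junction, ...]


def is_truncated_version(short: Chain, long: Chain) -> bool:
    """True if `short` is `long` with junctions missing at the start and/or end."""
    n, m = len(short), len(long)
    if n == 0 or n >= m:
        return False
    return any(long[i:i + n] == short for i in range(m - n + 1))


def split_novel_chains(
    chains: Iterable[Sequence[Junction]],
) -> Tuple[List[Chain], List[Chain]]:
    """Partition novel junction chains into (full_length, non_full_length)."""
    # Deduplicate and drop empty chains (reads without junctions).
    unique = {tuple(chain) for chain in chains if chain}
    full_length: List[Chain] = []
    non_full_length: List[Chain] = []
    for chain in unique:
        # A chain is non-full-length if any longer chain contains it contiguously.
        if any(is_truncated_version(chain, other) for other in unique):
            non_full_length.append(chain)
        else:
            full_length.append(chain)
    return sorted(full_length), sorted(non_full_length)
```

This pairwise comparison is quadratic in the number of distinct chains, which is fine for a sketch but would likely need grouping by gene or locus on real data.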

junjiemama commented 3 weeks ago

Thank you for your reply!