chess-genome / chess

Comprehensive Human Expressed SequenceS
http://ccb.jhu.edu/chess/
GNU General Public License v3.0

CHESS3: Which method did you use to filter only coincident transcripts from the two obtained annotations? #7

Closed. pclavell closed this issue 1 year ago

pclavell commented 1 year ago

In the methods section of your CHESS3 preprint, you state that, in an effort to reduce transcriptional noise and artifacts, "We only kept transcripts that were assembled in the initial samples, as well as after aggregating the alignments with TieBrush." Which method did you use to keep only the coincident transcripts? I can only find tools that merge full GTF files.

Thanks a lot

alevar commented 1 year ago

Hi,

To find transcript models that were observed both in the raw assemblies and after TieBrush aggregation, we used gffcompare (https://github.com/gpertea/gffcompare). Gffcompare can accomplish this task in several ways. For every query transcript, it returns a classification code, which is reported in the output GTF file as well as in the tmap file.

The full list of classification codes can be found in the gffcompare documentation at https://ccb.jhu.edu/software/stringtie/gffcompare.shtml. For example, the classification code "=" indicates that the intron chain of the query transcript exactly matches that of a reference transcript.

We compared the GTEx-assembled transcripts (query) to the TieBrush-assembled transcripts (reference) and kept only the query transcripts assigned the "=" code, which left us with only the coincident transcripts. The one caveat is that this technique does not work as well for single-exon transcripts, which we treated separately in our work.
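As a rough illustration of the selection step, here is a minimal Python sketch that reads a gffcompare tmap file and collects the query transcript IDs carrying the "=" code. It is not the exact script we used. It assumes the tmap file is tab-delimited with a header row containing the columns `class_code`, `qry_id`, and `num_exons`, as described in the gffcompare documentation. The function name `coincident_ids`, the `multi_exon_only` option, and the sample rows are hypothetical and exist only for this example.

```python
import csv
from typing import Iterable, Set


def coincident_ids(tmap_lines: Iterable[str],
                   code: str = "=",
                   multi_exon_only: bool = False) -> Set[str]:
    """Return query transcript IDs whose class code equals `code`.

    Assumes a tab-delimited tmap with a header row that includes
    'class_code', 'qry_id' and 'num_exons' (an assumption based on
    the gffcompare documentation).
    """
    reader = csv.DictReader(tmap_lines, delimiter="\t")
    selected = set()
    for row in reader:
        if row["class_code"] != code:
            continue
        # Optionally skip single-exon transcripts, which need separate handling.
        if multi_exon_only and int(row["num_exons"]) < 2:
            continue
        selected.add(row["qry_id"])
    return selected


# Made-up rows for illustration only; a real tmap file has more columns.
SAMPLE = (
    "ref_gene_id\tref_id\tclass_code\tqry_gene_id\tqry_id\tnum_exons\n"
    "G1\tT1\t=\tQG1\tQT1\t3\n"
    "G1\tT1\tj\tQG1\tQT2\t4\n"
    "G2\tT2\t=\tQG2\tQT3\t1\n"
)

print(sorted(coincident_ids(SAMPLE.splitlines())))
# → ['QT1', 'QT3']
print(sorted(coincident_ids(SAMPLE.splitlines(), multi_exon_only=True)))
# → ['QT1']
```

For a real run, the lines of the tmap file produced by gffcompare can be passed in place of `SAMPLE.splitlines()`, for example via an open file handle. The resulting IDs can then be used to subset the query GTF.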

Hope this information answers your question. Please let me know if I may be of assistance with anything else!

Thank you,

Ales

pclavell commented 1 year ago

Hi, yes, this is it. Thanks a lot!