TomSkelly / MatchAnnot

Python scripts for matching output of Pacific Biosciences IsoSeq RNA-seq pipeline to an annotation file.
GNU General Public License v3.0

MatchAnnot does not seem to support annotations of pseudogene or lincRNAs #7

Open slowsmile opened 9 years ago

slowsmile commented 9 years ago

Dear TomSkelly, I am a researcher at Mt Sinai Medical Center in New York. We have recently been using MatchAnnot to annotate PacBio output aligned with GMAP (SAM/BAM). Our goal is to profile lincRNAs; however, I noticed that MatchAnnot seems to avoid assigning reads to pseudogenes or lincRNAs. We ran an overlap analysis on the MatchAnnot output and found that our sample has 0 lincRNAs, which is not what we expect given how we prepared the BAM input. Does MatchAnnot really annotate a BAM file only against the protein-coding features in the GTF file? If so, is there an option in MatchAnnot to make it work with lincRNAs and pseudogenes? I am looking at the result section of the MatchAnnot output. Thanks

TomSkelly commented 9 years ago

I'm surprised by this. MatchAnnot should match to anything with a gene entry in the GTF.

Try '--format alt'

I am in hospital, can have a look when I get out.

slowsmile commented 9 years ago

Thanks Tom, I hope everything goes well and that you feel better soon. MatchAnnot seems to work with --format alt. Could you provide a bit more information about --format alt? How does it differ from --format standard or pickle? I looked into the GTF format specifications but couldn't find an explanation of 'alt' or 'pickle'.
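
For anyone following this thread: Tom's point that MatchAnnot should match anything with a gene entry in the GTF suggests a quick sanity check on the annotation file itself. The sketch below is not part of MatchAnnot and makes no claim about how MatchAnnot parses annotations. It only counts `gene` feature lines per biotype, assuming the biotype is stored in a `gene_type` (GENCODE style) or `gene_biotype` (Ensembl style) attribute. The function name `count_gene_biotypes` is made up for illustration.

```python
import re
from collections import Counter

# Assumption: the biotype is stored under one of these attribute keys
# (gene_type in GENCODE-style files, gene_biotype in Ensembl-style files).
_BIOTYPE_RE = re.compile(r'\b(?:gene_type|gene_biotype) "([^"]+)"')


def count_gene_biotypes(gtf_lines):
    """Count GTF lines whose feature column is 'gene', grouped by biotype.

    gtf_lines is any iterable of text lines, e.g. an open file handle.
    Genes without a recognised biotype attribute are counted as 'unknown'.
    """
    counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = _BIOTYPE_RE.search(fields[8])
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```

To use it, open the annotation file (for example `with open("annotations.gtf") as fh:`, where the file name is a placeholder) and print `count_gene_biotypes(fh)`. If lincRNA or pseudogene biotypes are missing from the result, or every gene is counted as `unknown`, the annotation file may lack `gene` lines for them or use a different attribute layout.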