biocompibens / ALFA

ALFA: Annotation Landscape for Aligned Reads
MIT License

Error in parsing gtf file #9

Open mzavolan opened 2 years ago

mzavolan commented 2 years ago

Hello

I am trying to use ALFA on a yeast data set. I downloaded a gtf file from ENSEMBL, Saccharomyces_cerevisiae.R64-1-1.104.gtf, but I get a KeyError exception:

if "exon" in prev_features_minus[biotype]:

KeyError: 'snoRNA'

It doesn't seem to be an issue specific to snoRNAs, because if I remove the snoRNA features from the gtf file, the exception is raised for another annotation.

It seems that ALFA was run on yeast before without raising such an error. Any suggestions?

Thank you! Mihaela

uniqueg commented 2 years ago

You can get the file from here: http://ftp.ensembl.org/pub/release-104/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.104.gtf.gz

uniqueg commented 2 years ago

I could reproduce the error with ALFA v1.1.1 from Conda with:

alfa -a Saccharomyces_cerevisiae.R64-1-1.104.gtf

However, after sorting with:

sort -k1,1 -k4,4n -k5,5nr GTF_FILE > SORTED_GTF_FILE

it works.

So it appears that the Ensembl yeast annotations, unlike those, e.g., for human and mouse, are not in the expected sort order, causing index builds to fail.
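For anyone who prefers to do the sorting in Python, here is a minimal sketch of the same ordering (sequence name, then start ascending, then end descending). The function name `sort_gtf_lines` is hypothetical and not part of ALFA. It assumes a standard tab-separated GTF with start and end in columns 4 and 5. Unlike the `sort` command above, which treats `#` header lines as ordinary lines, it keeps them at the top explicitly, and records with equal keys stay in their original order.

```python
def sort_gtf_lines(lines):
    """Return GTF lines ordered by seqname, start ascending, end descending.

    Hypothetical helper, not part of ALFA. Header lines starting with '#'
    are kept first, in their original order; blank lines are dropped.
    """
    headers = [ln for ln in lines if ln.startswith("#")]
    records = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    def sort_key(line):
        fields = line.split("\t")
        # Column 1: seqname, column 4: start, column 5: end (1-based columns).
        return (fields[0], int(fields[3]), -int(fields[4]))

    # sorted() is stable, so records with identical keys keep their input order.
    return headers + sorted(records, key=sort_key)


if __name__ == "__main__":
    import sys

    # Usage: python sort_gtf.py GTF_FILE > SORTED_GTF_FILE
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(sort_gtf_lines(handle.readlines()))
```

Whether ALFA needs anything beyond this ordering is not established in this thread; the sketch only mirrors the command that made the run succeed.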

However, I think it would be nice if ALFA caught that and reported a more informative error message.
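As a rough illustration of what such a check could look like, here is a minimal sketch that validates the sort order before any processing and fails with a message pointing at the offending line. `GtfSortError` and `check_gtf_sorted` are hypothetical names, not part of ALFA. The ordering it enforces (records of a sequence contiguous, start ascending, end descending) is an assumption taken from the `sort` command above, not from ALFA's source.

```python
class GtfSortError(ValueError):
    """Raised when a GTF file is not in the assumed sort order (hypothetical)."""


def check_gtf_sorted(lines):
    """Raise GtfSortError on the first record that breaks the assumed order.

    Assumed order: all records of a seqname are contiguous, and within a
    seqname records are sorted by start ascending, then end descending.
    """
    seen = set()      # seqnames already fully visited or in progress
    current = None    # seqname of the block being read
    prev_key = None   # (start, -end) of the previous record in this block

    for lineno, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            raise GtfSortError(
                f"line {lineno}: expected 9 tab-separated fields, "
                f"found {len(fields)}"
            )
        seqname, start, end = fields[0], int(fields[3]), int(fields[4])

        if seqname != current:
            if seqname in seen:
                raise GtfSortError(
                    f"line {lineno}: records for '{seqname}' are not "
                    "contiguous; the GTF file appears to be unsorted"
                )
            seen.add(seqname)
            current, prev_key = seqname, None

        key = (start, -end)
        if prev_key is not None and key < prev_key:
            raise GtfSortError(
                f"line {lineno}: record {seqname}:{start}-{end} is out of "
                "order; try sorting the GTF file with "
                "'sort -k1,1 -k4,4n -k5,5nr'"
            )
        prev_key = key
```

Calling `check_gtf_sorted(open("annotation.gtf"))` (file name is a placeholder) before building the index would turn the `KeyError` into an actionable message, assuming the sort order really is the root cause.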

mzavolan commented 2 years ago

Thanks, it does work!

mbahin commented 2 years ago

Hi,

Thanks for your questions and solutions! :) I'm really sorry, but I currently have no time to work on this. I hope the tool still helps...

Cheers, Mathieu