churchill-lab / emase

Expectation-Maximization algorithm for Allele-Specific Expression
http://churchill-lab.github.io/emase/
GNU General Public License v3.0

GTF file format #2

Open bdeonovic opened 9 years ago

bdeonovic commented 9 years ago

I'm getting the following error when trying to run prepare-emase:

Parsing refFlat_20150603.gtf...
Traceback (most recent call last):
  File "/Users/bdeonovic/miniconda/envs/emase/bin/prepare-emase", line 4, in <module>
    __import__('pkg_resources').run_script('emase==0.9.5', 'prepare-emase')
  File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 735, in run_script
  File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 1652, in run_script
  File "/Users/bdeonovic/miniconda/envs/emase/lib/python2.7/site-packages/emase-0.9.5-py2.7.egg/EGG-INFO/scripts/prepare-emase", line 279, in <module>
    sys.exit(main())
  File "/Users/bdeonovic/miniconda/envs/emase/lib/python2.7/site-packages/emase-0.9.5-py2.7.egg/EGG-INFO/scripts/prepare-emase", line 228, in main
    gdb, tdb = parse_gtf(gtffile)
  File "/Users/bdeonovic/miniconda/envs/emase/lib/python2.7/site-packages/emase-0.9.5-py2.7.egg/EGG-INFO/scripts/prepare-emase", line 131, in parse_gtf
    tdb[tid][feature].append((s, e))
KeyError: 'start_codon'

It looks like my GTF is not of the proper form. Here are the first few lines:

chr22   refFlat exon    16590758    16592810    .   -   .   gene_id "CCT8L2"; transcript_id "NM_014406"; exon_number "1"; exon_id "NM_014406.1"; gene_name "CCT8L2";
chr22   refFlat CDS 16590880    16592550    .   -   0   gene_id "CCT8L2"; transcript_id "NM_014406"; exon_number "1"; exon_id "NM_014406.1"; gene_name "CCT8L2";
chr22   refFlat start_codon 16592548    16592550    .   -   0   gene_id "CCT8L2"; transcript_id "NM_014406"; exon_number "1"; exon_id "NM_014406.1"; gene_name "CCT8L2";
chr22   refFlat stop_codon  16590877    16590879    .   -   0   gene_id "CCT8L2"; transcript_id "NM_014406"; exon_number "1"; exon_id "NM_014406.1"; gene_name "CCT8L2";

If this is not the format your program expects, please let me know which format the file should be in. The first few lines of a GTF you use would be a helpful example.
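The traceback points at `tdb[tid][feature].append((s, e))` in `parse_gtf`. The KeyError means the per-transcript entry has no slot for the `start_codon` feature. The body of `parse_gtf` is not shown here, so the following is a hypothetical reconstruction, not EMASE's code. It assumes each transcript is pre-seeded with only the feature types the parser expects (taken here to be just `exon`) and shows how skipping unexpected feature types avoids the error. `parse_attributes`, `parse_gtf_lines`, and `EXPECTED_FEATURES` are illustrative names.

```python
from collections import defaultdict

# Assumption: the feature types the parser keeps. The actual set used by
# prepare-emase is not visible in the traceback.
EXPECTED_FEATURES = ("exon",)


def parse_attributes(attr):
    """Parse a GTF attribute column such as 'gene_id "X"; transcript_id "Y";'."""
    out = {}
    for item in attr.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        out[key] = value.strip().strip('"')
    return out


def parse_gtf_lines(lines, keep=EXPECTED_FEATURES):
    """Collect (start, end) intervals per transcript for the kept feature types."""
    # Each new transcript gets one list per expected feature, and nothing else.
    tdb = defaultdict(lambda: {f: [] for f in keep})
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")  # GTF columns are tab-separated
        feature = fields[2]
        if feature not in keep:
            # A strict parser would fall through to tdb[tid][feature] here and
            # raise KeyError: 'start_codon' on the refFlat lines above.
            continue
        s, e = int(fields[3]), int(fields[4])
        tid = parse_attributes(fields[8])["transcript_id"]
        tdb[tid][feature].append((s, e))
    return tdb
```

Whether EMASE should skip such features or also store them (for example CDS) is a design question this sketch does not settle. It only shows where the KeyError comes from.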

narayananr commented 9 years ago

I think the GTF file is the issue. We have tested extensively with Ensembl GTF files (up to release-68, ftp://ftp.ensembl.org/pub/release-68/gtf/mus_musculus). Which GTF file did you use? Can you try an Ensembl one?

Thanks

GlastonburyC commented 8 years ago

I'm getting the same error. I'm using Homo_sapiens.GRCh37.68.gtf, and prepare-emase gives me the following error:

KeyError: 'start_codon'

narayananr commented 8 years ago

Thanks for trying out EMASE. Let me try to reproduce the error and get back to you.

narayananr commented 8 years ago

Hi

Is there any specific reason you are using Homo_sapiens.GRCh37.68.gtf? That annotation is from 2012, so I would suggest using a newer version. prepare-emase works fine with Homo_sapiens.GRCh37.75.gtf, the last annotation released for the GRCh37 build.

Thanks,
Narayanan

27NRussell commented 6 years ago

Hi,

I am trying to run EMASE and am getting the same error:

File "/uufs/chpc.utah.edu/sys/installdir/anaconda/5.3.0/envs/emase/lib/python2.7/site-packages/emase-0.10.16-py2.7.egg-info/scripts/prepare-emase", line 132, in parse_gtf
    tdb[tid][feature].append((s, e))
KeyError: 'start_codon'

Previously you said the GTF file was the likely cause; however, I am using exactly the GTF file (ftp://ftp.ensembl.org/pub/release-68/gtf/mus_musculus) that you said you had tested. I am also using the corresponding genome.

Any advice as to how I can fix this would be greatly appreciated.

Thanks so much,

Nikki
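One possible, untested workaround is to write a copy of the GTF that keeps only the feature types the parser handles and pass that file to prepare-emase in place of the original. The following is a minimal Python 3 sketch. It assumes that only `exon` lines are needed, which is not confirmed anywhere in this thread; if prepare-emase also relies on other features such as CDS, add them to `keep`. `filter_gtf` is a hypothetical helper, not part of EMASE.

```python
import gzip


def filter_gtf(src, dst, keep=("exon",)):
    """Copy a GTF (plain or .gz) to dst, keeping '#' header lines and only
    feature lines whose third column is in `keep`. Returns the number of
    feature lines written."""
    opener = gzip.open if src.endswith(".gz") else open
    kept = 0
    with opener(src, "rt") as fin, open(dst, "w") as fout:
        for line in fin:
            if line.startswith("#"):
                fout.write(line)  # preserve header/comment lines as-is
                continue
            fields = line.split("\t")
            # Column 3 (index 2) of a GTF record is the feature type.
            if len(fields) >= 9 and fields[2] in keep:
                fout.write(line)
                kept += 1
    return kept
```

For example, `filter_gtf("annotation.gtf.gz", "annotation.exons.gtf")` (placeholder file names) writes an exon-only copy. Because it drops `start_codon` and `stop_codon` lines, it sidesteps the KeyError. Whether the resulting EMASE annotation is otherwise complete should still be checked.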