Clinical-Genomics / genmod

Annotate models of genetic inheritance patterns in variant files (vcf files)
http://moonso.github.io/genmod/
MIT License

local variable 'exons' referenced before assignment #3

Closed tzhughes closed 10 years ago

tzhughes commented 10 years ago

Hi,

I finally got round to having a go with your code.

and got

Parsing annotation ...

Traceback (most recent call last):
  File "../bin/arch/genmod-master-0.9.7/scripts/run_genmod.py", line 239, in <module>
    main()
  File "../bin/arch/genmod-master-0.9.7/scripts/run_genmod.py", line 150, in main
    annotation_trees = annotation_parser.AnnotationParser(anno_file, args.annotation_type[0])
  File "build/bdist.macosx-10.8-intel/egg/genmod/utils/annotation_parser.py", line 119, in __init__
UnboundLocalError: local variable 'exons' referenced before assignment

Should I perhaps be working with an older version of the software?

tzhughes commented 10 years ago

@moonso, when running with 0.9.3, which I did before I tried 0.9.7, I was getting past the annotation parsing but running into what seemed like a parsing error with the VCF file. Could you put on github one of your test VCF files so that I can eliminate VCF format issues as a source of error?

../bin/genmod-0.9.3/scripts/run_genmod.py -v -at gtf trio.fam recessive.vcf Homo_sapiens.GRCh37.71_chromosome1only.gtf
Parsing annotation ...

Annotation Parsed!
Time to parse annotation: 0:00:41.757037

Number of CPU:s 4
Start parsing the variants ...

Traceback (most recent call last):
  File "../bin/genmod-0.9.3/scripts/run_genmod.py", line 239, in <module>
    main()
  File "../bin/genmod-0.9.3/scripts/run_genmod.py", line 201, in main
    var_parser.parse()
  File "build/bdist.macosx-10.8-intel/egg/genmod/vcf/vcf_parser.py", line 68, in parse
  File "build/bdist.macosx-10.8-intel/egg/genmod/vcf/vcf_parser.py", line 188, in vcf_variant
KeyError: 'CHROM'
VariantConsumer-2: Starting!
VariantConsumer-3: Starting!
VariantConsumer-5: Starting!
VariantConsumer-4: Starting!
VariantConsumer-6: Starting!
VariantConsumer-7: Starting!
VariantConsumer-8: Starting!
VariantPrinter-9: starting!
Process VariantPrinter-9:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "build/bdist.macosx-10.8-intel/egg/genmod/utils/variant_printer.py", line 39, in run
    next_result = self.task_queue.get()
  File "<string>", line 2, in get
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/managers.py", line 755, in _callmethod
    self._connect()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/managers.py", line 742, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/connection.py", line 169, in Client
    c = SocketClient(address)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/connection.py", line 289, in SocketClient
    s.connect(address)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
Process VariantConsumer-8:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Process VariantConsumer-5:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "build/bdist.macosx-10.8-intel/egg/genmod/utils/variant_consumer.py", line 148, in run
    func(*targs, **kargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/util.py", line 295, in _exit_function
    self.run()
  File "build/bdist.macosx-10.8-intel/egg/genmod/utils/variant_consumer.py", line 148, in run
    p.join()
    next_batch = self.task_queue.get()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 145, in join
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 115, in get
    next_batch = self.task_queue.get()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 115, in get
    res = self._popen.wait(timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/forking.py", line 148, in wait
    self._rlock.acquire()

moonso commented 10 years ago

Hello Tim!

I will have a look at this right away.

Måns

tzhughes commented 10 years ago

I got the GTF file from here: http://www.ensembl.org/info/data/ftp/index.html

And my VCF looks like this:

##fileformat=VCFv4.1
##contig=<ID=1,length=249250621,assembly=b37>
##reference=file:///humgen/gsa-hpprojects/GATK/bundle/current/b37/human_g1k_v37.fasta
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  father  mother  proband
1   101 .   A   T   100 PASS    MQ=1    GT:GQ   1/1:60  0/1:60  0/1:60  

moonso commented 10 years ago

Now things should work again.

The first issue arose because, in the next big update, I will let the user choose whether compounds should be valid only for exonic variants. The reason is that some genes have very large introns, which give rise to a huge number of compound pairs.
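For anyone hitting the same traceback: an UnboundLocalError like this usually means a local variable is assigned in only one branch and then read unconditionally. The sketch below is a generic illustration of that pattern and its usual fix. It is not genmod's actual code, and the function names and the `exons_only` flag are hypothetical.

```python
def count_exons_buggy(records, exons_only):
    """Hypothetical example: 'exons' is bound only when exons_only is True."""
    if exons_only:
        exons = [r for r in records if r["feature"] == "exon"]
    # Raises UnboundLocalError when exons_only is False,
    # because 'exons' was never assigned on that path.
    return len(exons)


def count_exons_fixed(records, exons_only):
    """Same logic, but 'exons' is bound before any branch."""
    exons = []
    if exons_only:
        exons = [r for r in records if r["feature"] == "exon"]
    return len(exons)


records = [{"feature": "exon"}, {"feature": "gene"}]
print(count_exons_fixed(records, exons_only=True))
print(count_exons_fixed(records, exons_only=False))
```

Initialising the variable before the conditional keeps every code path valid, whichever option the user picks.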

The problem with the VCF file you showed was that it was not tab-separated; this should not be a problem any more.
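To illustrate why a space-separated file can produce `KeyError: 'CHROM'`: if the header line is split on tabs only, the whole line becomes a single field and no `CHROM` key is created. The sketch below shows one possible way to tolerate that, falling back to splitting on any whitespace. It is an assumption about how such a fallback could look, not the change actually made in genmod, and both function names are hypothetical.

```python
def parse_vcf_header(header_line):
    """Return the column names from a '#CHROM ...' header line.

    Splits on tabs first; if that yields fewer than the 8 mandatory
    VCF columns, falls back to splitting on any whitespace.
    """
    text = header_line.rstrip("\n").lstrip("#")
    fields = text.split("\t")
    if len(fields) < 8:
        fields = text.split()
    return fields


def parse_variant(line, header):
    """Map one variant line onto the header columns, with the same fallback."""
    values = line.rstrip("\n").split("\t")
    if len(values) < len(header):
        values = line.split()
    return dict(zip(header, values))


header = parse_vcf_header(
    "#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  father  mother  proband"
)
variant = parse_variant(
    "1   101 .   A   T   100 PASS    MQ=1    GT:GQ   1/1:60  0/1:60  0/1:60  ", header
)
print(variant["CHROM"], variant["POS"], variant["proband"])
```

The whitespace fallback is a convenience only: the VCF format is defined as tab-delimited, and a field that itself contains spaces would be split incorrectly, so converting the file to tabs is the safer route.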

I've also added a folder with a small test data set as you suggested, good idea!

Hope things will work out for you now,

Regards,

Måns

tzhughes commented 10 years ago

Hi Måns,

Just wanted to shoot in something with regards to your planned update with limiting compounds to exonic regions:

Tim

tzhughes commented 10 years ago

And thanks for the test data :+1:

moonso commented 10 years ago

Yes, it will include at least 2 bases on each side of the exons. Thank you for the thoughts! I will look closer at untranslated regions in exons.

Måns

tzhughes commented 10 years ago

Great! It might be an idea to give users an option for how far into the introns they wish to go: some people like to consider just 2 bp into the introns, whereas others go quite a few bases deeper, as the splice site motif stretches much further, although it becomes a lot weaker.
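A minimal sketch of what such an option could look like, assuming exons are given as 1-based inclusive `(start, end)` pairs. The function name and the `padding` parameter are hypothetical and do not describe genmod's interface.

```python
def in_padded_exon(pos, exons, padding=2):
    """Return True if 1-based position `pos` lies within any exon
    extended by `padding` bases on each side.

    `exons` is an iterable of (start, end) pairs, both inclusive.
    """
    return any(start - padding <= pos <= end + padding for start, end in exons)


exons = [(100, 200), (500, 650)]
print(in_padded_exon(98, exons))              # inside the default 2 bp flank
print(in_padded_exon(90, exons))              # outside the default flank
print(in_padded_exon(90, exons, padding=10))  # inside a wider, user-chosen flank
```

A linear scan is fine for a handful of exons; for whole-genome annotation an interval tree or a sorted list with binary search would scale better.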

moonso commented 10 years ago

I will definitely include that option. I will also add support for VCF files annotated with VEP soon; the annotation will then be determined by VEP, and annotation files can be left out when running GENMOD.

tzhughes commented 10 years ago

Supporting VEP is a good choice. In my opinion, this is the best functional annotator of VCF files.

moonso commented 10 years ago

I agree. After some years with Annovar I will switch to VEP. I think one of the most annoying problems in bioinformatics is file formats; this is why I want to stay in VCF as long as possible, and it is one of the reasons I wrote this software.

tzhughes commented 10 years ago

Yes, that is my strategy too:

All samples, all annotations, all sites in one VCF file. Then subset what you need and convert to another format if need be.
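As a rough sketch of the "subset what you need" step, the function below keeps only selected sample columns from a tab-separated VCF. The function name is hypothetical, and it does not recompute INFO fields such as allele counts, so an established tool (for example `bcftools view -s`) is usually the better choice in practice.

```python
def subset_samples(vcf_lines, keep):
    """Yield VCF lines containing only the sample columns named in `keep`.

    Assumes a tab-separated VCF with a FORMAT column, so the first
    nine columns (CHROM..FORMAT) are fixed and always retained.
    """
    columns = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            # Meta-information lines pass through unchanged.
            yield line
        elif line.startswith("#CHROM"):
            header = line.split("\t")
            columns = list(range(9)) + [
                i for i, name in enumerate(header) if i >= 9 and name in keep
            ]
            yield "\t".join(header[i] for i in columns)
        else:
            fields = line.split("\t")
            yield "\t".join(fields[i] for i in columns)


vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tfather\tmother\tproband",
    "1\t101\t.\tA\tT\t100\tPASS\tMQ=1\tGT:GQ\t1/1:60\t0/1:60\t0/1:60",
]
for out in subset_samples(vcf, keep={"proband"}):
    print(out)
```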