tzhughes closed this issue 10 years ago.
@moonso, when running with 0.9.3 (which I did before trying 0.9.7), I got past the annotation parsing but ran into what seemed like a parsing error with the VCF file. Could you put one of your test VCF files on GitHub so that I can rule out VCF format issues as a source of error?
../bin/genmod-0.9.3/scripts/run_genmod.py -v -at gtf trio.fam recessive.vcf Homo_sapiens.GRCh37.71_chromosome1only.gtf
Parsing annotation ...
Annotation Parsed!
Time to parse annotation: 0:00:41.757037
Number of CPU:s 4
Start parsing the variants ...
Traceback (most recent call last):
File "../bin/genmod-0.9.3/scripts/run_genmod.py", line 239, in <module>
main()
File "../bin/genmod-0.9.3/scripts/run_genmod.py", line 201, in main
var_parser.parse()
File "build/bdist.macosx-10.8-intel/egg/genmod/vcf/vcf_parser.py", line 68, in parse
File "build/bdist.macosx-10.8-intel/egg/genmod/vcf/vcf_parser.py", line 188, in vcf_variant
KeyError: 'CHROM'
VariantConsumer-2: Starting!
VariantConsumer-3: Starting!
VariantConsumer-5: Starting!
VariantConsumer-4: Starting!
VariantConsumer-6: Starting!
VariantConsumer-7: Starting!
VariantConsumer-8: Starting!
VariantPrinter-9: starting!
Process VariantPrinter-9:
Traceback (most recent call last):
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "build/bdist.macosx-10.8-intel/egg/genmod/utils/variant_printer.py", line 39, in run
next_result = self.task_queue.get()
File "<string>", line 2, in get
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/managers.py", line 755, in _callmethod
self._connect()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/managers.py", line 742, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/connection.py", line 169, in Client
c = SocketClient(address)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/connection.py", line 289, in SocketClient
s.connect(address)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
Process VariantConsumer-8:
Traceback (most recent call last):
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Process VariantConsumer-5:
Traceback (most recent call last):
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "build/bdist.macosx-10.8-intel/egg/genmod/utils/variant_consumer.py", line 148, in run
func(*targs, **kargs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/util.py", line 295, in _exit_function
self.run()
File "build/bdist.macosx-10.8-intel/egg/genmod/utils/variant_consumer.py", line 148, in run
p.join()
next_batch = self.task_queue.get()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 145, in join
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 115, in get
next_batch = self.task_queue.get()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 115, in get
res = self._popen.wait(timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/forking.py", line 148, in wait
self._rlock.acquire()
Hello Tim!
I will have a look at this right away.
Måns
I got the GTF file from here: http://www.ensembl.org/info/data/ftp/index.html
And my VCF looks like this:
##fileformat=VCFv4.1
##contig=<ID=1,length=249250621,assembly=b37>
##reference=file:///humgen/gsa-hpprojects/GATK/bundle/current/b37/human_g1k_v37.fasta
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT father mother proband
1 101 . A T 100 PASS MQ=1 GT:GQ 1/1:60 0/1:60 0/1:60
Now things should work again.
The first issue arose because, in the next big update, I will let the user choose whether compounds should be valid only for exonic variants. The reason is that some genes with very large introns give rise to a huge number of compound pairs.
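To illustrate why large introns blow up the number of compound pairs: if a gene carries n heterozygous variants, every pair is a candidate compound, giving n(n-1)/2 pairs. Below is a minimal sketch, assuming a hypothetical `Variant` record with an `exonic` flag (not genmod's actual data model), that counts candidate pairs per gene with and without the exonic-only restriction. It only counts candidates; it does not check phase or parental origin.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Variant:
    # Hypothetical minimal record; not genmod's internal representation.
    gene: str
    pos: int
    exonic: bool


def compound_pairs(variants, exonic_only=False):
    """Group variants by gene and return every candidate compound pair."""
    by_gene = defaultdict(list)
    for v in variants:
        if exonic_only and not v.exonic:
            continue  # drop non-exonic variants before pairing
        by_gene[v.gene].append(v)
    pairs = []
    for gene_variants in by_gene.values():
        # n variants in one gene yield n * (n - 1) / 2 candidate pairs
        pairs.extend(combinations(gene_variants, 2))
    return pairs


# One hypothetical gene with 2 exonic and 98 intronic heterozygous variants.
variants = [Variant("GENE1", pos, exonic=(pos < 2)) for pos in range(100)]
print(len(compound_pairs(variants)))                    # 4950
print(len(compound_pairs(variants, exonic_only=True)))  # 1
```

Because the pair count grows quadratically, removing the intronic variants from a single long gene here cuts the candidates from 4950 to 1.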
The problem with the VCF file you showed was that it was not tab-separated; this should no longer be a problem.
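One plausible way a space-separated file leads to the `KeyError: 'CHROM'` in the traceback above: if the header line is split on tabs only, the whole line becomes a single column name, so no `CHROM` key is ever created. The sketch below illustrates that assumption and a tolerant fallback; it is not genmod's actual parser, and the function names are hypothetical.

```python
def split_fields(line, tolerant):
    """Split a VCF line on tabs; optionally fall back to any whitespace."""
    fields = line.rstrip("\n").split("\t")
    if tolerant and len(fields) == 1:
        fields = line.split()  # no tabs at all: assume space-separated
    return fields


def parse_header(line, tolerant=False):
    """Map column names from the '#CHROM ...' line to column indices."""
    names = split_fields(line.lstrip("#"), tolerant)
    return {name: i for i, name in enumerate(names)}


def parse_record(header, line, tolerant=False):
    """Return a data line as a dict keyed by header column name."""
    fields = split_fields(line, tolerant)
    return {name: fields[i] for name, i in header.items()}


space_header = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT father mother proband"
space_record = "1 101 . A T 100 PASS MQ=1 GT:GQ 1/1:60 0/1:60 0/1:60"

strict = parse_record(parse_header(space_header), space_record)
try:
    strict["CHROM"]
except KeyError as err:
    print("strict parse failed: missing", err)  # strict parse failed: missing 'CHROM'

header = parse_header(space_header, tolerant=True)
record = parse_record(header, space_record, tolerant=True)
print(record["CHROM"], record["POS"], record["proband"])  # 1 101 0/1:60
```

The fallback only applies when a line contains no tabs at all, so well-formed, tab-delimited VCF files are parsed exactly as before.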
I've also added a folder with a small test dataset, as you suggested. Good idea!
Hope things will work out for you now,
Regards,
Måns
Hi Måns,
Just wanted to chip in regarding your planned update limiting compounds to exonic regions:
Tim
And thanks for the test data :+1:
Yes, it will include at least 2 bases on each side of the exons. Thank you for the thoughts! I will take a closer look at untranslated regions in exons.
Måns
Great! It might be an idea to give users an option for how far beyond the exons they wish to go, as some people like to consider just 2 bp into the introns, whereas others go quite a few bases deeper, since the splice-site motif stretches much further, although it becomes a lot weaker.
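A user-configurable flank like the one discussed here could be sketched as follows. It reads exon features from GTF lines (9 tab-separated columns, 1-based inclusive coordinates, `gene_id` in the attribute column) and tests whether a position lies inside an exon widened by `flank` bases on each side; the default of 2 mirrors the "at least 2 bases" mentioned above. The function names and the `flank` parameter are hypothetical, not GENMOD options.

```python
import re
from collections import defaultdict

GENE_ID = re.compile(r'gene_id "([^"]+)"')


def read_exons(gtf_lines):
    """Collect (chrom, start, end) exon intervals per gene_id from GTF lines."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip GTF header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # keep only exon features
        match = GENE_ID.search(cols[8])
        if match:
            exons[match.group(1)].append((cols[0], int(cols[3]), int(cols[4])))
    return exons


def in_exon_with_flank(exons, chrom, pos, flank=2):
    """True if pos lies within any exon extended by `flank` bases on each side."""
    return any(
        c == chrom and start - flank <= pos <= end + flank
        for intervals in exons.values()
        for c, start, end in intervals
    )


gtf = [
    '1\tensembl\texon\t1000\t1100\t.\t+\t.\tgene_id "GENE1"; transcript_id "T1";',
    '1\tensembl\tgene\t1000\t5000\t.\t+\t.\tgene_id "GENE1";',
]
exons = read_exons(gtf)
print(in_exon_with_flank(exons, "1", 1102))            # True (2 bp into the intron)
print(in_exon_with_flank(exons, "1", 1103))            # False
print(in_exon_with_flank(exons, "1", 1110, flank=10))  # True
```

The linear scan keeps the sketch short; for a whole-genome GTF, an interval tree or sorted per-chromosome intervals would be the practical choice.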
I will definitely include that option. I will also soon add support for VCF files annotated with VEP; the annotation will then be determined by VEP, and annotation files can be omitted when running GENMOD.
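With VEP-annotated input, gene and consequence information would come from the VCF itself. VEP writes it to a `CSQ` INFO field whose subfields are pipe-separated, with their order declared after `Format:` in the `##INFO=<ID=CSQ,...>` header line. A minimal sketch of reading it, assuming that layout; the header below is shortened for illustration (the real subfield list depends on the VEP version and options), and the helper names and gene identifiers are hypothetical.

```python
def csq_fields(header_lines):
    """Return the CSQ subfield names declared in the VCF meta-information header."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ"):
            fmt = line.split("Format: ", 1)[1]
            return fmt.rstrip('">\n').split("|")
    raise ValueError("no CSQ header line found")


def csq_annotations(info, fields):
    """Split the CSQ entry of an INFO column into one dict per annotation block."""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            # Comma-separated blocks, one per allele/transcript combination.
            return [dict(zip(fields, block.split("|")))
                    for block in entry[len("CSQ="):].split(",")]
    return []


header = [
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|SYMBOL|Gene">'
]
fields = csq_fields(header)
info = "MQ=1;CSQ=T|missense_variant|GENE1|GENEID1,T|intron_variant|GENE1|GENEID1"
for ann in csq_annotations(info, fields):
    print(ann["SYMBOL"], ann["Consequence"])
```

Reading the subfield order from the header, rather than hard-coding it, keeps the parser working across VEP runs configured with different output fields.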
Supporting VEP is a good choice. In my opinion, this is the best functional annotator of VCF files.
I agree. After some years with Annovar, I will switch to VEP. I think one of the most annoying problems in bioinformatics is file formats; this is why I want to stay in VCF as long as possible, and it is one of the reasons I wrote this software.
Yes, that is my strategy too:
All samples, all annotations, all sites in one VCF file. Then subset what you need and convert to another format if need be.
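The "subset what you need" step can be as simple as keeping the nine fixed VCF columns plus a chosen set of sample columns. A minimal sketch, assuming a tab-separated VCF read line by line; the function name is hypothetical, and it does not update INFO fields (such as allele counts) that depend on which samples are present.

```python
def subset_samples(vcf_lines, keep):
    """Yield tab-separated VCF lines restricted to the samples named in `keep`."""
    keep_idx = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
            continue
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            # Columns 0-8 (CHROM..FORMAT) are fixed; sample columns start at 9.
            samples = [i for i in range(9, len(cols)) if cols[i] in keep]
            keep_idx = list(range(9)) + samples
        elif keep_idx is None:
            raise ValueError("data line found before the #CHROM header")
        yield "\t".join(cols[i] for i in keep_idx)


vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tfather\tmother\tproband",
    "1\t101\t.\tA\tT\t100\tPASS\tMQ=1\tGT:GQ\t1/1:60\t0/1:60\t0/1:60",
]
for out in subset_samples(vcf, {"proband"}):
    print(out)
```

Sample columns keep their order from the input file, regardless of the order of names in `keep`.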
Hi,
I finally got round to having a go with your code.
../bin/arch/genmod-master-0.9.7/scripts/run_genmod.py -v -at gtf trio.fam recessive.vcf Homo_sapiens.GRCh37.71_chromosome1only.gtf
and got
Should I perhaps be working with an older version of the software?