Closed: CarlosGAH closed this issue 5 years ago
I forgot to mention that this error does not interrupt the loading, but at the end the process fails with: ValueError: Processing failed on GEMINI chunk load
thanks for reporting. can you report the output of ls clin*
in your gemini data directory that contains all the vcfs and bed annotation files?
Here it is:
/usr/local/share/gemini/gemini_data$ ls clin*
clinvar_20170130.tidy.vcf.gz
clinvar_20170130.tidy.vcf.gz.tbi
clinvar_20190102.tidy.vcf.gz
clinvar_20190102.tidy.vcf.gz.tbi
would you try manually removing clinvar_20170130.tidy.vcf.gz
and clinvar_20170130.tidy.vcf.gz.tbi
to make sure gemini is not getting an older version?
OK, I will try that and report back.
The same problem appears. I removed the files as you suggested, but the problem persists. It does not arise with smaller VCFs (for example, trios from exome sequencing), but it does with these big VCFs (trios from whole-genome sequencing). If I select a fraction of the big VCF (1/10, for example), the problem does not appear.
Dear Brent,
I have the same problem, with many CLNSIG errors during VCF loading:
Traceback (most recent call last):
File "/home/viktor/gemini/bin/gemini", line 7, in
and finally it crashes at the end:
Traceback (most recent call last):
File "/home/viktor/gemini/bin/gemini", line 7, in
Any ideas? Thanks
I have tried the same big VCF that gave me problems with gemini 20.1 (clean installation from gemini_install.py on a new computer, with the data-only update and the CADD and GERP scores), and it gave no problems at all. The drawback is that the databases there are a bit old. Is there a way to update only the databases (for example ClinVar and dbSNP)?
can you get me the portion of the vcf that will recreate the error? I know you said on a small subset you do not see it, but you should be able to find 1 chunk of the file that gives the error. then I can debug and fix this problem for anyone who might encounter it.
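One way to narrow down such a chunk is to scan a VCF for records whose INFO column lacks a given key. The sketch below is only an illustration and is not part of gemini: the function names `parse_info`, `records_missing_key`, and `scan_vcf` are hypothetical, and it assumes a standard tab-separated VCF with INFO in the eighth column. Which file to scan (the input VCF or the ClinVar annotation file) depends on where the missing key is suspected.

```python
import gzip


def parse_info(info_field):
    """Turn a VCF INFO string such as 'A=1;B=2;FLAG' into a dict."""
    info_map = {}
    if info_field in ("", "."):
        return info_map
    for entry in info_field.split(";"):
        if not entry:
            continue
        # Flags have no '=', so their value becomes an empty string.
        key, _, value = entry.partition("=")
        info_map[key] = value
    return info_map


def records_missing_key(lines, key="CLNSIG"):
    """Yield (chrom, pos) for data lines whose INFO column lacks `key`."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        if key not in parse_info(fields[7]):
            yield fields[0], int(fields[1])


def scan_vcf(path, key="CLNSIG"):
    """Scan a plain or gzip/bgzip-compressed VCF and list offending records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return list(records_missing_key(handle, key))
```

Calling `scan_vcf("clinvar_20190102.tidy.vcf.gz")` would return the positions of records without a CLNSIG entry, which can then be used to cut a small reproducing subset.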
Here is a link to download the test VCF. I hope it helps.
http://www.uschovna.cz/en/zasilka/JT42PY9VDAMSXGTN-SGU/?set_lang=en
Best regards,
Viktor
Here is the portion of the VCF that triggers the error: gemini_problem.vcf.gz
thank you very much for the test case. I have a fix for this that will be pushed shortly.
this is now fixed in master and will be a part of the release.
Hi everybody. First of all, gemini is a great tool. I am having a problem loading a whole-genome VCF from a trio (father, mother and son). I am using gemini devel version 0.30. The command is:
gemini load -v filtered_normalized_annotated.vcf -p P_Trio.ped -t snpEff --cores 3 Trio_1.db
Everything goes smoothly until this error arises:
pid 8503: 239999 variants processed.
pid 8506: 239999 variants processed.
pid 8509: 249999 variants processed.
Traceback (most recent call last):
File "/usr/local/bin/gemini", line 7, in
gemini_main.main()
File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 1249, in main
args.func(parser, args)
File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 311, in loadchunk_fn
gemini_load_chunk.load(parser, args)
File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_load_chunk.py", line 918, in load
gemini_loader.populate_from_vcf()
File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_load_chunk.py", line 223, in populate_from_vcf
(variant, variant_impacts) = self._prepare_variation(var, anno_keys)
File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_load_chunk.py", line 407, in _prepare_variation
clinvar_info = annotations.get_clinvar_info(var)
File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/annotations.py", line 648, in get_clinvar_info
clinvar.clinvar_sig = info_map['CLNSIG'].lower()
KeyError: 'CLNSIG'
pid 8506: 249999 variants processed.
pid 8509: 259999 variants processed.
pid 8506: 259999 variants processed.
It looks like the chunk processed by pid 8503 fails completely and is not resumed. I have the latest versions of the gemini annotation databases (update --dataonly). I have loaded VCFs from other trios (exome instead of whole genome) in the same way, and this problem did not appear. Could it be a problem within the VCF (a problematic SNP?) or a bug in the devel version of gemini? I am completely lost.
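The KeyError in the traceback above comes from indexing a dictionary with a key that is absent for some record. The following is a minimal sketch of a defensive lookup, assuming `info_map` is a plain dict of INFO fields; the helper name `clinvar_significance` is hypothetical, and this is not necessarily how the problem was fixed in gemini itself.

```python
def clinvar_significance(info_map, default=None):
    """Return the lower-cased CLNSIG value, or `default` if the key is absent."""
    value = info_map.get("CLNSIG")  # .get() returns None instead of raising KeyError
    if value is None:
        return default
    return value.lower()


# A record with CLNSIG and one without it
print(clinvar_significance({"CLNSIG": "Pathogenic"}))
print(clinvar_significance({"RS": "12345"}))
```

With a lookup like this, a record lacking CLNSIG yields an empty annotation rather than aborting the whole chunk.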