arq5x / gemini

a lightweight db framework for exploring genetic variation.
http://gemini.readthedocs.org
MIT License

Gemini not found error while loading chunks from VCF #957

Open frankMusacchia opened 2 years ago

frankMusacchia commented 2 years ago

Hello, I am trying to use gemini. After installing all the required dependencies, the installation succeeded. I am using a large WGS VCF file (~12 GB) with ~1000 samples. The first time I ran the "load" command, gemini warned that yaml.load was deprecated, and then I got an error:

/home/francesco/bin/gemini_data/anaconda/lib/python2.7/site-packages/gemini/config.py:61: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(in_handle)

CADD scores are not being loaded because the annotation file could not be found. Run gemini update --dataonly --extra cadd_score to install the annotation file.

GERP per bp is not being loaded because the annotation file could not be found. Run gemini update --dataonly --extra gerp_bp to install the annotation file.

Loading 1714760 variants.
Breaking /media/francesco/I/gemini/ppmi.july2018.chr14.vqsr.norm.vcf.gz into 12 chunks.
Loading chunk 0.
Loading chunk 1.
/bin/sh: 1: gemini: not found
Loading chunk 2.
/bin/sh: 1: gemini: not found
Loading chunk 3.
/bin/sh: 1: gemini: not found
Loading chunk 4.
/bin/sh: 1: gemini: not found
Loading chunk 5.
/bin/sh: 1: gemini: not found
Loading chunk 6.
/bin/sh: 1: gemini: not found
Loading chunk 7.
/bin/sh: 1: gemini: not found
Loading chunk 8.
/bin/sh: 1: gemini: not found
Loading chunk 9.
/bin/sh: 1: gemini: not found
Loading chunk 10.
/bin/sh: 1: gemini: not found
Loading chunk 11.
/bin/sh: 1: gemini: not found
/bin/sh: 1: gemini: not found
Traceback (most recent call last):
  File "/home/francesco/bin/gemini_tools/bin/gemini", line 7, in <module>
    gemini_main.main()
  File "/home/francesco/bin/gemini_data/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 1249, in main
    args.func(parser, args)
  File "/home/francesco/bin/gemini_data/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 204, in load_fn
    gemini_load.load(parser, args)
  File "/home/francesco/bin/gemini_data/anaconda/lib/python2.7/site-packages/gemini/gemini_load.py", line 49, in load
    load_multicore(args)
  File "/home/francesco/bin/gemini_data/anaconda/lib/python2.7/site-packages/gemini/gemini_load.py", line 93, in load_multicore
    chunks = load_chunks_multicore(grabix_file, args)
  File "/home/francesco/bin/gemini_data/anaconda/lib/python2.7/site-packages/gemini/gemini_load.py", line 264, in load_chunks_multicore
    wait_until_finished(procs)
  File "/home/francesco/bin/gemini_data/anaconda/lib/python2.7/site-packages/gemini/gemini_load.py", line 359, in wait_until_finished
    raise ValueError("Processing failed on GEMINI chunk load")
ValueError: Processing failed on GEMINI chunk load

I went to https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation and chose to modify the code to use safe loading.

But I still get the same error. Can you please tell me if there is a known solution to this? Thanks, Francesco
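For reference, the change described on the PyYAML wiki comes down to replacing the bare yaml.load(in_handle) call flagged in config.py with one that uses a safe loader. Below is a minimal sketch, assuming PyYAML is installed. The function name read_config is hypothetical and is not gemini's actual code. YAMLLoadWarning is only a warning and does not abort the run by itself, which is consistent with the chunk-load failure persisting after this change.

```python
import yaml  # third-party: PyYAML (assumed to be installed)


def read_config(path):
    """Parse a YAML file with the safe loader.

    Hypothetical helper for illustration only; it mirrors the
    `config = yaml.load(in_handle)` line from the warning, with
    yaml.safe_load swapped in so no Loader argument is needed.
    """
    with open(path) as in_handle:
        return yaml.safe_load(in_handle)
```

yaml.safe_load only builds plain Python objects (dicts, lists, strings, numbers), which is usually all a configuration file needs.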

brentp commented 2 years ago

Hi, how are you installing gemini? What is the path of the code you modified?

You could try to install an older version of pyyaml.

frankMusacchia commented 2 years ago

Hi, I followed the instructions at https://gemini.readthedocs.io/en/latest/ and installed the required tools, for example grabix, vt and snpeff.

brentp commented 2 years ago

Well, the most critical error is that it's not finding gemini when you run the load. What does this command show?

env | grep -i python
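The repeated "/bin/sh: 1: gemini: not found" lines in the log mean that the shell started for each chunk could not resolve the name gemini on its PATH, even though the top-level script was launched from /home/francesco/bin/gemini_tools/bin/gemini. A minimal Python 3 sketch of how one might check this and build an environment with that directory prepended is shown here. The helper names find_on_path and env_with_bin_dir are hypothetical, and the cause (the bin directory missing from PATH) is an assumption that the thread does not confirm.

```python
import os
import shutil


def find_on_path(name, extra_dirs=()):
    """Return the full path a PATH lookup would resolve `name` to, or None.

    `extra_dirs` are searched before the directories already on PATH.
    """
    dirs = list(extra_dirs) + [os.environ.get("PATH", "")]
    return shutil.which(name, path=os.pathsep.join(dirs))


def env_with_bin_dir(bin_dir, base_env=None):
    """Return a copy of the environment with `bin_dir` prepended to PATH."""
    env = dict(os.environ if base_env is None else base_env)
    env["PATH"] = bin_dir + os.pathsep + env.get("PATH", "")
    return env


if __name__ == "__main__":
    # Directory taken from the traceback in this thread.
    gemini_bin = "/home/francesco/bin/gemini_tools/bin"
    print("gemini on current PATH:", find_on_path("gemini"))
    print("gemini with bin dir added:", find_on_path("gemini", [gemini_bin]))
```

If the first lookup prints None while the second finds the executable, adding that directory to PATH in the shell before running the load would be the thing to try.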

Also, let's get your gemini install working, but you might consider using slivar (https://github.com/brentp/slivar), especially for a cohort this size.

frankMusacchia commented 2 years ago

Yes, the cohort is large, and I was wondering whether gemini would work on it. I will try slivar then. Thank you.