Recommendations for working with VCF 4.2 #884

stephenwilliams22 closed this issue 6 years ago

stephenwilliams22 commented 6 years ago

Hi Aaron and Brent, on the GEMINI main page you state "GEMINI is very strict about adherence to VCF format 4.1." However, with the recent update to GATK4, the default (and unmodifiable) output of HaplotypeCaller is VCF 4.2. Do you have recommendations for working with VCF 4.2? This is causing me a ton of trouble right now.

Find my error output below. I have 2TB of disc space available, so I think this must have to do with the second error, "IOError: /dev/stdin if not valid bcf or vcf".

Any help would be greatly appreciated!

insert error trying 1 at a time:
sqlalchemy.OperationalError: (sqlite3.OperationalError) database or disk is full
Traceback (most recent call last):
  File "/mnt/home/stephen/Apps/gemini_tools/bin/gemini", line 7, in <module>
  File "/mnt/home/stephen/Apps/gemini_data/anaconda/lib/python2.7/site-packages/gemini/", line 1244, in main
    args.func(parser, args)
  File "/mnt/home/stephen/Apps/gemini_data/anaconda/lib/python2.7/site-packages/gemini/", line 311, in loadchunk_fn
    gemini_load_chunk.load(parser, args)
  File "/mnt/home/stephen/Apps/gemini_data/anaconda/lib/python2.7/site-packages/gemini/", line 910, in load
    gemini_loader = GeminiLoader(args)
  File "/mnt/home/stephen/Apps/gemini_data/anaconda/lib/python2.7/site-packages/gemini/", line 100, in __init__
    self.vcf_reader = self._get_vcf_reader()
  File "/mnt/home/stephen/Apps/gemini_data/anaconda/lib/python2.7/site-packages/gemini/", line 284, in _get_vcf_reader
    return vcf.VCFReader(self.args.vcf)
  File "cyvcf2/cyvcf2.pyx", line 183, in cyvcf2.cyvcf2.VCF.__init__ (cyvcf2/cyvcf2.c:7093)
IOError: /dev/stdin if not valid bcf or vcf
insert error trying 1 at a time:

arq5x commented 6 years ago

Hi @stephenwilliams22 - I note that the sqlalchemy error is "database or disk is full": are you sure this isn't your problem?

stephenwilliams22 commented 6 years ago

Thanks for the response, Aaron. My disc definitely isn't full (2TB free), and I'm using --passonly to limit the number of variants. I have run this exact sample using freebayes (VCF 4.1) and gemini worked fine. When I switched to GATK (VCF 4.2), everything blew up.

Here's my exact script to load the gemini db:

gemini load -v my.VEP.vcf \
    -t VEP \
    --cores 20 \
    --skip-gene-tables \
    --passonly \

arq5x commented 6 years ago

I think your tempdir is full. Try setting --tempdir to the same path that my.VEP.vcf is being written to.
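
A quick way to test the full-tempdir hypothesis is to compare free space in the default temporary directory with free space where the VCF lives, since SQLite can report "database or disk is full" when its temporary storage runs out even if the main disk has room. The sketch below is a standalone Python 3 helper, not part of GEMINI. `free_gib` is a hypothetical name, and the script assumes the VCF sits in the current working directory and that the default temp location is the one reported by `tempfile.gettempdir()`.

```python
import os
import shutil
import tempfile


def free_gib(path):
    """Return free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


if __name__ == "__main__":
    tmp_dir = tempfile.gettempdir()   # honors TMPDIR, TEMP and TMP
    vcf_dir = os.path.abspath(".")    # assumption: the VCF is in the current directory
    for label, path in (("tempdir", tmp_dir), ("VCF dir", vcf_dir)):
        print("%-8s %-40s %.1f GiB free" % (label, path, free_gib(path)))
```

If the tempdir line shows far less free space than the VCF directory, pointing --tempdir at the larger filesystem is the likely fix.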

stephenwilliams22 commented 6 years ago

Thanks Aaron, this seems to have gotten me over the first hump. Now I have a new error: I recently upgraded to 0.20.1 using conda and get the following when trying to load the db.

Traceback (most recent call last):
  File "/mnt/home/stephen/miniconda2/envs/gemini_env/bin/gemini", line 7, in <module>
  File "/mnt/home/stephen/miniconda2/envs/gemini_env/lib/python2.7/site-packages/gemini/", line 1248, in main
    args.func(parser, args)
  File "/mnt/home/stephen/miniconda2/envs/gemini_env/lib/python2.7/site-packages/gemini/", line 204, in load_fn
    gemini_load.load(parser, args)
  File "/mnt/home/stephen/miniconda2/envs/gemini_env/lib/python2.7/site-packages/gemini/", line 23, in load
    annos = annotations.get_anno_files(args)
  File "/mnt/home/stephen/miniconda2/envs/gemini_env/lib/python2.7/site-packages/gemini/", line 22, in get_anno_files
    anno_dirname = config["annotation_dir"]
KeyError: 'annotation_dir'

stephenwilliams22 commented 6 years ago

Looks like a fresh gemini install may have cured what ails me. That being said, do you have any suggestions on things to look out for with VCF 4.2?
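
When comparing callers, it can help to confirm which version each file actually declares. The VCF specification requires `##fileformat=` to be the first line of the file, so a minimal sketch only needs to read that line. `vcf_fileformat` is a hypothetical helper, not a GEMINI or cyvcf2 function; it assumes a plain-text or gzip/bgzip-compressed VCF and does not handle BCF.

```python
import gzip


def vcf_fileformat(path):
    """Return the value of the ##fileformat header line, or None if absent."""
    # bgzip output is gzip-compatible, so gzip.open can read it.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        first = handle.readline().strip()
    prefix = "##fileformat="
    return first[len(prefix):] if first.startswith(prefix) else None
```

For example, `vcf_fileformat("my.VEP.vcf")` would be expected to return the version string declared by whichever tool last wrote the file.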

arq5x commented 6 years ago

To be honest, I am not aware of any issues with 4.2. Have you seen any so far?

stephenwilliams22 commented 6 years ago

I'm doing some comparisons now but looks okay at first glance. I'm going to close this issue. Thanks Aaron!

noprobllama1010 commented 1 year ago

> I'm doing some comparisons now but looks okay at first glance. I'm going to close this issue. Thanks Aaron!

Hello! Did you notice any differences, or loss of information?
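
One rough way to check for dropped records is to compare the number of data lines in the input VCF with the number of rows in the `variants` table of the resulting GEMINI SQLite database. This is only a sketch under assumptions: the helper names are hypothetical, the database is assumed to have been loaded without filtering options such as --passonly (otherwise the counts are expected to differ), and a matching count says nothing about whether individual fields were preserved.

```python
import gzip
import sqlite3


def count_vcf_records(vcf_path):
    """Count data lines (non-header, non-blank) in a plain or gzipped VCF."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        return sum(1 for line in handle
                   if line.strip() and not line.startswith("#"))


def count_db_variants(db_path):
    """Count rows in the `variants` table of a GEMINI SQLite database."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM variants").fetchone()[0]
    finally:
        conn.close()
```

Comparing `count_vcf_records("my.VEP.vcf")` with `count_db_variants("my.db")` (both file names are placeholders) gives a first signal; a per-field comparison would be needed to detect loss of annotation detail.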