arq5x / gemini

a lightweight db framework for exploring genetic variation.
http://gemini.readthedocs.org
MIT License

Inheritance models not working? #779

Closed (komalsrathi closed this issue 8 years ago)

komalsrathi commented 8 years ago

Hi,

I recently updated gemini to v0.19.1. Previously, when I created a gemini db using an older version of gemini and vcfanno v0.0.11, I could apply the inheritance models to it directly. The workflow was:

vcfanno \
-p 4 \
-base-path $GEMINI_ANNO \
-lua rare-disease.lua \
rare-disease.conf $VCF \
| bgzip -c > anno.vcf.gz

gemini_python vcf2db.py --legacy-compression anno.vcf.gz $PED test.db

# I am able to query the database directly:
$ gemini query --header -q "select * from variants" test.db | head

variant_id  chrom   start   end vcf_id  ref alt qual    filter  type    sub_type    call_rate   num_hom_ref num_het num_hom_alt aaf hwe inbreeding_coef pi  gene    transcript  is_exonic   is_coding   is_lof  is_splicing exon    codon_change    aa_change   aa_length   biotype impact  impact_so   impact_severity polyphen_pred   polyphen_score  sift_pred   sift_score  an  baseqranksum    clippingranksum db  dp  dsfs    haplotypescore  inbreedingcoeff lof mq  mqranksum   nmd old_multiallelic    old_variant provean_prediction  provean_score   qd  readposranksum  sift_prediction sor aminoacidchange exac_af_all exac_an_all functiongvs   literature_genes    polyphen    recurrent_genes rvis_genes  rvis_score  set
1   chr1    17384   17385   None    G   A   183.399993896   None    snp ts  1.0 0   2   1   0.666666666667  None    None    None    MIR6859-1   ENST00000619216 1   0   0   0   1           None    miRNA   exon_variant    exon_variant    LOW None    None    None    None    6   1.45000004768   -1.03600001335  None    23  None    0.0 None    None    None    48.2799987793   -1.03600001335  None    None    None    None    None    14.1099996567   0.633000016212  None    0.836000025272  None    0.245377212762833   5408.0  None    None    None    None    None    None    None
2   chr1    63734   63738   None    CCTA    C   251.5   None    indel   del 1.0 2   1   0   0.166666666667  None    None    None    OR4G11P ENST00000492842 1   0   0   0   1           None    unprocessed_pseudogene  exon_variant    exon_variant    LOW None    None    None    None    6   -0.896000027657 0.677999973297  None    29  None    0.0 None    None    None    29.9899997711   -1.2460000515   None    None    None    None    None    22.8600006104   -1.0340000391   None    1.26999998093   None    None    None    None    None    None    None    None    None    None

# this does not work anymore
gemini de_novo test.db > out.txt 

I have tested the same sample before and gotten results using de_novo. However, I tried the same thing today with a database generated using gemini v0.19.1 and I don't get any output. Is there an intermediate step involved now?

This is the link to the annotated vcf (by vcfanno), gemini database and the pedfile: https://drive.google.com/open?id=0B-8gQV1WZcYdU1VZS3Nvb2E4WXM

brentp commented 8 years ago

So you used gemini load and the inheritance models aren't giving results? Or you used vcf2db.py to load?

komalsrathi commented 8 years ago

I first created an annotated vcf using vcfanno and then used vcf2db.py to create the gemini-compatible database (the commands are in my original post). I used to do the same thing before, and the inheritance models gave me results.

brentp commented 8 years ago

can you do:

gemini query -q "select chrom, start, end, (gt_types).(*) from variants limit 10" your.db

and show the result? You may have loaded with a broken version of vcf2db.py that was recently fixed.

komalsrathi commented 8 years ago

$ gemini query -q "select chrom, start, end, (gt_types).(*) from variants limit 10" test.db
chr1    17384   17385   3   0   4
chr1    63734   63738   3   0   1
chr1    69510   69511   3   0   4
chr1    69760   69761   3   0   4
chr1    137824  137825  1094810689  791560572   6394158
chr1    182685  182686  3   0   4
chr1    183661  183662  3   0   4
chr1    183799  183800  1094810689  791560572   6394158
chr1    186290  186291  3   0   4
chr1    187101  187102  3   0   4

brentp commented 8 years ago

yeah, update vcf2db.py and reload. You should only see numbers 0 through 4 in there.
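
The check described above can be scripted. This is only a minimal sketch: the function name check_gt_types is hypothetical, the 0 through 4 range is taken from the comment above rather than from the gemini documentation, and it assumes the genotype-type columns start at field 4, as in the query shown earlier.

```shell
# Hypothetical helper: print every row whose genotype-type columns
# (field 4 onwards) contain anything other than a single digit 0-4.
# No output means every row looks sane.
check_gt_types() {
  awk '{
    for (i = 4; i <= NF; i++) {
      if ($i !~ /^[0-4]$/) { print; next }
    }
  }'
}

# Intended use (commented out because it needs gemini and a database):
# gemini query -q "select chrom, start, end, (gt_types).(*) from variants" test.db | check_gt_types
```

Run against the ten rows posted above, this would print the two rows at chr1:137824 and chr1:183799, which are the ones with the corrupted values.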

komalsrathi commented 8 years ago

So I had a working installation before (cyvcf2 and peddy were not required at that time), but now I cannot install the vcf2db dependencies:

$ conda --version
conda 4.1.11

$ python --version
Python 3.5.2 :: Continuum Analytics, Inc.

$ conda install -c bioconda cyvcf2 peddy
Fetching package metadata .............
Solving package specifications: ....

The following specifications were found to be in conflict:
  - cyvcf2
  - python 3.5*
Use "conda info <package>" to see the dependencies for each package.

The two packages, cyvcf2 (python 2.7) and peddy (python 3.5), seem to require different Python versions. How do I resolve this?

brentp commented 8 years ago

you'll need to switch to python2.
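
A guard along these lines could catch the mismatch before the install is attempted. It is a sketch, not part of gemini or vcf2db: require_python2 is a hypothetical helper, and it assumes, as reported in this thread, that the cyvcf2 build in question needs Python 2.

```shell
# Hypothetical helper: succeed only if the given interpreter
# (default: python) reports a 2.x version.
# Python 2 prints its version to stderr, so stderr is merged into stdout.
require_python2() {
  py="${1:-python}"
  ver="$("$py" --version 2>&1)"
  case "$ver" in
    "Python 2."*) return 0 ;;
    *) echo "need Python 2, found: $ver" >&2; return 1 ;;
  esac
}

# Intended use (commented out because it needs conda and network access):
# require_python2 && conda install -c bioconda cyvcf2 peddy
```

With the Python 3.5.2 interpreter shown above, the guard would fail and print the version it found instead of letting conda report a package conflict.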

komalsrathi commented 8 years ago

Using gemini_conda instead of conda switched the env to python2. The installation worked and so did the query for de_novo. Thanks a lot!

brentp commented 8 years ago

great! thanks for following up.