arq5x / gemini

a lightweight db framework for exploring genetic variation.
http://gemini.readthedocs.org
MIT License
318 stars, 120 forks

Hyphens in extra VEP columns #887

Closed: oleraj closed this 6 years ago

oleraj commented 6 years ago

Hi,

While adding more dbNSFP annotations in VEP, I also noticed that some of the field names contain hyphens, which is causing problems. These are the affected fields in VEP:

M-CAP_score,M-CAP_pred,Eigen-phred,Eigen-PC-phred

These are in fact loaded into the database by gemini. When I run gemini db_info, I see them:

variants            vep_provean_score             TEXT      
variants            vep_provean_pred              TEXT      
variants            vep_m-cap_score               TEXT      
variants            vep_m-cap_pred                TEXT      
variants            vep_revel_score               TEXT      
variants            vep_revel_rankscore           TEXT      
variants            vep_eigen-phred               TEXT      
variants            vep_eigen-pc-phred            TEXT      
...

However, when I try to run a query, gemini does not interpret the column name correctly:

gemini query --header -q 'select gene, chrom, start, ref, alt, vep_eigen-phred from variants where impact_severity in ('HIGH', 'MED') and aaf_gnomad_all < 0.001' CCGO_801065.22.db  | head
SQL error: (sqlite3.OperationalError) no such column: vep_eigen [SQL: u'select gene,  chrom,  start,  ref,  alt,  vep_eigen-phred from variants where impact_severity in (HIGH,  MED) and aaf_gnomad_all < 0.001']
Traceback (most recent call last):
  File "/sysapps/cluster/software/Anaconda/2.3.0Linux-x86_64/envs/geminienv2/bin/gemini", line 7, in <module>
    gemini_main.main()
  File "/sysapps/cluster/software/Anaconda/2.3.0Linux-x86_64/envs/geminienv2/lib/python2.7/site-packages/gemini/gemini_main.py", line 1244, in main
    args.func(parser, args)
  File "/sysapps/cluster/software/Anaconda/2.3.0Linux-x86_64/envs/geminienv2/lib/python2.7/site-packages/gemini/gemini_main.py", line 439, in query_fn
    gemini_query.query(parser, args)
  File "/sysapps/cluster/software/Anaconda/2.3.0Linux-x86_64/envs/geminienv2/lib/python2.7/site-packages/gemini/gemini_query.py", line 169, in query
    run_query(args)
  File "/sysapps/cluster/software/Anaconda/2.3.0Linux-x86_64/envs/geminienv2/lib/python2.7/site-packages/gemini/gemini_query.py", line 135, in run_query
    gene_needed, args.show_families, subjects=subjects)
  File "/sysapps/cluster/software/Anaconda/2.3.0Linux-x86_64/envs/geminienv2/lib/python2.7/site-packages/gemini/GeminiQuery.py", line 653, in run
    self.result_proxy = res = iter(self._apply_query())
  File "/sysapps/cluster/software/Anaconda/2.3.0Linux-x86_64/envs/geminienv2/lib/python2.7/site-packages/gemini/GeminiQuery.py", line 924, in _apply_query
    res = self._execute_query()
  File "/sysapps/cluster/software/Anaconda/2.3.0Linux-x86_64/envs/geminienv2/lib/python2.7/site-packages/gemini/GeminiQuery.py", line 883, in _execute_query
    raise ValueError("The query issued (%s) has a syntax error." % self.query)
ValueError: The query issued (select gene,  chrom,  start,  ref,  alt,  vep_eigen-phred from variants where impact_severity in (HIGH,  MED) and aaf_gnomad_all < 0.001) has a syntax error.
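Both errors are consistent with how SQLite parses an unquoted name. It reads vep_eigen-phred as the expression vep_eigen - phred, so the column lookup fails on vep_eigen. The sketch below uses Python's built-in sqlite3 module on a throwaway in-memory table to show the failure and the standard workaround, which is to double-quote the identifier. The row values are made up, and only the column name comes from the db_info output. It is an assumption, not confirmed in this thread, that gemini query passes a quoted identifier through to SQLite unchanged.

```python
import sqlite3

# Throwaway in-memory table standing in for a gemini "variants" table.
# Only the hyphenated column name comes from db_info; the row is made up.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE variants (gene TEXT, "vep_eigen-phred" TEXT)')
conn.execute("INSERT INTO variants VALUES ('GENE_A', '12.3')")

# Unquoted: SQLite parses this as the subtraction vep_eigen - phred,
# so it looks for a column called vep_eigen and raises OperationalError.
error_message = None
try:
    conn.execute("SELECT gene, vep_eigen-phred FROM variants").fetchall()
except sqlite3.OperationalError as err:
    error_message = str(err)
print(error_message)

# Quoted with double quotes (standard SQL identifier quoting): the whole
# hyphenated string is treated as a single column name.
rows = conn.execute('SELECT gene, "vep_eigen-phred" FROM variants').fetchall()
print(rows)
conn.close()
```

SQLite also accepts [vep_eigen-phred] and `vep_eigen-phred` as quoted identifiers, but double quotes are the portable SQL form. Inside a single-quoted shell argument such as the -q string above, double quotes are passed through literally and do not end the string.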

Instead of finding vep_eigen-phred, it reports that it can't find vep_eigen. Is there any way for me to construct the query to retrieve the values in column vep_eigen-phred? If not, it would be a good idea to change hyphens to underscores when parsing the extra VEP fields, so that these names don't cause this issue.
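The underscore suggestion can be sketched as a small name-sanitizing step applied to each VEP field before its column is created. The helper name sanitize_vep_field is hypothetical. It only mirrors the lowercase "vep_" prefix visible in the db_info output, and the thread does not describe what the actual fix in gemini master does.

```python
import re


def sanitize_vep_field(field: str, prefix: str = "vep_") -> str:
    """Hypothetical helper: map a VEP field name to a plain SQL column name.

    Lowercases the name, replaces every character other than a-z, 0-9
    and '_' (for example '-') with '_', then prepends the prefix.
    """
    return prefix + re.sub(r"[^0-9a-z_]", "_", field.lower())


# The hyphenated dbNSFP fields listed at the top of this issue.
fields = "M-CAP_score,M-CAP_pred,Eigen-phred,Eigen-PC-phred".split(",")
columns = [sanitize_vep_field(f) for f in fields]
print(columns)
# ['vep_m_cap_score', 'vep_m_cap_pred', 'vep_eigen_phred', 'vep_eigen_pc_phred']
```

One design caveat is that two distinct fields (say "A-B" and "A_B") would map to the same column under this scheme. A loader using it would need to detect such collisions rather than silently overwrite one of them.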

Thanks!

Andrew

brentp commented 6 years ago

Did you load this with gemini or vcf2db?

oleraj commented 6 years ago

Gemini

brentp commented 6 years ago

I just pushed a fix for this to master. Please let me know if you have any additional problems. Thanks for reporting.