arq5x / gemini

a lightweight db framework for exploring genetic variation.
http://gemini.readthedocs.org
MIT License

Using gemini API in python3 #937

Closed (matthdsm closed this issue 4 years ago)

matthdsm commented 4 years ago

Hi,

I'm trying to integrate gemini into a web app using Django and the gemini Python API. Most of the functions work really well, but decoding the snappy-compressed columns fails when I use gt_filters.

The snappy traceback is as follows:

  >>> variants = [row for row in gq]
  File "/venv/lib/python3.7/site-packages/gemini/GeminiQuery.py", line 741, in next
    unpacked[col] = row[col]
  File "/venv/lib/python3.7/site-packages/gemini/GeminiQuery.py", line 449, in __getitem__
    self.cache[key] = self.unpack(self.row[key])
  File "/venv/lib/python3.7/site-packages/gemini/compression.py", line 94, in snappy_unpack_blob
    dt = lookup[blob[0]]

Any ideas how I can fix this? The db was created with vcf2db. Querying it with the gemini CLI (Python 2) seems to work fine.

Thanks for the help,
Cheers, M

arq5x commented 4 years ago

@brentp do you have a sense of how awful this would be?

brentp commented 4 years ago

most stuff should work on python3. is that the full traceback?

matthdsm commented 4 years ago

yep, everything else is just some django related things I've omitted for clarity.

brentp commented 4 years ago

can you post the full traceback? the actual error is not in what you've shown.

brentp commented 4 years ago

actually, the doctests fail in python3 so I can fix those.

matthdsm commented 4 years ago

Hi Brent,

thanks for looking into this. The full Django traceback:

Environment:

Request Method: POST
Request URL: http://localhost:8000/query/?file=/Users/matdsmet/Downloads/seqplorer-data/HSQ_136/GiB_NA12891E2-joint-gatk-haplotype-joint.db

Django Version: 2.2.5
Python Version: 3.7.3
Installed Applications:
['seqplorer_ui.apps.SeqplorerUiConfig',
 'django.contrib.admin',
 'django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.sessions',
 'django.contrib.messages',
 'django.contrib.staticfiles']
Installed Middleware:
['django.middleware.security.SecurityMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'django.middleware.common.CommonMiddleware',
 'django.middleware.csrf.CsrfViewMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'django.contrib.messages.middleware.MessageMiddleware',
 'django.middleware.clickjacking.XFrameOptionsMiddleware']

Traceback:

File "/Users/matdsmet/miniconda3/envs/django/lib/python3.7/site-packages/django/core/handlers/exception.py" in inner
  34.             response = get_response(request)

File "/Users/matdsmet/miniconda3/envs/django/lib/python3.7/site-packages/django/core/handlers/base.py" in _get_response
  115.                 response = self.process_exception_by_middleware(e, request)

File "/Users/matdsmet/miniconda3/envs/django/lib/python3.7/site-packages/django/core/handlers/base.py" in _get_response
  113.                 response = wrapped_callback(request, *callback_args, **callback_kwargs)

File "/Users/matdsmet/miniconda3/envs/django/lib/python3.7/site-packages/django/contrib/auth/decorators.py" in _wrapped_view
  21.                 return view_func(request, *args, **kwargs)

File "/Users/matdsmet/OneDrive - UGent/Projects/seqplorer_django/seqplorer/seqplorer_ui/views.py" in query
  141.         for row in gq:

File "/Users/matdsmet/miniconda3/envs/django/lib/python3.7/site-packages/gemini/GeminiQuery.py" in next
  741.                             unpacked[col] = row[col]

File "/Users/matdsmet/miniconda3/envs/django/lib/python3.7/site-packages/gemini/GeminiQuery.py" in __getitem__
  449.                 self.cache[key] = self.unpack(self.row[key])

File "/Users/matdsmet/miniconda3/envs/django/lib/python3.7/site-packages/gemini/compression.py" in snappy_unpack_blob
  94.     dt = lookup[blob[0]]

Exception Type: KeyError at /query/
Exception Value: 105
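The exception value 105 is the code point of the ASCII letter "i". One plausible reading, assuming the lookup table in compression.py is keyed by one-character type codes: on Python 2, blob[0] on a byte string returns the one-character string "i", but on Python 3 indexing a bytes object returns an integer, so the dictionary lookup misses. The sketch below reproduces that pattern with a hypothetical type-code table (LOOKUP and its codes are illustrative, not gemini's) and uses zlib as a stand-in for snappy so it runs on the standard library alone. It is not gemini's code, nor the fix that was pushed.

```python
import zlib

# Hypothetical type-code table; the real `lookup` in gemini/compression.py
# is not shown in this thread, so these codes are illustrative only.
LOOKUP = {"i": "int32", "f": "float32", "b": "bool"}


def pack_blob(code, payload):
    """Prefix a one-character type code to a compressed payload.

    zlib stands in for snappy so the sketch needs only the stdlib.
    """
    return code.encode("ascii") + zlib.compress(payload)


def type_code_naive(blob):
    # Works on Python 2, where str[0] is a 1-char string.
    # On Python 3, bytes[0] is an int (105 for b"i"), so this raises KeyError.
    return LOOKUP[blob[0]]


def type_code_portable(blob):
    # Slicing returns bytes (Python 3) or str (Python 2); decode to a 1-char key.
    return LOOKUP[blob[0:1].decode("ascii")]


def unpack_blob(blob):
    # Resolve the type code, then decompress everything after the first byte.
    return type_code_portable(blob), zlib.decompress(blob[1:])


if __name__ == "__main__":
    blob = pack_blob("i", b"\x01\x00\x00\x00")
    print(blob[0])  # 105 on Python 3
    try:
        type_code_naive(blob)
    except KeyError as exc:
        print("KeyError:", exc)  # KeyError: 105
    print(unpack_blob(blob))
```

Slicing (blob[0:1]) instead of indexing is what keeps the key a one-character value on both interpreter versions; chr(blob[0]) would also work, but only on Python 3.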

The query and gt filter used were:

  select * from variants where (impact_severity in ("MED","HIGH") and max_af <= 0.02 and qual >= 20 and dp >= 2) and ((gt_types.GiB_NA12891E2 == HET) or (gt_types.GiB_NA12891E2 == HOM_ALT))

and the function call:

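from gemini import GeminiQuery
# db: path to the vcf2db-built database; query and gt_query: the strings above
gq = GeminiQuery.GeminiQuery(db, include_gt_cols=True)
gq.run(query, gt_query)
for row in gq:  # the KeyError is raised while iterating (views.py line 141)
    ...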
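Conceptually, the gt filter above keeps a variant when the genotype type recorded for sample GiB_NA12891E2 is HET or HOM_ALT. A minimal sketch of that predicate in plain Python, assuming the cyvcf2-style encoding HOM_REF=0, HET=1, UNKNOWN=2, HOM_ALT=3 and a hypothetical sample_index mapping from sample name to position in the per-variant gt_types array. gemini's own filter evaluation may differ.

```python
# Assumed genotype-type codes (cyvcf2-style); check them against your gemini version.
HOM_REF, HET, UNKNOWN, HOM_ALT = 0, 1, 2, 3


def passes_gt_filter(gt_types, sample_index, sample):
    """Mirror (gt_types.<sample> == HET) or (gt_types.<sample> == HOM_ALT).

    gt_types: one genotype-type code per sample for a single variant.
    sample_index: hypothetical mapping from sample name to position in gt_types.
    """
    gt = gt_types[sample_index[sample]]
    return gt == HET or gt == HOM_ALT


if __name__ == "__main__":
    sample_index = {"GiB_NA12891E2": 0}
    # Four toy variants for a single-sample database.
    variants = [[HET], [HOM_REF], [HOM_ALT], [UNKNOWN]]
    kept = [v for v in variants if passes_gt_filter(v, sample_index, "GiB_NA12891E2")]
    print(kept)  # [[1], [3]]
```

Unknown genotypes (code 2) are excluded, since the filter only names HET and HOM_ALT.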

Matthias

brentp commented 4 years ago

I pushed a fix if you want to give it a try.

matthdsm commented 4 years ago

Great! That fixed it.

Thanks a bunch, I was really stuck here.

Cheers,
Matthias