KalinNonchev / gnomAD_DB

This package scales the huge gnomAD files to a SQLite database, which is easy and fast to query. It extracts the minor allele frequency for each variant from a gnomAD VCF.
MIT License

Missing variants in v4 #29

Closed: zainomarali closed this issue 4 months ago

zainomarali commented 5 months ago

Hi,

I looked up 371 variants in your v4 gnomAD SQLite database (downloaded from "https://zenodo.org/records/10066323/files/gnomad_db_wgs_v4.0.sqlite3.gz?download=1" and then unzipped in Python following your instructions).

I converted my variants to strings of the form "8:56113670:C>A" and then used "get_info_from_str" to extract the info. 96 variants fail to return values (I am attaching the variants here).

missingvars_sqlite.txt
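
For readers reproducing this lookup, the query-string format described above ("8:56113670:C>A") can be built with a small helper. This is only a sketch: `to_variant_string` is a hypothetical name, not part of gnomad_db, and stripping a leading "chr" and upper-casing the alleles are assumptions based on the plain chromosome labels shown in this thread.

```python
def to_variant_string(chrom, pos, ref, alt):
    """Build a 'chrom:pos:ref>alt' query string (hypothetical helper)."""
    chrom = str(chrom)
    # Assumption: the database stores chromosomes without a "chr" prefix,
    # as in the "8:56113670:C>A" example from this thread.
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    # int() rejects positions that are not whole numbers.
    return f"{chrom}:{int(pos)}:{ref.upper()}>{alt.upper()}"


variants = [("chr8", 56113670, "C", "A"), ("12", "47818936", "c", "t")]
queries = [to_variant_string(*v) for v in variants]
print(queries)  # ['8:56113670:C>A', '12:47818936:C>T']
```

Each resulting string can then be passed to `get_info_from_str`, as in the reply further down.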

I'm not sure what the issue is. Searching by region also returns nothing for these variants, even though all of them are listed on the gnomAD website. I would really like to avoid downloading the VCF files if possible, and your tool seemed to be the best way to do that.

Thank you for your help.

KalinNonchev commented 5 months ago

from gnomad_db.database import gnomAD_DB
db = gnomAD_DB(db_loc, gnomad_version="v4")
db.get_info_from_str("12:47818936:C>T", "*")

chrom                    12
pos                47818936
ref                       C
alt                       T
filter                 PASS
AC                  19849.0
AN                 152188.0
AF                 0.130424
MQ                  59.9972
QD                  15.3561
ReadPosRankSum        0.381
VarDP             1311211.0
AS_VQSLOD           22.0502
AC_grpmax           13083.0
AN_grpmax           67972.0
AF_grpmax          0.192476
AF_eas             0.005803
AF_nfe             0.192476
AF_fin             0.174008
AF_afr             0.039115
AF_asj              0.15562
Name: 0, dtype: object

KalinNonchev commented 5 months ago

Hi, it looks like your variant is available in the WGS v4 database. Please make sure that the WGS v4 SQLite database file is named gnomad_db.sqlite3.
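
One way to follow the naming advice above is to decompress the download straight into a file called gnomad_db.sqlite3. The sketch below uses only the Python standard library. `unpack_as_gnomad_db` is a hypothetical helper, and it assumes that the `db_loc` passed to `gnomAD_DB` is the directory holding that file.

```python
import gzip
import shutil
from pathlib import Path

EXPECTED_NAME = "gnomad_db.sqlite3"  # file name given in the reply above


def unpack_as_gnomad_db(gz_path, db_dir):
    """Decompress a downloaded .sqlite3.gz into db_dir/gnomad_db.sqlite3."""
    db_dir = Path(db_dir)
    db_dir.mkdir(parents=True, exist_ok=True)
    target = db_dir / EXPECTED_NAME
    # Stream the data so the large database is never held in memory at once.
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


# Example usage (paths are placeholders):
# db_loc = "gnomad_v4_wgs"
# unpack_as_gnomad_db("gnomad_db_wgs_v4.0.sqlite3.gz", db_loc)
# db = gnomAD_DB(db_loc, gnomad_version="v4")
```

If the file was already unzipped under its original name, renaming it to gnomad_db.sqlite3 inside that directory should have the same effect.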