KalinNonchev / gnomAD_DB

This package converts the large gnomAD VCF files into a SQLite database that is easy and fast to query. It extracts the minor allele frequency for each variant from a gnomAD VCF.
MIT License

Question about Exomes and Genomes Filters #32

Closed. Sophieeeeeeee closed this issue 3 months ago.

Sophieeeeeeee commented 3 months ago

Hello,

Sorry if this is a basic question. I wonder why only exome data is returned when I use the function "get_info_from_df". For example, for "1:55039847:G>A", the gnomAD browser shows an AC of 286 for exomes and 10 for genomes (link attached here), while the code db.get_info_from_str("1:55039847:G>A", "*") yields the following output:

    chrom                     1
    pos                55039847
    ref                       G
    alt                       A
    filter                 PASS
    AC                    286.0
    AN                1413248.0
    AF                 0.000202
    MQ                     60.0
    QD                   12.991
    ReadPosRankSum       -0.025
    VarDP               41127.0
    AS_VQSLOD            9.0984
    AC_grpmax             281.0
    AN_grpmax           37626.0
    AF_grpmax          0.007468
    AF_eas             0.007468
    AF_nfe             0.000001
    AF_fin                  0.0
    AF_afr                  0.0
    AF_asj                  0.0
    Name: 0, dtype: object

As shown above, the output contains the exome AC of 286 but not the genome AC of 10. For reference, I am using the v4.1 data from the following download link: WES gnomAD v4.1 (hg38, 183'558'769 variants) 8.3G zipped, 19G in total - https://zenodo.org/records/11076395/files/gnomad_db.sqlite3.gz?download=1

Is there a way to get both exome and genome data? Any feedback is greatly appreciated. Thanks!

KalinNonchev commented 3 months ago

Hi @Sophieeeeeeee ,

Thank you for your question. The database you downloaded contains only the WES data. You can download the WGS v4.1 database into a separate folder and query both WES and WGS by initializing two separate instances, one per folder.
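A minimal sketch of how the two instances could be queried together, assuming each one exposes get_info_from_str as used in the question and returns a row with "AC" and "AN" fields. The helper name combined_counts is hypothetical, and treating a missing variant as None/NaN is an assumption about the return value, not confirmed behaviour. The constructor call in the trailing comment is also an assumption based on the package README, and the folder paths are placeholders.

```python
import math


def _is_missing(value):
    """Return True for None or NaN (assumed markers of a missing value)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def combined_counts(wes_db, wgs_db, variant):
    """Sum AC and AN for one variant across an exome and a genome database.

    Both arguments only need a get_info_from_str(variant, columns) method
    whose result can be indexed with "AC" and "AN".
    """
    total_ac, total_an = 0.0, 0.0
    for db in (wes_db, wgs_db):
        row = db.get_info_from_str(variant, "*")
        ac, an = row["AC"], row["AN"]
        # Skip a database that has no record of this variant. Note that its
        # AN is then unknown rather than zero, so the combined AF below is
        # only an approximation in that case.
        if _is_missing(ac) or _is_missing(an):
            continue
        total_ac += float(ac)
        total_an += float(an)
    af = total_ac / total_an if total_an > 0 else float("nan")
    return {"AC": total_ac, "AN": total_an, "AF": af}


# Intended usage (assumed constructor signature, placeholder paths):
#   from gnomad_db.database import gnomAD_DB
#   wes_db = gnomAD_DB("path/to/wes_folder", gnomad_version="v4")
#   wgs_db = gnomAD_DB("path/to/wgs_folder", gnomad_version="v4")
#   combined_counts(wes_db, wgs_db, "1:55039847:G>A")
```

Keeping the two databases in separate folders means each instance stays tied to one data source, so the per-source counts remain available alongside the summed ones.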

Please let me know if you have further questions.

Best,