KalinNonchev / gnomAD_DB

This package scales the huge gnomAD files to a SQLite database, which is easy and fast to query. It extracts from a gnomAD vcf the minor allele frequency for each variant.
MIT License

Missing variant in Gnomad4 but in Gnomad3 #27

Closed: lbundalian closed this issue 7 months ago

lbundalian commented 8 months ago

I downloaded the gnomAD 3 and gnomAD 4 databases from here. However, some variants are present in gnomAD 3 but missing from gnomAD 4. I would expect the opposite, since 4 contains 3 plus the new variants found with additional sequencing.

KalinNonchev commented 8 months ago

Hello @lbundalian, thank you for your comment. In 4.0 there are additional filters, among other changes. Always refer to the official gnomAD search website.

Please let me know if you can see these variants in 4.0 on the website but not in the shared 4.0 SQLite file, and please provide examples.

Best,
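
One way to collect such examples is to compare the two downloaded databases directly. The following is a minimal sketch, not part of gnomAD_DB itself. It assumes both files contain a table named `gnomad_db` with `chrom` and `pos` columns (as in the query later in this thread) and that the first four columns are chrom, pos, ref and alt. The function names and the file paths in the usage comment are hypothetical.

```python
import sqlite3


def variant_keys(db_path, chrom, start, end):
    """Return the set of (chrom, pos, ref, alt) tuples found in a position range."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT * FROM gnomad_db WHERE chrom = ? AND pos BETWEEN ? AND ?",
            (chrom, start, end),
        ).fetchall()
    finally:
        con.close()
    # Assumption: the first four columns are chrom, pos, ref, alt.
    return {tuple(row[:4]) for row in rows}


def missing_in_newer(old_db, new_db, chrom, start, end):
    """List variants present in old_db but absent from new_db, sorted."""
    old = variant_keys(old_db, chrom, start, end)
    new = variant_keys(new_db, chrom, start, end)
    return sorted(old - new)


# Hypothetical usage (paths are placeholders):
# missing_in_newer("gnomad3.sqlite3", "gnomad4.sqlite3", "7", 92134000, 92135000)
```

Restricting the comparison to a position range keeps the sets small; each hit can then be checked against the official website before reporting it.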

lbundalian commented 8 months ago

7-92134353-C-T [image]

versus

gnomad.sqlite3 (gnomAD 4) [image]

lbundalian commented 8 months ago

Some more examples: [image]

KalinNonchev commented 8 months ago

OK, thank you. I will double-check this in the next few days and let you know.

KalinNonchev commented 7 months ago

Hello @lbundalian, it appears that the bam files were corrupted during download due to their size. I have addressed this issue in the latest commit (https://github.com/KalinNonchev/gnomAD_DB/commit/c191de411607a7c43078d980d097442db999cefe). I have recomputed the SQLite databases to ensure their integrity and you can find the new links in the README.

sqlite> select * from gnomad_db where chrom = '7' and pos = 92134353;
7|92134353|C|A|PASS|1.0|152190.0|6.57073e-06|59.9396|13.3918|0.602|268.0|4.7107|1.0|68030.0|1.46994e-05|0.0|1.46994e-05|0.0|0.0|0.0
7|92134353|C|T|PASS|1.0|152190.0|6.57073e-06|59.9396|13.3918|0.602|268.0|1.707|1.0|68030.0|1.46994e-05|0.0|1.46994e-05|0.0|0.0|0.0

Thank you for the feedback!
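
The same lookup can be scripted with Python's standard `sqlite3` module. This is a minimal sketch rather than the package's own API. It assumes the table name `gnomad_db` and the `chrom` and `pos` columns from the query above, and that ref and alt are the third and fourth columns, as the output suggests. The function name and the optional integrity check are illustrative additions.

```python
import sqlite3


def lookup_variant(db_path, chrom, pos, ref=None, alt=None, check_integrity=False):
    """Return rows at chrom:pos, optionally filtered by ref and alt alleles."""
    con = sqlite3.connect(db_path)
    try:
        if check_integrity:
            # SQLite's built-in consistency check; returns 'ok' for a healthy file.
            status = con.execute("PRAGMA integrity_check").fetchone()[0]
            if status != "ok":
                raise RuntimeError(f"integrity check failed: {status}")
        rows = con.execute(
            "SELECT * FROM gnomad_db WHERE chrom = ? AND pos = ?",
            (chrom, pos),
        ).fetchall()
    finally:
        con.close()
    # Assumption: columns are ordered chrom, pos, ref, alt, ...
    return [
        row for row in rows
        if (ref is None or row[2] == ref) and (alt is None or row[3] == alt)
    ]


# Hypothetical usage (path is a placeholder):
# lookup_variant("gnomad.sqlite3", "7", 92134353, ref="C", alt="T")
```

The integrity check is off by default because it scans the whole file, which can take a long time on a database of this size.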