Generating SQLite preprocessed files from gnomAD

KalinNonchev / gnomAD_DB

This package converts the huge gnomAD files into a SQLite database, which is easy and fast to query. It extracts the minor allele frequency of each variant from a gnomAD VCF.

MIT License


Generating SQLite preprocessed files from gnomAD #21

Closed: brettChapman closed this issue 11 months ago

brettChapman commented 11 months ago

I'm looking at using your tool to query a specific set of genes from gnomAD, and to switch to a new gnomAD version as needed. I'd like to create my own filtered variant set covering around 400 genes and reuse it as a SQLite file in my pipeline. How did you generate the preprocessed SQLite files provided at this download link: https://zenodo.org/record/6818606/files/gnomad_db_v3.1.2.sqlite3.gz?download=1

Ultimately, I'd like a variant summary TSV, produced from a small pandas DataFrame covering the 400 genes from gnomAD, to cross-check against our internal database and against ClinVar.

Thanks.

KalinNonchev commented 11 months ago

Hello @brettChapman, you have probably seen the workflow scripts here. You can use them to recreate the database.

In your case, it might be faster to use the already preprocessed database and select the variants which fall within the regions of your 400 genes.

You may find the following function useful; it selects all variants within a given interval:

db.get_info_for_interval(chrom=21, interval_start=9825780, interval_end=9825799, query="AF")
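For a list of around 400 genes, the call above could be repeated once per gene region and the results stacked. The following is only a sketch: it assumes `db` is an already initialised database object, that `get_info_for_interval` returns a pandas DataFrame, and that you supply the gene coordinates yourself. The helper name `collect_variants` and the `gene_intervals` mapping are hypothetical and not part of the package.

```python
import pandas as pd


def collect_variants(db, gene_intervals, query="AF"):
    """Query every gene interval and stack the results into one DataFrame.

    db             -- object exposing get_info_for_interval(...) as used in this thread
    gene_intervals -- hypothetical mapping: gene name -> (chrom, start, end)
    query          -- column selection passed through unchanged to the database call
    """
    frames = []
    for gene, (chrom, start, end) in gene_intervals.items():
        df = db.get_info_for_interval(
            chrom=chrom, interval_start=start, interval_end=end, query=query
        )
        # Tag each row with the gene it was selected for.
        frames.append(df.assign(gene=gene))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Note that the value of `query` decides which columns come back, so if the result is to be matched against other tables later, the selection has to include whatever columns identify a variant.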

Once you have the selected variants, you can merge them with external annotations.
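As a rough illustration of that merge step, the sketch below joins a variant table with an annotation table in pandas and renders the result as TSV. It assumes both tables identify a variant by `chrom`, `pos`, `ref` and `alt`; those column names, the `clinical_significance` column, the example rows and the helper `annotate_variants` are all hypothetical and should be adapted to the real data.

```python
import pandas as pd

# Assumed key columns shared by both tables (hypothetical names).
KEYS = ["chrom", "pos", "ref", "alt"]


def annotate_variants(variants, annotations, keys=KEYS):
    """Left-join annotations onto variants, keeping every variant row."""
    # Cast chrom to str on both sides so 21 and "21" are treated as the same key.
    left = variants.astype({"chrom": str})
    right = annotations.astype({"chrom": str})
    return left.merge(right, on=list(keys), how="left")


# Made-up example rows, for illustration only.
variants = pd.DataFrame(
    {"chrom": [21, 21], "pos": [100, 200], "ref": ["A", "C"],
     "alt": ["G", "T"], "AF": [0.1, 0.2]}
)
annotations = pd.DataFrame(
    {"chrom": ["21"], "pos": [100], "ref": ["A"], "alt": ["G"],
     "clinical_significance": ["Benign"]}
)

summary = annotate_variants(variants, annotations)
# With no path argument, to_csv returns the TSV text instead of writing a file.
tsv_text = summary.to_csv(sep="\t", index=False)
```

A left join keeps variants that have no matching annotation (their annotation columns are left empty), which is usually what a summary table needs.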

Best,

KalinNonchev commented 11 months ago

Please don't hesitate to reopen this GitHub issue if you have any more questions or need further assistance.