genome-nexus / genome-nexus-importer

Import data into MongoDB for use by https://github.com/genome-nexus/genome-nexus/
MIT License

Add clinvar #39

Closed leexgh closed 3 years ago

leexgh commented 3 years ago

Fix: https://github.com/knowledgesystems/signal/issues/104

Convert the ClinVar VCF to a TSV file and save it to the database. transform_vcf_to_tsv.py is a generic script for transforming VCF to TSV, so it could be used on other files as well.
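For readers unfamiliar with the conversion, here is a minimal sketch of the general idea, using only the Python standard library. It is not the code in transform_vcf_to_tsv.py: the function names (`parse_info`, `vcf_to_tsv`) and the column layout are assumptions made for illustration. It keeps the seven fixed VCF columns and expands each INFO key into its own column.

```python
import io

# The first seven fixed VCF columns; INFO (column 8) is expanded separately.
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER"]


def parse_info(info):
    """Split a VCF INFO string into a dict; flag-only keys map to 'True'."""
    fields = {}
    if info in ("", "."):
        return fields
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else "True"
    return fields


def vcf_to_tsv(vcf_lines, out):
    """Write VCF records from vcf_lines to out as TSV; return the record count.

    Hypothetical helper: one output column per INFO key, in order of first
    appearance. Records lacking a key get an empty cell.
    """
    records = []
    info_keys = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines and the #CHROM header
        cols = line.split("\t")
        fixed = dict(zip(FIXED_COLUMNS, cols[:7]))
        info = parse_info(cols[7]) if len(cols) > 7 else {}
        for key in info:
            if key not in info_keys:
                info_keys.append(key)
        records.append((fixed, info))

    out.write("\t".join(FIXED_COLUMNS + info_keys) + "\n")
    for fixed, info in records:
        row = [fixed.get(c, "") for c in FIXED_COLUMNS]
        row += [info.get(k, "") for k in info_keys]
        out.write("\t".join(row) + "\n")
    return len(records)
```

For a gzipped input, the lines could be read with `gzip.open(path, "rt")` and passed to `vcf_to_tsv` together with an open output file. The sketch holds all records in memory to discover the INFO keys, which is fine for an illustration but may not suit very large files.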

To download and generate the ClinVar files, first run make clinvar/input/clinvar_grch37_input.vcf.gz to download the VCF files, then run make clinvar/export/clinvar_grch38.tst.gz to generate the export.

When generating the ClinVar files, you may see warnings such as: [W::vcf_parse] Contig '1' is not defined in the header. (Quick workaround: index the file with tabix.) I think these are OK to ignore for now, because the output is generated as expected. A possible fix (not verified yet): https://www.biostars.org/p/407384/
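The warning means the records use a contig name that has no `##contig` line in the VCF header. One way to silence it, sketched below, is to add the missing `##contig` lines before the `#CHROM` header row. This is an untested assumption on my side, not something taken from the linked thread or from this PR, and `add_contig_headers` is a hypothetical helper name.

```python
def add_contig_headers(vcf_lines):
    """Return the VCF lines with a ##contig line added for each undeclared contig.

    Hypothetical helper: new lines are inserted just before the #CHROM header
    row, in order of first appearance in the records.
    """
    lines = list(vcf_lines)
    declared = set()
    seen = []
    for line in lines:
        if line.startswith("##contig=<"):
            # Extract the ID from e.g. ##contig=<ID=1,length=249250621>
            body = line.strip()[len("##contig=<"):-1]
            for part in body.split(","):
                key, _, value = part.partition("=")
                if key == "ID":
                    declared.add(value)
        elif line.strip() and not line.startswith("#"):
            chrom = line.split("\t", 1)[0]
            if chrom not in seen:
                seen.append(chrom)

    missing = [c for c in seen if c not in declared]
    result = []
    for line in lines:
        if line.startswith("#CHROM"):
            result.extend("##contig=<ID={}>\n".format(c) for c in missing)
        result.append(line)
    return result
```

The contig lines here carry only an ID, without a length. Whether that is enough for every downstream tool is something I have not checked.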

References: https://github.com/sigven/vcf2tsv, https://github.com/brentp/cyvcf2