Ask about loading VCF data into Solr (how do we load variant data into Monarch?)

diatomsRcool commented 4 years ago

From Kent: We're not loading from VCF files, although we talked about doing this ages ago. ClinVar publishes VCF, gnomAD does, and I'm sure many others do. Right now variants are pretty basic: there's an identifier and sometimes a type (probably structural vs. functional). We link to the gene(s) that the source indicates are related, but there are some nuances with ClinVar and GWAS Catalog around this. For example, we use a different relationship for upstream/downstream for GWAS Catalog; intron/exon are treated the same, though.

There's some background to modelling locations and coordinates as RDF and OWL (FALDO, monochrom); I think it's a little over-modelled for many of our use cases. See https://github.com/OBF/FALDO and https://github.com/monarch-initiative/monochrom . We do have a feature location index in Solr: https://solr-dev.monarchinitiative.org/solr/feature-location/select/?q=*:*&wt=json . We even had an undergrad student build a front-end widget around this as a summer project, but all this work fell off.
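For anyone who wants to poke at that feature location index, here is a minimal sketch of the same select query in Python. It only uses the standard Solr `/select` parameters (`q`, `rows`, `wt`) and the standard JSON response envelope (`response.numFound`, `response.docs`). The helper names are made up for illustration, nothing here assumes particular field names in the documents, and whether the dev server is still reachable has not been checked.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Core URL copied from the comment above; its availability is an assumption.
FEATURE_LOCATION_CORE = "https://solr-dev.monarchinitiative.org/solr/feature-location"


def build_select_url(core_url, query="*:*", rows=10):
    """Build a Solr /select URL that requests a JSON response (hypothetical helper)."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{core_url.rstrip('/')}/select/?{params}"


def extract_docs(payload):
    """Return (numFound, docs) from a decoded Solr JSON response."""
    body = payload.get("response", {})
    return body.get("numFound", 0), body.get("docs", [])


def fetch_docs(url, timeout=10):
    """Fetch a select URL and return (numFound, docs). Requires network access."""
    with urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return extract_docs(payload)
```

Calling `fetch_docs(build_select_url(FEATURE_LOCATION_CORE))` would return the total hit count and the first ten documents if the server responds; `extract_docs` can be tried offline on any saved Solr JSON response.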

dlebauer commented 4 years ago

@diatomsRcool or @kshefchek is this issue still relevant?

kshefchek commented 4 years ago

I think we can close; Solr is not really the right tool to store VCF data.

diatomsRcool commented 4 years ago

I agree with Kent.

genophenoenvo / terraref-datasets #43