broadinstitute / dsp-data-models

Data model definitions + Jade schemas for the DSP Core Model & Monster-specific extensions
BSD 3-Clause "New" or "Revised" License

Remove Jade schemas from this repo #30

Open danxmoran opened 5 years ago

danxmoran commented 5 years ago

This repo currently tracks both:

  1. Data model definitions for the DSPCore and extensions, in .ttl
  2. Jade schema definitions roughly mapping to the data model, in JSON

The more I think about it, the less I like co-locating these two artifacts, because:

  1. Our data model is "official", with heavy review from many stakeholders. The schemas don't have nearly as much scrutiny (and I think that's mostly a good thing).
  2. There is no one-size-fits-all schema for a Donor, Biosample, etc. It's unlikely that our ingest pipelines will ever actually use the "core" table definitions, so it's strange (to me) to present them as official recommendations.
  3. The rate-of-change for the data model is, and should continue to be, decoupled from the rates-of-change of each Jade schema. Co-locating the two gives the impression that they should be updated together, which is really not what we want for long-running ingests.

Moving forward, I'd like to make this repository all about the data model, and move Jade schemas into project-specific repositories (e.g. a broadinstitute/clinvar-ingest repo). @kreinold @larrybabb (and anyone else listening), what are your thoughts?

larrybabb commented 5 years ago

I think it makes sense to move the project-specific schemas into their own repos. Once the core model is more established and we can show "true" extensions of it for specific repos that may be reusable by several projects, you may want to start capturing those kinds of extensions in the main repo.

The SEPIO ontology uses this kind of approach to provide a place for folks to utilize each other's extensions to the common/core model in SEPIO.

See here for the main SEPIO ontology. You'll see the extensions subfolder that ClinGen uses to share its extensions for two separate projects: our "acmg" variant pathogenicity extension and our "dosage" model for capturing haploinsufficiency and triplosensitivity statements on CNV regions.