clingen-data-model / allele

Documentation for data model of ClinGen

Next Steps for the DMWG #46

Closed: cbizon closed this issue 9 years ago

cbizon commented 9 years ago

With the release of the 0.1 version of the allele data model imminent, we need to open a discussion on the next effort for the work group. We know that there will already be effort put towards supporting the ClinGenDB's use of the model, which will be the first priority, but what else should we be doing?

1) Polishing / documentation of allele 0.1
2) Creation of a reference implementation for allele 0.1
3) allele 0.2: incorporation of structural variants, other stuff...
4) bottom up: model population data (allele frequency information) as a simple model that would make use of the allele model
5) bottom up: individual data model (genotypes, family information, phenotyping)
6) bottom up: model assertions - this would probably also involve doing e.g. 4.
7) top down: broad modeling effort, trying to get everything in at a lower level of detail.

Others to consider?

cbizon commented 9 years ago

My vote (FWIW) is model assertions.

larrybabb commented 9 years ago

There seems to be a lot of pressure building on the genome connect and case level repository front. We should also recognize that the EHR WG is starting to ramp up and they will most definitely be needing a way to represent case data. The phenotype WG has been attempting to work with ClinVar to find a way to get case data phenotypic info into the submission form to support the metabolic disorders clinical domain WG (not an easy task to do in isolation). So, I really feel that we need to take a stab at Case-Patient-Indication-Phenotype-Results (Findings and Interpretation) and show how it links into the Allele model. It will also provide value to the IoM Action Collaborative that is about to embark on an effort to define a structured representation of genotypes for pharmacogenetic test results and how they map to star alleles to support a pilot project where labs will send this structured data to provider EHR systems so that they can be used for clinical decision support. Seems like this could be a highly visible and very valuable place to influence and expose our allele model work (definitely should help us validate that we produced something that is worthy of consideration in the final solution).

So I would like to add 8) middle in: lab test results for genetic sequencing tests that contain structured indication, phenotypic observations provided by requester or lab, indication findings (which include indication related genotypes with allele specific assertion for the related indication), and (potentially) incidental findings (which includes genotypes and their allele specific assertions for unrelated indications).

High risk, high reward! I vote for this FWIW. There's urgency here, so it may make sense to do the assertion stuff first. It would be nice to at least do a basic draft of this model and then dive into assertions in detail. This way we could show a roadmap to external groups and such.

From: cbizon. Date: Thursday, March 5, 2015 3:18 PM. Subject: [clingen-data-model] Next Steps for the DMWG (#46)

tnavatar commented 9 years ago

Whatever we do next, my sense is that it should connect directly to one of the current IT demands of ClinGen. The big ones I'm aware of right now (apart from the allele registry) are:

  1. The gene curation app & data flow
  2. The (just now being discussed) variant curation app & data flow
  3. The case level variation database
  4. The actionability curation app & data flow
  5. The (currently active) structural variant curation app & data flow

Getting involved with any of these has issues of one kind or another:

  1. I've been keeping an eye on the gene curation team. They're still in the process of defining their requirements, while Kang, Tam and Selina design and build an app around them. Honestly, this process seems to be working very well. Having a developer working on the team means they're designing their model and building their app, with a team on hand to see that it meets their needs. I'm not sure how this group could do useful modeling work on this one right now, especially as some important questions haven't been answered yet, for instance how input from external collaborators will be incorporated and how it will affect the overall process. I think we should engage once the curation team has come farther along in defining requirements and a complete prototype of the curation app is built. I don't know when to expect this, but they're making rapid progress.
  2. The variant curation activity is just getting going (I've been keeping an eye on this one too). They are also in the process of defining their requirements for a model. There also seem to be some big questions yet to be answered here, and I don't know how much progress we could make modeling the variant assertion until this group has come farther along.
  3. There's definitely a lot of pressure to build this, as Larry mentions. This one is tough because who is going to build what is still being debated (Carlos and Heidi seem to be working this out). That said, it's a little easier to imagine what we have to do to support this: solidifying the allele model (CNV support is a requirement for this, I think), and starting to make some progress on how phenotypes are represented in ClinGen. However, I think the phenotype WG needs to make some progress on that front first...
  4. These guys actually seem to be the farthest along in terms of having a set of requirements that can be translated into a data model. That said, doing modeling work on this isn't helpful until someone can build an app around the work they're doing. I think they're still expecting the ClinGenDBDataModelInformatics team to make this, and nobody's got developer time to spare right now.
  5. These guys are even farther along than (4), but they're humming along happily with their JIRA-based workflow, and haven't been making IT demands inside of ClinGen.

The conclusion I reach in thinking about this is that we need more strategic thinking about which working groups depend on each other, and what goals each needs to clear to meet certain milestones. I'm gonna guess this won't happen overnight. Given that, it might make sense for us to divide and conquer again, with each of us engaging in one (or more) of the working groups involved in the above activities and getting to understand what their modeling requirements really are, so we can ramp up quickly when there's a project that's ready to be built (though my sense is that the case level database is going to be the first thing to show up as a priority).

srynobio commented 9 years ago

I can't think of anything that hasn't already been mentioned atm. I almost think we can't get away from doing 1, 2 and 3 (from Chris's list above), even if it's only 25-50% of our committed time. I'm a bit torn between assertions and GenomeConnect: I like the idea of extending the model further into the EHR realm, but I think the GC opportunity would let us show that the model holds up using clinical data.

In the end I would cast my vote as a yea for assertions, because it seems to be the most related to what we've done, and I think the group works well with an agreed-upon goal.

tnavatar commented 9 years ago

I can see the interest in doing assertions next, not least because we've brushed against it in the past. One of the concerns I have is that it's really a moving target now. The general curation working groups are in the middle of their process, and they haven't (to my knowledge) made contact with the clinical domain working groups, which is likely to change the model and process even further.

There's every reason for us to be engaged with these processes, I'm just not sure whether we'll be able to produce a data model before their development work progresses further. It might be premature to focus the entire group on creating an assertion model right now.

cbizon commented 9 years ago

While the curation groups' plans are not yet final, I suspect you are overestimating how much they are moving (though I have not been on the calls, so take this for what it is worth...)

If we started assertions, I think it would be entirely valid to treat the published ACMG guidelines as the primary use case, while keeping in mind the ideas that

A) Those guidelines are not the only ones that could be used in allele curation.
B) Assertions will also be made on entities other than allele pathogenicity.

I'd be very surprised if the groups ended up very far from a model constructed in that way.

cbizon commented 9 years ago

One other point: on the way to assertions, we're actually going to need a (scaled-back) individual/case model.