ga4gh-discovery / ga4gh-case-discovery

A framework for searching genomic data sharing services
Apache License 2.0

Example Query Model Structures for discussion #32

Open colinveal opened 6 years ago

colinveal commented 6 years ago

Here are the examples from the teleconference: Example 1, Example 2

I believe we were leaning towards the hierarchical model with external logic:

Model 1

harindra-a commented 6 years ago

Great, thanks for sharing this, @colinveal!

@Relequestual and @fschiettecatte, we reached consensus on this yesterday, right? How about we ask Colin to add this directly to the draft (or have Colin and Ben meet for a one-to-one teleconference to do it jointly) to keep things moving, as we are pretty close to the v0.1.0 freeze?

fschiettecatte commented 6 years ago

Fine with me.

Relequestual commented 6 years ago

Thanks for these, @colinveal. It looks like we've reached an updated consensus at https://github.com/ga4gh-discovery/ga4gh-discovery-search/issues/9#issuecomment-408702362

Are you happy to close this issue in favour of https://github.com/ga4gh-discovery/ga4gh-discovery-search/issues/9 ?

The gists should stay around for reference.

colinveal commented 6 years ago

Hi, I've updated model 1 (hierarchical) and model 2 (non-hierarchical) with JSON Pointers: model 1, model 2

Relequestual commented 6 years ago

Looking at these updated examples, this issue still seems to be discussing whether we need hierarchical components, not how the components are referenced (which I believe we have now agreed on).

Could you explain in pseudocode the query you're expecting, please?

Following from my previous experience with the MME API specification, and how we and others represent variants in our databases, subjectvariant from your example would encompass allele, zygosity, and pathogenicity, but not phenotype, which you have included in your first model example.

So, in terms of a subjectvariant component, we're pretty close to agreeing, I think.

I would say gender comes under a subject component, as it's information about the subject, which would probably include some identifier too. Still needs to be ironed out.

For disease, I don't think we intend to support free text at the API layer. OMIM codes only for now, although given that OMIM isn't a disease ontology, text-based search could be useful. Equally, you could provide that ability by including all the terms that contain the specific phrase your user inputs.
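As a rough illustration of that last idea, a client could expand the user's phrase into a list of codes before calling the API. This is only a sketch: the `TERMS` table, its codes, and the `expand_phrase` helper are hypothetical placeholders, not real OMIM entries or part of any agreed specification.

```python
# Hypothetical term table: these codes and labels are placeholders,
# not real OMIM entries.
TERMS = {
    "OMIM:000001": "Alzheimer disease, example type 1",
    "OMIM:000002": "Alzheimer disease, example type 2",
    "OMIM:000003": "Frontotemporal dementia, example",
}


def expand_phrase(phrase, terms=TERMS):
    """Return the sorted codes whose label contains the phrase (case-insensitive)."""
    needle = phrase.strip().lower()
    if not needle:
        # An empty phrase would match every term, so return nothing instead.
        return []
    return sorted(code for code, label in terms.items() if needle in label.lower())


# The API query would then carry only codes, never the free text itself.
codes = expand_phrase("alzheimer")
```

The API layer stays code-only under this approach; the free-text handling lives entirely in the client.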

For phenotype, I feel that for now we should specify matches up or down the tree, apart from the second-level generic terms (as in, not children of HP:0000118). HPO only for now.
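One possible reading of that rule, sketched on a toy tree: two terms match if they are equal, or if one is an ancestor of the other, with the root HP:0000118 and its direct children excluded from the up/down expansion. Every `TOY:` identifier below is made up for illustration, and allowing an exact match on a generic term is an assumption, not something we have agreed.

```python
ROOT = "HP:0000118"  # root of the phenotype tree referred to above

# Toy child -> parents map. All "TOY:" identifiers are hypothetical.
PARENTS = {
    "TOY:nervous": [ROOT],
    "TOY:cardio": [ROOT],
    "TOY:cognition": ["TOY:nervous"],
    "TOY:memory": ["TOY:cognition"],
    "TOY:arrhythmia": ["TOY:cardio"],
}


def ancestors(term):
    """Collect every ancestor of a term by walking the parent links."""
    seen = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def is_generic(term):
    """True for the root and for its direct children (the generic terms)."""
    return term == ROOT or ROOT in PARENTS.get(term, [])


def matches(query_term, subject_term):
    """Match exact terms, or ancestor/descendant pairs that avoid generic terms."""
    if query_term == subject_term:
        return True  # assumption: exact matches are always allowed
    if is_generic(query_term) or is_generic(subject_term):
        return False
    return (query_term in ancestors(subject_term)
            or subject_term in ancestors(query_term))
```

With this sketch, `TOY:memory` matches `TOY:cognition` in either direction, but neither matches `TOY:nervous`, because that term sits directly under the root.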

In "model 2", data is split into components that are too granular, so they lose meaning and context. For example, value or allele on their own should not be a component. Components should have meaning on their own.

Relequestual commented 6 years ago

To summarise the previous, I want to combine fields into components which represent different contexts, removing ambiguity. There may be components which have similar fields, but have different contextual meaning. I find that a preferable solution over nesting components.

I'm even unhappy about the potential to assign different meanings to components based on how they are combined, which seems to be the suggestion you're putting forward here.

colinveal commented 6 years ago

The query is: a subject that has (heterozygous allele 'A' at variant rs123, where variant rs123 is pathogenic for dementia and variant rs123 has a relationship with MMSE, AND the subject has dementia or Alzheimer's disease) OR (the subject has homozygous allele 'C' at variant rs124 AND the subject has Alzheimer's disease or MMSE > 20).
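To make the grouping explicit, here is a minimal sketch of that query as a boolean expression tree with a small evaluator. It assumes the second branch reads as "rs124 AND (Alzheimer's disease OR MMSE > 20)". The subject layout and the names `pathogenic_for` and `related_to` are invented for illustration, and variant-level annotations are stored on the subject's variant record purely to keep the example short.

```python
def AND(*parts):
    return ("and", parts)


def OR(*parts):
    return ("or", parts)


def LEAF(check):
    return ("leaf", check)


def evaluate(node, subject):
    """Recursively evaluate an expression tree against one subject record."""
    kind, payload = node
    if kind == "leaf":
        return bool(payload(subject))
    results = (evaluate(child, subject) for child in payload)
    return all(results) if kind == "and" else any(results)


def has_variant(rsid, allele, zygosity):
    def check(s):
        v = s.get("variants", {}).get(rsid, {})
        return v.get("allele") == allele and v.get("zygosity") == zygosity
    return LEAF(check)


def variant_annotated(rsid, key, value):
    def check(s):
        return value in s.get("variants", {}).get(rsid, {}).get(key, [])
    return LEAF(check)


def has_disease(name):
    return LEAF(lambda s: name in s.get("diseases", []))


def mmse_above(threshold):
    return LEAF(lambda s: s.get("mmse") is not None and s["mmse"] > threshold)


QUERY = OR(
    AND(
        has_variant("rs123", "A", "heterozygous"),
        variant_annotated("rs123", "pathogenic_for", "dementia"),
        variant_annotated("rs123", "related_to", "MMSE"),
        OR(has_disease("dementia"), has_disease("Alzheimer's disease")),
    ),
    AND(
        has_variant("rs124", "C", "homozygous"),
        OR(has_disease("Alzheimer's disease"), mmse_above(20)),
    ),
)

# Hypothetical subject that satisfies only the second branch.
subject = {
    "variants": {"rs124": {"allele": "C", "zygosity": "homozygous"}},
    "diseases": [],
    "mmse": 24,
}
result = evaluate(QUERY, subject)
```

Here the logic sits outside the leaves, so each leaf stays a self-contained check; how those leaves map onto components is exactly what is under discussion.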

I can see how we can use fields to replicate a lot of the hierarchy within a component. However, to replicate the complexity available using hierarchies, some components could need a lot of fields. Also, where qualifiers are required for multiple fields, e.g. 'ontology', 'operator', 'value', 'source', 'unit', these would need distinct names to distinguish which field they apply to, thereby also increasing the overall number of fields in a component.
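A small sketch of the naming concern: flattening a nested structure forces each shared qualifier to be prefixed with the field it belongs to. The component layout, the values, and the `flatten` helper are all hypothetical, chosen only to show how the field count grows.

```python
def flatten(component, prefix=""):
    """Flatten nested dicts into one level, prefixing keys with their parent names."""
    flat = {}
    for key, value in component.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical nested layout: qualifiers are reused under each field.
nested = {
    "disease": {
        "ontology": "OMIM",
        "operator": "=",
        "value": "hypothetical-code",
    },
    "measurement": {
        "ontology": "hypothetical-ontology",
        "operator": ">",
        "value": 20,
        "unit": "points",
        "source": "MMSE",
    },
}

flat = flatten(nested)
# Resulting keys include disease_operator and measurement_operator:
# two nested fields become eight distinctly named flat fields.
```

Each extra qualified field adds one flat name per qualifier, which is the growth described above.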

I feel there should be a way for us to take advantage of both approaches.