Open colinveal opened 6 years ago
great, thanks for sharing this @colinveal !
@Relequestual and @fschiettecatte, we reached consensus on this yesterday right? How about we ask Colin to add this directly (or Colin and Ben meet 1-1 teleconf to do that jointly) to draft to keep things moving as we are pretty close to v0.1.0 freeze?
Fine with me.
Thanks for these, @colinveal. It looks like we've reached an updated consensus at https://github.com/ga4gh-discovery/ga4gh-discovery-search/issues/9#issuecomment-408702362
Are you happy to close this issue in favour of https://github.com/ga4gh-discovery/ga4gh-discovery-search/issues/9 ?
The gists should stay around for reference.
Looking at these updated examples, it looks like this issue is still discussing if we need hierarchical components, and not how the components are referenced (which we have agreed on now I believe).
Could you explain in pseudo logic the query you're expecting please?
Following from my previous experience with the MME API specification, and how we and others represent variants in our database, `subjectvariant` from your example would encompass allele, zygosity, and pathogenicity, but not phenotype as you have shown in your first model example.
So, in terms of a `subjectvariant` component, we're pretty close to agreeing, I think.
I would say `gender` comes under a `subject` component, as it's information about the subject, which would probably include some identifier too. Still needs to be ironed out.
For `disease`, I don't think we intend to support free text at the API layer. OMIM codes only for now. Given that OMIM isn't a disease ontology, text-based matching could be useful, but equally you could provide that ability by including all the terms which contain the specific phrase your user inputs.
For `phenotype`, I feel for now we should specify that a term matches up or down the tree, apart from the second-level generic terms (as in, not children of HP:0000118). HPO only for now.
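A rough sketch of one way to read the "up or down the tree" rule for phenotype terms. The ontology fragment is a toy: every term ID other than HP:0000118 is a made-up placeholder, and the choice to exclude the root itself along with its direct children is an assumption rather than something agreed in this thread.

```python
# Toy ontology fragment: term -> list of parent terms.
# Only HP:0000118 is a real ID from the discussion; the others are placeholders.
ROOT = "HP:0000118"
PARENTS = {
    ROOT: [],
    "TERM:system": [ROOT],          # a second-level generic term
    "TERM:mid": ["TERM:system"],
    "TERM:leaf": ["TERM:mid"],
    "TERM:other": ["TERM:system"],
}


def ancestors(term):
    """Return every term reachable by walking up from `term`."""
    seen = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def is_generic(term):
    """True for the root and its direct children (assumed too broad to match)."""
    return term == ROOT or ROOT in PARENTS.get(term, [])


def phenotype_matches(query_term, subject_term):
    """Match if one term is the other, or an ancestor/descendant of it."""
    if is_generic(query_term) or is_generic(subject_term):
        return False
    return (
        query_term == subject_term
        or query_term in ancestors(subject_term)   # match down the tree
        or subject_term in ancestors(query_term)   # match up the tree
    )
```

Under this reading, sibling terms that only share a generic parent do not match each other, because neither is an ancestor of the other.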
In "model 2", data is split into components in too granular a way, so that they lose meaning and context. For example, `value` or `allele` on their own should not be a component. Components should have meaning on their own.
To summarise the previous, I want to combine fields into components which represent different contexts, removing ambiguity. There may be components which have similar fields, but have different contextual meaning. I find that a preferable solution over nesting components.
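To make the idea of flat, context-bearing components concrete, here is a minimal sketch. The class and field names are hypothetical, chosen only to mirror the components mentioned in this thread; they are not taken from the draft specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subject:
    """Information about the subject themselves."""
    id: str
    gender: Optional[str] = None


@dataclass
class SubjectVariant:
    """A variant as observed in a subject: allele, zygosity, pathogenicity."""
    allele: str
    zygosity: str
    pathogenicity: Optional[str] = None


@dataclass
class Disease:
    """A diagnosis, identified by an OMIM code (hypothetical field name)."""
    code: str


@dataclass
class Phenotype:
    """An observed phenotype, identified by an HPO term."""
    code: str


# Disease and Phenotype both carry a `code` field, but the component
# type supplies the context, so no nesting is needed to disambiguate.
variant = SubjectVariant(allele="A", zygosity="heterozygous")
phenotype = Phenotype(code="HP:0000118")
```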
I'm even unhappy about the potential to assign different meanings to components based on how they are combined, which seems to be the suggestion you're putting forward here.
The query is: A subject that has (heterozygous allele 'A' at variant rs123 where variant rs123 is pathogenic for dementia and variant rs123 has a relationship with MMSE AND the subject has dementia or Alzheimer's disease) OR (the subject has homozygous allele 'C' at variant rs124 AND the subject has Alzheimer's disease or MMSE > 20).
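As a sanity check on the grouping, the query can be written out as plain boolean logic. This is only a sketch: the record layout and helper names are invented for illustration, and it assumes "Alzheimer's disease or MMSE > 20" is meant as a single group inside the second branch.

```python
def has_variant(subject, rsid, allele, zygosity, pathogenic_for=None, related_to=None):
    """True if the subject carries a variant record meeting every given condition."""
    for v in subject.get("variants", []):
        if v["id"] != rsid or v["allele"] != allele or v["zygosity"] != zygosity:
            continue
        if pathogenic_for and pathogenic_for not in v.get("pathogenic_for", ()):
            continue
        if related_to and related_to not in v.get("related_to", ()):
            continue
        return True
    return False


def has_disease(subject, *names):
    """True if the subject has any of the named diseases."""
    return any(d in names for d in subject.get("diseases", []))


def mmse_above(subject, threshold):
    """True if an MMSE score is recorded and exceeds the threshold."""
    score = subject.get("mmse")
    return score is not None and score > threshold


def matches(subject):
    first = (
        has_variant(subject, "rs123", "A", "heterozygous",
                    pathogenic_for="dementia", related_to="MMSE")
        and has_disease(subject, "dementia", "Alzheimer's disease")
    )
    second = (
        has_variant(subject, "rs124", "C", "homozygous")
        and (has_disease(subject, "Alzheimer's disease") or mmse_above(subject, 20))
    )
    return first or second
```

A subject with only the rs124 variant and an MMSE of 25 would match through the second branch, without any disease recorded.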
I can see how we can use fields to replicate a lot of the hierarchy within a component; however, to replicate the complexity available using hierarchies there could be a lot of fields for some components. Also, where there are qualifiers that are required for multiple fields, e.g. 'ontology', 'operator', 'value', 'source', 'unit', these would require distinct naming to distinguish which field they apply to, thereby also increasing the overall number of fields in a component.
I feel there should be a way that we can take advantage of both approaches.
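To illustrate the naming concern, here is a small sketch that flattens a nested component into prefixed field names. The component layout, the underscore naming scheme, and the example values are all assumptions made for the illustration.

```python
def flatten(component, prefix=""):
    """Turn nested qualifiers into flat, prefixed field names."""
    flat = {}
    for key, value in component.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Hierarchical form: the qualifiers are reused under each parent.
nested = {
    "phenotype": {"ontology": "HPO", "operator": "is", "value": "HP:0000118"},
    "measurement": {"source": "MMSE", "operator": ">", "value": 20, "unit": "points"},
}

# Flat form: each qualifier needs a distinct name per field it applies to,
# e.g. phenotype_operator and measurement_operator.
flat = flatten(nested)
```

Two nested entries become seven flat fields here, and every additional qualified field adds its own set of prefixed names.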
Here are the examples from the teleconference: Example 1, Example 2
I believe we were leaning towards the hierarchical model with external logic:
Model 1