ga4gh / g2p-team

GitHub Repo for the Genotype to Phenotype Task Team
Apache License 2.0

Schema Change Discussion #17

Open bwalsh opened 8 years ago

bwalsh commented 8 years ago

Team:

The schemas have received feedback that the semantics of SearchGenotypePhenotypeRequest are very unclear. In this section of the API documentation that applies to our schema, I've added some examples and guidance.
We are proposing to deprecate the unscoped string used in the query and replace it with a scoped TermQuery. We hope to introduce this, together with a placeholder for external identifiers in Evidence and PhenotypeInstance, alongside our current pull request. Your comments are invaluable. (See the readme.)

In addition, we have been asked to consider a PhenotypeAssociation with a wider scope: it connects evidence to entities other than Feature. Here we propose a new entry point that follows the modified pattern of the G2P and adds phenotype/search. This allows for discovery of evidence associated with (Variant, FeatureEvent, BioSample, Individual, CallSet). Again, your comments will be useful. (See the readme.)

diekhans commented 8 years ago

Very useful Brian.

Some comments:

Mark

bwalsh commented 8 years ago

Concerns about current API

If I understand this correctly, I think we should be concerned about clashing of unscoped identifiers. For example, I read this as supporting something like { 'phenotype': ['FH'] }, in which I think it's unclear whether that's FH the gene (via "ExternalIdentifierQuery") or Familial Hypercholesterolemia (via "PhenotypeQuery"). Is that (or something like it) a valid concern here?
https://github.com/ga4gh/schemas/pull/432#issuecomment-189512499
The semantics of SearchGenotypePhenotypeRequest are very unclear. I would really have no idea how to construct a query.
https://github.com/ga4gh/schemas/pull/432#discussion_r54935254
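
The identifier clash raised above can be illustrated with a minimal Python sketch. The in-memory indexes and the names `lookup_unscoped` and `lookup_scoped` are hypothetical and not part of the GA4GH schemas; they only show why a bare "FH" is ambiguous while a scoped lookup is not.

```python
# Hypothetical data: the token "FH" names both a gene and a disease.
UNSCOPED_INDEX = {
    "FH": ["feature: FH (gene)", "phenotype: Familial Hypercholesterolemia"],
}

# The same records keyed by (scope, token) instead of the bare token.
SCOPED_INDEX = {
    ("feature", "FH"): ["feature: FH (gene)"],
    ("phenotype", "FH"): ["phenotype: Familial Hypercholesterolemia"],
}


def lookup_unscoped(token):
    """Return every record sharing the token, whatever its kind."""
    return UNSCOPED_INDEX.get(token, [])


def lookup_scoped(scope, token):
    """Return only records of the requested kind."""
    return SCOPED_INDEX.get((scope, token), [])


ambiguous = lookup_unscoped("FH")                # gene and disease both come back
disease_only = lookup_scoped("phenotype", "FH")  # only the disease comes back
```

With the unscoped lookup the client has to guess which of the two records was meant; the scoped lookup makes the caller state it up front.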

Proposed schema changes

/**
  Only those fields from Evidence that are `queryable`.
*/
record EvidenceQuery {
  union { null, OntologyTerm } evidenceType;
  union { null, string } description = null;  /* regex */
  union { null, array<org.ga4gh.models.ExternalIdentifier> } externalIdentifiers = null;  /* new field */
}

/**
  Only those fields from Feature that are `queryable`.
*/
record FeatureQuery {
  union { null, string } name;  /* new field, regex */
  union { null, string } description;  /* new field, regex */
  union { null, string } featureSetId;
  union { null, string } referenceName;
  union { long, null } start = 0;  /* long first: an Avro union default must match the first branch */
  union { null, long } end;
  union { null, Strand } strand;
  union { null, OntologyTerm } type;  /* new field */
  union { null, OntologyTerm } featureType;
  union { null, array<org.ga4gh.models.ExternalIdentifier> } externalIdentifiers = null;  /* new field */
}

/**
  Only those fields from Phenotype that are `queryable`.
*/
record PhenotypeQuery {
  union { null, OntologyTerm } type;
  union { null, array<OntologyTerm> } qualifier = null;
  union { null, OntologyTerm } ageOfOnset = null;
  union { null, string } description = null;  /* regex */
  union { null, array<org.ga4gh.models.ExternalIdentifier> } externalIdentifiers = null;  /* new field */
}

regex: The regular expression language is defined in XQuery 1.0 and XPath 2.0 Functions and Operators section 7.6.1 Regular Expression Syntax.
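
As a rough illustration of how a server might apply a regex field such as `description`, here is a small Python sketch. It uses Python's `re` module as a stand-in, which is an assumption: the XPath/XQuery regular expression syntax referenced above is similar but not identical to Python's. The function name `description_matches` is hypothetical.

```python
import re


def description_matches(pattern, description):
    """Return True if the pattern matches any part of the description."""
    if pattern is None:
        return True   # no filter supplied, so everything passes
    if description is None:
        return False  # a filter was supplied but there is nothing to match
    return re.search(pattern, description) is not None


descriptions = ["Familial hypercholesterolemia", "Growth abnormality", None]
hits = [d for d in descriptions if description_matches(r"hyperchol", d)]
```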

New entry points

One criticism of the current API is that it is overloaded: it violates the design goal of separation of concerns. Specifically, it combines the search for evidence with the search for features and the search for genotypes.

This proposal moves search, alias matching, and external identifier lookup to dedicated endpoints.

POST phenotypes/search PhenotypeQuery

POST features/search FeatureQuery
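
A request body for `phenotypes/search` could be assembled roughly as follows. This is a sketch under assumptions: the JSON field names mirror the proposed PhenotypeQuery record, OntologyTerm is reduced to a bare `id`, and the helper names `ontology_term` and `phenotype_query` are hypothetical.

```python
import json


def ontology_term(term_id):
    """Assumed minimal OntologyTerm shape: just an `id`."""
    return {"id": term_id}


def phenotype_query(type_id=None, qualifier_ids=None,
                    age_of_onset_id=None, description=None):
    """Build a PhenotypeQuery body; filters that are not set stay null."""
    return {
        "type": ontology_term(type_id) if type_id else None,
        "qualifier": [ontology_term(q) for q in qualifier_ids] if qualifier_ids else None,
        "ageOfOnset": ontology_term(age_of_onset_id) if age_of_onset_id else None,
        "description": description,
        "externalIdentifiers": None,
    }


query = phenotype_query(age_of_onset_id="HP:0003581")
body = json.dumps(query)  # would be sent as the payload of POST phenotypes/search
```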

Changes to existing API

The SearchGenotypePhenotype search is simplified. Features and phenotypes are expressed as simple arrays of string identifiers. Evidence can be queried via the new EvidenceQuery.

record SearchGenotypePhenotypeRequest {
  ...
  union { null, array<string> } featureIds = null;
  union { null, array<string> } phenotypeIds = null;
  union { null, array<EvidenceQuery> } evidence = null;
  ...
}
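
For illustration, the simplified request could be built in Python like this. Only the three fields shown above are modeled; the elided fields are left out because they are not given here. The helper name `search_g2p_request` and the example regex value are hypothetical.

```python
def search_g2p_request(feature_ids=None, phenotype_ids=None, evidence=None):
    """Build only the three fields shown above; elided fields are not modeled."""
    return {
        "featureIds": list(feature_ids) if feature_ids is not None else None,
        "phenotypeIds": list(phenotype_ids) if phenotype_ids is not None else None,
        "evidence": list(evidence) if evidence is not None else None,
    }


# An EvidenceQuery carrying a regex on description; its other fields stay null.
evidence_query = {
    "evidenceType": None,
    "description": "clinical",
    "externalIdentifiers": None,
}

request = search_g2p_request(feature_ids=["f12345"], evidence=[evidence_query])
```

Features and phenotypes are plain identifiers here, so any richer matching has to happen beforehand through the dedicated search endpoints.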

Multiple server collation - Background

G2P servers are implemented in three different contexts:

[Three images, one per context]

Flexible representation of Feature

Convenience endpoints

show associations for this feature

POST feature/[id]/associations FeatureAssociationQuery

show associations for this phenotype

POST phenotype/[id]/associations PhenotypeAssociationQuery

Future direction

Consider instead a PhenotypeAssociation which has a wider scope; the objects it connects and the evidence type determine the meaning of the association.

image

POST [EntityName]/[id]/associations [EntityName]AssociationQuery
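
The `[EntityName]/[id]/associations` pattern can be sketched as a small path builder. The function name `associations_path` is hypothetical, and percent-encoding the identifier is an assumption made for safety rather than something the proposal specifies.

```python
from urllib.parse import quote


def associations_path(entity_name, entity_id):
    """Return '[EntityName]/[id]/associations' with the id percent-encoded."""
    return f"{entity_name}/{quote(entity_id, safe='')}/associations"


feature_path = associations_path("feature", "f12345")
phenotype_path = associations_path("phenotype", "p12345")
```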

Implementation Guidance: Queries (New API)

Id Searches: Feature Lookup

| Q: I have a featureId ("f12345"). | Create a SearchGenotypePhenotypeRequest | {…, "featureIds": ["f12345"], …} | The system will respond with evidence for features that match on that identifier.

| Q: I only want somatic variant features (SO:0001777). How do I limit results? | Create a FeatureQuery, specifying featureType | POST to feature/search | The system will respond with features that match on that type. | The client would then use those feature.id values to construct a SearchGenotypePhenotypeRequest

| Q: I have a SNP id ("rs6920220"). | Create a FeatureQuery.ids | POST to feature/search | The system will respond with features that match on external identifier. | The client would then use those feature.id values to construct a SearchGenotypePhenotypeRequest | Dependency: external_ids to be added to Feature.ids

| Q: I have an identifier for BRCA1 (GO:0070531). How do I query for the feature? | Create a FeatureQuery.type | POST to feature/search | The system will respond with features that match on ontology term. | The client would then use those feature.id values to construct a SearchGenotypePhenotypeRequest | Dependency: ontologies to be added to Feature.type
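
The two-step flow in the rows above (search for features first, then query associations by feature id) can be sketched against an in-memory stand-in for a server. Everything here is hypothetical: the sample records, the `EXAMPLE:other` term, the association shape, and the functions `features_search` and `genotype_phenotype_search` are illustrations, not the reference implementation.

```python
# Hypothetical in-memory stand-in for a G2P server.
FEATURES = [
    {"id": "f12345", "featureType": {"id": "SO:0001777"}},
    {"id": "f67890", "featureType": {"id": "EXAMPLE:other"}},
]
ASSOCIATIONS = [
    {"id": "a1", "featureIds": ["f12345"], "phenotypeId": "p12345"},
    {"id": "a2", "featureIds": ["f67890"], "phenotypeId": "p99999"},
]


def features_search(feature_query):
    """Stand-in for POST feature/search: filter on featureType when given."""
    wanted = feature_query.get("featureType")
    return [
        f for f in FEATURES
        if wanted is None or f["featureType"]["id"] == wanted["id"]
    ]


def genotype_phenotype_search(request):
    """Stand-in for the G2P search: keep associations touching any featureId."""
    feature_ids = request.get("featureIds")
    return [
        a for a in ASSOCIATIONS
        if feature_ids is None or set(a["featureIds"]) & set(feature_ids)
    ]


# Step 1: find somatic variant features.
somatic = features_search({"featureType": {"id": "SO:0001777"}})
# Step 2: use their ids to build the association request.
request = {"featureIds": [f["id"] for f in somatic]}
associations = genotype_phenotype_search(request)
```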

Id Searches: Phenotype Lookup

| Q: I have a phenotype id ("p12345"). | Create a SearchGenotypePhenotypeRequest | {…, "phenotypeIds": ["p12345"], …} | The system will respond with evidence that matches on PhenotypeInstance.id

| Q: I have a Disease ontology id ("http://www.ebi.ac.uk/efo/EFO_0003767"). | POST PhenotypeQuery.type to phenotype/search | The system will respond with phenotypes that match on OntologyTerm.id | The client would then use those phenotype.id values to construct a SearchGenotypePhenotypeRequest

| Q: I have an ontology term for a phenotype (HP:0001507, 'Growth abnormality'). How do I query it? | POST PhenotypeQuery.qualifier to phenotype/search | The system will respond with phenotypes that match on OntologyTerm.id | The client would then use those phenotype.id values to construct a SearchGenotypePhenotypeRequest

| Q: I am only interested in phenotypes qualified with (PATO_0001899, decreased circumference). | POST PhenotypeQuery.qualifier to phenotype/search | The system will respond with phenotypes whose qualifiers match that ontology term via 'is_a' | The client would then use those phenotype.id values to construct a SearchGenotypePhenotypeRequest

| Q: I am only interested in phenotypes with ageOfOnset of (HP:0003581, adult onset). | POST PhenotypeQuery.ageOfOnset to phenotype/search | The system will respond with phenotypes whose ageOfOnset matches | The client would then use those phenotype.id values to construct a SearchGenotypePhenotypeRequest
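
The qualifier row above mentions matching through 'is_a'. One way a server could interpret that is sketched below, again with an in-memory stand-in. The `IS_A` edges, the `EXAMPLE:0000001` child term, the sample phenotypes, and the function names are all hypothetical; how a real server resolves ontology hierarchies is not specified in this thread.

```python
# Hypothetical is_a edges (child -> parent). EXAMPLE:0000001 is a made-up term.
IS_A = {"EXAMPLE:0000001": "PATO_0001899"}

PHENOTYPES = [
    {"id": "p1", "qualifier": [{"id": "PATO_0001899"}]},
    {"id": "p2", "qualifier": [{"id": "EXAMPLE:0000001"}]},
    {"id": "p3", "qualifier": []},
]


def is_a(term_id, ancestor_id):
    """True if term_id equals ancestor_id or reaches it through is_a edges."""
    seen = set()
    while term_id is not None and term_id not in seen:
        if term_id == ancestor_id:
            return True
        seen.add(term_id)
        term_id = IS_A.get(term_id)
    return False


def phenotypes_search(query):
    """Stand-in for POST phenotype/search: every wanted qualifier must match."""
    wanted = query.get("qualifier") or []
    return [
        p for p in PHENOTYPES
        if all(any(is_a(q["id"], w["id"]) for q in p["qualifier"]) for w in wanted)
    ]


matches = phenotypes_search({"qualifier": [{"id": "PATO_0001899"}]})
# The resulting ids feed the simplified association request.
request = {"phenotypeIds": [p["id"] for p in matches]}
```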

sarahhunt commented 8 years ago

@bwalsh - I like the proposed simplification of the SearchGenotypePhenotypeRequest to accept phenotype ids and feature ids.

The SeqAnn schema already has a features/search, so a new endpoint like FeatureQuery is not required. Adding ExternalIdentifiers to Feature and a Feature GET by ExternalIdentifier endpoint makes sense. This requirement has already been raised: https://github.com/ga4gh/schemas/issues/578.