Open sarahhunt opened 9 years ago
Some sample data from CIViC that we could use to see how it fits with the current G2P schema: the main endpoints for the CIViC API are (1) gene (with references to one or more variants), (2) variant (with references to one or more evidence statements), and (3) evidence items. Here is an example gene, one of its variants, and one of that variant's evidence statements:
https://civic.genome.wustl.edu/api/genes/5
https://civic.genome.wustl.edu/api/variants/12
https://civic.genome.wustl.edu/api/evidence_items/79
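As a small sketch of the gene → variant → evidence item layout, the three example URLs can be built from one base and an endpoint name. The helper name `civic_url` and the constants are made up for illustration; only the URL pattern itself comes from the links above.

```python
# Base URL and endpoint names taken from the example links in this thread.
CIVIC_API_BASE = "https://civic.genome.wustl.edu/api"
ENDPOINTS = ("genes", "variants", "evidence_items")


def civic_url(endpoint: str, record_id: int) -> str:
    """Build a CIViC API URL (hypothetical helper, not part of any client library)."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    return f"{CIVIC_API_BASE}/{endpoint}/{record_id}"


# The gene, variant, and evidence item used as examples above.
example_urls = [
    civic_url("genes", 5),
    civic_url("variants", 12),
    civic_url("evidence_items", 79),
]
```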
For more details on the evidence item fields go to: https://civic.genome.wustl.edu/#/help
Select "Evidence Items" from the left. Navigate the four tabs to learn about the evidence items, types, levels, and ratings.
Pasted below is the output from a rough prototype G2P endpoint serving GWAS catalog data. The evidence attributes look odd because the field in which the statistics are returned is called "description"; "7.00e-19" is a funny-sounding description for a p-value.
Also, there is nowhere obvious to put the risk allele - it isn't evidence for the association, more a result/info value at the FeaturePhenotypeAssociation level.
{
  "associations": [
    {
      "id": "placeholder",
      "features": [
        {
          "featureType": {
            "source": "SO",
            "name": "sequence_variant",
            "id": "SO:0001060"
          },
          "referenceName": "10",
          "end": 61992400,
          "featureSetId": "placeholder",
          "parentIds": [],
          "id": "rs7089424",
          "attributes": {},
          "start": 61992400
        }
      ],
      "phenotype": {
        "qualifier": "",
        "ageOfOnset": "",
        "type": {
          "ontologySourceID": "http://www.ebi.ac.uk/efo/EFO_0000220",
          "ontologySourceName": "EFO",
          "ontologySourceVersion": ""
        },
        "id": "placeholder",
        "description": "Acute lymphoblastic leukemia (childhood)"
      },
      "evidence": [
        {
          "evidenceType": "http://purl.obolibrary.org/obo/IAO_0000311",
          "description": "PMID:19684604"
        },
        {
          "evidenceType": "http://purl.obolibrary.org/obo/OBI_0001442",
          "description": "7.00e-19"
        },
        {
          "evidenceType": "http://purl.obolibrary.org/obo/OBCS_0000054",
          "description": "1.65"
        },
        {
          "evidenceType": "risk/associated allele to go somewhere else",
          "description": "C"
        }
      ],
      "environmentalContexts": [],
    },
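To make the problem with "description" concrete, here is a rough sketch of what a client currently has to do with the evidence list above: guess which descriptions are numbers and which are free text. The function name `split_evidence` is made up for illustration; the evidence entries are copied from the pasted output.

```python
from typing import Dict, List, Tuple

# Evidence entries copied from the prototype output above.
EVIDENCE: List[Dict[str, str]] = [
    {"evidenceType": "http://purl.obolibrary.org/obo/IAO_0000311",
     "description": "PMID:19684604"},
    {"evidenceType": "http://purl.obolibrary.org/obo/OBI_0001442",
     "description": "7.00e-19"},
    {"evidenceType": "http://purl.obolibrary.org/obo/OBCS_0000054",
     "description": "1.65"},
    {"evidenceType": "risk/associated allele to go somewhere else",
     "description": "C"},
]


def split_evidence(
    evidence: List[Dict[str, str]],
) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Separate evidence whose description parses as a number from the rest."""
    numeric: Dict[str, float] = {}
    textual: Dict[str, str] = {}
    for item in evidence:
        try:
            numeric[item["evidenceType"]] = float(item["description"])
        except ValueError:
            # Not a number: keep the raw string (publication ID, allele, ...).
            textual[item["evidenceType"]] = item["description"]
    return numeric, textual


numeric, textual = split_evidence(EVIDENCE)
```

The statistics only become usable after string parsing, and the risk allele ends up mixed in with the publication reference, which is the awkwardness described above.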
We'll need a way of representing different, well-defined evidence types (specific to cancer clinical genomics, statistical associations, etc.). Subclassing would be a better solution, but a conceptual straw man would be something like:
record FeaturePhenotypeAssociation {
  …
  union {CancerEvidenceSet, StatisticalEvidenceSet} evidence;
  …
}

record CancerEvidenceSet {
  OntologyTerm evidenceType;
  OntologyTerm evidenceLevel;
  OntologyTerm evidenceDirection;
  …
}

record StatisticalEvidenceSet {
  OntologyTerm pValue;
  …
}
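For anyone who finds it easier to read in code, the same straw man can be sketched with Python dataclasses and a `Union`. This is only an illustration, not a proposed implementation: the fields of `OntologyTerm` are an assumption, the elided "…" fields are left out, and `evidence_kind` is a made-up helper showing how a consumer would branch on the union.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class OntologyTerm:
    # Assumed minimal shape; the real record may carry more fields.
    id: str
    name: str = ""


@dataclass
class CancerEvidenceSet:
    evidence_type: OntologyTerm
    evidence_level: OntologyTerm
    evidence_direction: OntologyTerm


@dataclass
class StatisticalEvidenceSet:
    p_value: OntologyTerm  # typed as in the straw man above


# Mirrors: union {CancerEvidenceSet, StatisticalEvidenceSet} evidence;
EvidenceSet = Union[CancerEvidenceSet, StatisticalEvidenceSet]


@dataclass
class FeaturePhenotypeAssociation:
    evidence: EvidenceSet


def evidence_kind(association: FeaturePhenotypeAssociation) -> str:
    """Hypothetical helper: report which branch of the union is populated."""
    if isinstance(association.evidence, CancerEvidenceSet):
        return "cancer"
    if isinstance(association.evidence, StatisticalEvidenceSet):
        return "statistical"
    raise TypeError("unsupported evidence set")


gwas = FeaturePhenotypeAssociation(
    evidence=StatisticalEvidenceSet(
        p_value=OntologyTerm(id="http://purl.obolibrary.org/obo/OBI_0001442")
    )
)
```

Every new evidence family means another branch in the union and in each consumer, which is part of why subclassing may be the cleaner route.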
The evidence record currently only supports an evidence type and text description. Are there advanced plans to extend this? In other parts of the API a metadata key-value pair structure is used to allow customisation in different implementations. The idea is that the API has some flexibility from the start and commonly used data types can be promoted to named fields as required.
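A rough sketch of what that could look like for evidence, assuming a hypothetical `info` map of string keys to lists of string values alongside the two existing fields. The field name `info` and the key `"pValue"` are invented here for illustration, not taken from the schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Evidence:
    # The two fields the evidence record supports today.
    evidence_type: str
    description: str
    # Hypothetical key-value extension point for implementation-specific data.
    info: Dict[str, List[str]] = field(default_factory=dict)


p_value_evidence = Evidence(
    evidence_type="http://purl.obolibrary.org/obo/OBI_0001442",
    description="GWAS catalog association statistic",
    info={"pValue": ["7.00e-19"]},
)
```

If a key such as `"pValue"` turned out to be used by most implementations, it could later be promoted to a named, typed field without breaking records that never set it.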