ga4gh / ga4gh-schemas

Models and APIs for Genomic data. RETIRED 2018-01-24
http://ga4gh.org
Apache License 2.0

Implementation Q: support and harmonization of alias ontology terms #255

Open nmcabili opened 9 years ago

nmcabili commented 9 years ago

How does the team envision implementing support for both an ontology ID and a term (which can have several aliases, e.g. RNASeq, RNA-Seq, RNA-SEQ, etc.)? If these are unique ontology IDs for GA4GH, what process will harmonize alias terms and match them to the unique ID? If ontology IDs are external (e.g. GO, Disease Ontology), which process will harmonize terms and IDs across external DBs?

I am not sure if my question is within scope...

Thx!

diekhans commented 9 years ago

Linking to ontologies is very relevant. The metadata working group is a great place to have this discussion. Please join us.

mellybelly commented 9 years ago

So I think that there is some confusion that perhaps I can clear up here. The ontology term (regardless of whether it is developed by GA4GH or comes from a community ontology) should have a URI that is used across resources. That URI is associated with synonyms through a set of annotation properties (for example http://www.geneontology.org/formats/oboInOwl#hasExactSynonym), so there is no need to reconcile anything across sources. This is in fact the benefit of using an ontology class: it gives you all those associations via the ID.
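To make the point concrete, here is a minimal sketch of a term URI carrying its synonyms as annotation properties. It is an illustration only: the term URI under example.org, the triple list, and the helper names `annotations` and `names_for` are hypothetical, and a real resource would load these statements from the ontology file rather than hard-code them. Only the `hasExactSynonym` property URI comes from the comment above; `rdfs:label` is the standard RDF Schema label property.

```python
# Annotation property mentioned in the comment above.
HAS_EXACT_SYNONYM = "http://www.geneontology.org/formats/oboInOwl#hasExactSynonym"
# Standard RDF Schema property for the primary label.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical term URI, used only for illustration.
RNA_SEQ = "http://example.org/ontology/ASSAY_0000001"

# (subject, property, value) statements, as an ontology file might provide them.
TRIPLES = [
    (RNA_SEQ, RDFS_LABEL, "RNA-Seq"),
    (RNA_SEQ, HAS_EXACT_SYNONYM, "RNASeq"),
    (RNA_SEQ, HAS_EXACT_SYNONYM, "RNA-SEQ"),
]


def annotations(uri, prop, triples=TRIPLES):
    """Return every value attached to `uri` through annotation property `prop`."""
    return [value for subject, p, value in triples if subject == uri and p == prop]


def names_for(uri):
    """Return the label followed by the exact synonyms of a term."""
    return annotations(uri, RDFS_LABEL) + annotations(uri, HAS_EXACT_SYNONYM)


print(names_for(RNA_SEQ))
```

Any resource that stores only the URI can recover the same set of names this way, which is why the synonyms do not need to be reconciled per source.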

nmcabili commented 9 years ago

Understood. So we do not assume that any service is going to match terms to the URI; the user or an upstream process is responsible for that. What I am trying to envision is how one would support search queries (e.g. "find all RNA-Seq samples"). The user would not know the ID, so the DB that stores information with the metadata described here should also be linked to another knowledge base that maps each URI to all the synonymous terms that match it (e.g. ID123456 and {RNA-Seq, RnaSeq, RNA SEQ, ...}) in order to support search queries. Just want to make sure I get this: for a specific record, the URI is unique, but the term can be any one of the synonymous terms. Is that right?

Thanks!
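The search scenario described in this comment could be sketched roughly as follows. This is not a GA4GH API: the record layout (`sample`, `term_id`, `term`), the second ID `ID999999`, and the functions `normalize`, `build_index`, and `find_samples` are all assumptions made for illustration. `ID123456` and its synonym set are the placeholder example from the comment.

```python
import re
from collections import defaultdict

# Knowledge base mapping each term ID to its synonymous terms.
# ID123456 and its synonyms are the placeholder from the comment above.
SYNONYMS = {"ID123456": ["RNA-Seq", "RnaSeq", "RNA SEQ"]}

# Hypothetical sample records: the ID is fixed, the stored term text varies.
SAMPLES = [
    {"sample": "s1", "term_id": "ID123456", "term": "RNA-Seq"},
    {"sample": "s2", "term_id": "ID123456", "term": "RnaSeq"},
    {"sample": "s3", "term_id": "ID999999", "term": "WGS"},
]


def normalize(text):
    """Lower-case the text and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.casefold())


def build_index(synonyms):
    """Build an inverted index from normalized synonym to the set of term IDs."""
    index = defaultdict(set)
    for term_id, names in synonyms.items():
        for name in names:
            index[normalize(name)].add(term_id)
    return index


def find_samples(query, index, samples):
    """Resolve the free-text query to term IDs, then match records by ID only."""
    term_ids = index.get(normalize(query), set())
    return [s["sample"] for s in samples if s["term_id"] in term_ids]


index = build_index(SYNONYMS)
print(find_samples("rna seq", index, SAMPLES))
```

The query text is resolved to an ID first and records are then compared by ID alone, so whichever synonym a record happens to store does not affect the match.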
