ga4gh / ga4gh-schemas

Models and APIs for Genomic data. RETIRED 2018-01-24
http://ga4gh.org
Apache License 2.0

Formalize or deprecate regex use in G2P #709

Open bwalsh opened 8 years ago

bwalsh commented 8 years ago

Continuation of discussion from here ... https://github.com/ga4gh/schemas/pull/701#discussion_r76309588

david4096 commented 8 years ago

Given that the data interchange use cases can be satisfied without regex matching, I believe it makes sense to remove it. The remainder of the API relies only on strict string matching to carry out its tasks.

The problem with this approach is that it undermines the work put into controlled vocabularies like OMIM or HPO. One should be able to use synonyms in an ontology to find similar items. If partial description searching is required to satisfy basic use cases, then we need to address the efficacy of our ontological models.

Partial string matching is a feature many practical applications might provide; however, for data interchange I believe it may be out of scope.
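
To make the three options in this comment concrete, here is a minimal Python sketch contrasting strict string matching, regex matching, and synonym lookup. It is only an illustration under stated assumptions: the records, the `EX:` identifiers, the synonym table, and the function names are all hypothetical and are not taken from the G2P schema or from any real ontology.

```python
import re

# Hypothetical records; the "EX:" identifiers are made up for illustration.
RECORDS = [
    {"id": "EX:1", "description": "Seizure"},
    {"id": "EX:2", "description": "Bilateral tonic-clonic seizure"},
    {"id": "EX:3", "description": "Phenotypic abnormality"},
]

# Hypothetical synonym table; a real service would derive this from an
# ontology such as HPO rather than hard-coding it.
SYNONYMS = {
    "EX:1": {"seizure", "seizures", "epileptic seizure"},
}


def match_exact(records, term):
    """Strict string matching: the description must equal the term."""
    return [r["id"] for r in records if r["description"] == term]


def match_regex(records, pattern):
    """Regex matching: the pattern may hit anywhere in the description."""
    compiled = re.compile(pattern)
    return [r["id"] for r in records if compiled.search(r["description"])]


def match_synonym(records, term, synonyms=SYNONYMS):
    """Synonym lookup: resolve the term to identifiers, then match on id."""
    wanted = term.casefold()
    ids = {term_id for term_id, names in synonyms.items() if wanted in names}
    return [r["id"] for r in records if r["id"] in ids]
```

In this sketch, `match_regex(RECORDS, r"[Ss]eizure")` returns both `EX:1` and `EX:2` because it matches on text alone, while `match_synonym(RECORDS, "Epileptic seizure")` returns only `EX:1` because the match goes through the identifier rather than the description.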

mbaudis commented 8 years ago

As a general comment: while we want to have data formalised, there will always be information hidden in, or forced into, unstructured attributes; we consistently allow description & info, which do not have controlled vocabularies. And don't get me started on case sensitivity etc.
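
As a rough illustration of the case-sensitivity point, the following Python sketch searches the free-text attributes of a record without regard to case. It assumes `description` is a plain string and `info` maps keys to lists of strings; the function names and the sample record are hypothetical.

```python
def free_text_values(record, fields=("description", "info")):
    """Yield every free-text string found in the given fields of a record."""
    for name in fields:
        value = record.get(name)
        if isinstance(value, str):
            yield value
        elif isinstance(value, dict):
            # Assumption: info-style maps hold lists of values per key.
            for items in value.values():
                for item in items:
                    yield str(item)


def contains_text(record, needle):
    """Case-insensitive substring search over the free-text fields."""
    wanted = needle.casefold()
    return any(wanted in text.casefold() for text in free_text_values(record))


# Hypothetical record, for illustration only.
record = {
    "description": "Sensitive to Imatinib",
    "info": {"source": ["Curated Entry"]},
}
```

Here `contains_text(record, "imatinib")` is true even though the stored text is capitalised, which a strict string comparison would miss.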