ga4gh-beacon / specification-v2

GA4GH Beacon v2 specification.
Apache License 2.0

Add /filtering_terms endpoint #13

Closed sdelatorrep closed 3 years ago

sdelatorrep commented 4 years ago

This endpoint should describe which ontology terms or custom dictionaries this Beacon implements. Something similar to:

{
  "id": "demo-beacon",
  "name": "Demo Beacon",
  "apiVersion": "2.0.0",
  "ontologyTerms": [
    {
      "ontology": "GO",
      "term": "0030237",
      "label": "Female"
    },
    {
      "ontology": "GO",
      "term": "0030238",
      "label": "Male"
    },
    {
      "ontology": "HPO",
      "term": "0009726",
      "label": "Kidney cancer"
    },
    {
      "ontology": "HPO",
      "term": "0100526",
      "label": "Lung cancer"
    },
    {
      "ontology": "HPO",
      "term": "0012115",
      "label": "Hepatitis"
    }
  ]
}

mbaudis commented 4 years ago

@sdelatorrep @jrambla @silverdaz Generally agree, but some comments on the format:

Format

  1. The generally accepted style is to use the prefixed term (class) as id and its text as label.
  2. Since custom filters (or "esoteric" classes) may not be immediately obvious to the user, a source/description may be in order.
  3. Optional parameter: count, to give some information about the number of potentially matching samples.
  4. Please use filteringTerms or filters instead of ontologyTerms, since custom filters etc. go beyond ontology classes.

I'm modelling this now for Beacon+ - examples (the custom icdom filter is not prefixed but internally structured):

{
    "id": "NCIT:C126594",
    "label": "Follicular Variant Thyroid Gland Papillary Carcinoma",
    "source": "National Cancer Institute Thesaurus",
    "count": 112
},
{
    "id": "icdom-85003",
    "label": "Infiltrating duct carcinoma, NOS",
    "source": "ICD-O 3 Morphologies (Progenetix)",
    "count": 9754
}
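
To make the four format points concrete, here is a minimal Python sketch of one such entry. The class name `FilteringTerm` and the helper `from_split_term` are hypothetical names chosen for illustration and are not part of the Beacon specification; the field names follow the examples above, with `source` and `count` treated as optional.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class FilteringTerm:
    """One entry of a filtering-terms listing (hypothetical model)."""
    id: str                       # prefixed class, e.g. "NCIT:C126594"
    label: str                    # human-readable text
    source: Optional[str] = None  # where the term comes from
    count: Optional[int] = None   # optional number of matched samples

    def to_json(self) -> dict:
        # Leave out optional fields that were never set.
        return {k: v for k, v in asdict(self).items() if v is not None}


def from_split_term(entry: dict) -> FilteringTerm:
    """Convert the {"ontology", "term", "label"} shape into a prefixed id."""
    return FilteringTerm(
        id=f"{entry['ontology']}:{entry['term']}",
        label=entry["label"],
    )


term = from_split_term(
    {"ontology": "HPO", "term": "0009726", "label": "Kidney cancer"}
)
print(term.to_json())
```

Under these assumptions, an entry from the original proposal maps onto the prefixed style without losing information, and custom filters such as `icdom-85003` fit the same structure.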

Delivery / endpoint(s)

The endpoint should allow specifying:

  1. the dataset for which filters are being looked up
    • This matters because what can be queried may differ considerably between datasets. Lists can also get long, which leads to the next point.
  2. an optional restriction of the returned filters by prefix(es). In Progenetix there are 1142 "bio" filters and 4414 external identifiers (e.g. PMID), and this does not include custom filters such as age groups.
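
The two delivery requirements (per-dataset lookup and optional prefix filtering) could be sketched roughly as follows. This is only an illustration: the function names, the in-memory `terms_by_dataset` mapping, and the dataset id `demo-dataset` are assumptions, and splitting on the first `:` or `-` is just one way to treat both `NCIT:C126594` and `icdom-85003` as prefixed.

```python
import re
from typing import Dict, Iterable, List, Optional


def term_prefix(term_id: str) -> str:
    """Return the part of an id before the first ':' or '-'."""
    return re.split(r"[:-]", term_id, maxsplit=1)[0]


def select_terms(
    terms_by_dataset: Dict[str, List[dict]],
    dataset_id: str,
    prefixes: Optional[Iterable[str]] = None,
) -> List[dict]:
    """List the filters of one dataset, optionally restricted by prefix."""
    terms = terms_by_dataset.get(dataset_id, [])
    if not prefixes:
        return list(terms)
    wanted = {p.lower() for p in prefixes}  # case-insensitive match
    return [t for t in terms if term_prefix(t["id"]).lower() in wanted]


# Hypothetical dataset id; the two ids are taken from the examples above.
terms_by_dataset = {
    "demo-dataset": [
        {"id": "NCIT:C126594"},
        {"id": "icdom-85003"},
    ]
}
print(select_terms(terms_by_dataset, "demo-dataset", prefixes=["NCIT"]))
```

Restricting by prefix keeps responses short for resources with thousands of filters, while omitting `prefixes` returns everything available for the dataset.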

mbaudis commented 4 years ago

Live examples (can be modified upon feedback):

Only the last example provides sample counts per filter (dataset-specific).

WDYT, @sdelatorrep ?

sdelatorrep commented 3 years ago

Closing as a duplicate of #30.