ga4gh-beacon / specification-v2

GA4GH Beacon v2 specification.

Define filtering_terms endpoint #30

Closed (mbaudis closed this issue 3 years ago)

mbaudis commented 3 years ago

While the example implementation presents a filtering_terms endpoint at https://beacon-giab-demo.ega-archive.org/api/filtering_terms, no specification for it can be found. Also, the format looks wrong:

{
...
  "apiVersion": "v2.0.0-draft.2",
  "ontologyTerms": [
    {
      "ontology": "CL",
      "term": "0000236",
      "label": "B-lymphocyte"
    },
    {
      "ontology": "custom.pedigree.id",
      "term": "1",
      "label": "pedigree ID"
    },
...

There is IMO no problem w/ additional fields; instead of the ontology attribute one could just have the standard description, or a provenance, if needed.

An IMO better format is used in our implementation:

{
  "apiVersion": "2.0.pre.2020-10-13",
  "datasetId": "progenetix",
  "filteringTerms": [
    {
      "count": 27,
      "id": "NCIT:C102872",
      "label": "Pharyngeal squamous cell carcinoma"
    },
    {
      "count": 166,
      "id": "NCIT:C105555",
      "label": "High Grade Ovarian Serous Adenocarcinoma"
    },
jrambla commented 3 years ago

+1 on using the GA4GH recommended approach. I don't see the counts having a place in the current context for filtering terms yet, but it seems an interesting idea.
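
For illustration only, the two formats quoted at the top of this thread differ mainly in whether the ontology prefix and the term are separate fields or a single prefixed id. A minimal Python sketch of mapping the first shape onto the second could look like the following. The function name `to_filtering_term` and the rule of joining prefix and term with a colon are assumptions made for this example, not something defined by the specification.

```python
def to_filtering_term(entry):
    """Merge the separate 'ontology' and 'term' fields into one prefixed 'id'.

    Assumption: the id is simply '<ontology>:<term>'.
    """
    return {
        "id": f"{entry['ontology']}:{entry['term']}",
        "label": entry["label"],
    }


# The two entries quoted from the example implementation.
ontology_terms = [
    {"ontology": "CL", "term": "0000236", "label": "B-lymphocyte"},
    {"ontology": "custom.pedigree.id", "term": "1", "label": "pedigree ID"},
]

filtering_terms = [to_filtering_term(entry) for entry in ontology_terms]
```

Whether a custom prefix such as custom.pedigree.id should be expressed this way at all is a separate question that this sketch does not answer.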

mbaudis commented 3 years ago

Counts are very informative & are e.g. used by us in the front end to calculate statistics over the returned items (Beacon response -> front end fetches biosamples from handover link -> calculates frequencies as observed count / filter base count). We also show counts per filter in the search form etc. So IMO good as an optional parameter; but YMMV ...
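
A rough Python sketch of the front-end calculation described above (observed count / filter base count), under stated assumptions: the returned biosamples are represented here as plain lists of filter ids, and the matched samples are made up for illustration. Only the two base counts (27 and 166) come from the filteringTerms example earlier in the thread; `filter_frequencies` is a hypothetical helper name.

```python
from collections import Counter


def filter_frequencies(biosample_terms, base_counts):
    """Fraction of each filter's base count observed among the returned biosamples."""
    # Count each filter at most once per biosample.
    observed = Counter(term for terms in biosample_terms for term in set(terms))
    return {
        term: n / base_counts[term]
        for term, n in observed.items()
        if base_counts.get(term)  # skip filters without a (non-zero) base count
    }


# Base counts as advertised by the filtering_terms response.
base_counts = {"NCIT:C102872": 27, "NCIT:C105555": 166}

# Hypothetical biosamples fetched via the handover link, reduced to their filter ids.
returned_biosamples = [
    ["NCIT:C102872"],
    ["NCIT:C102872", "NCIT:C105555"],
    ["NCIT:C105555"],
]

frequencies = filter_frequencies(returned_biosamples, base_counts)
```

Without the base count in the filtering_terms response, the front end would need an extra query per filter to get the denominator.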

sdelatorrep commented 3 years ago

What does the count count? The number of samples with this term? The number of individuals? If the term applies to both entities, what do we count?

mbaudis commented 3 years ago

@sdelatorrep Correct question, w/ several answers:

  1. In our case we map everything with a count to biosamples; additional filters are shown w/o counts
  2. We actually provide a separate count value for code_matches; i.e. count refers to the code + children, code_matches to the specific assignments. In the proposed filter use à la @tb143 this would be THIS:code vs. THIS:code+ on the search level
  3. Generally, this could be solved by scoping the filtering_terms endpoint, too; a general one, which has just lists of all filters, and then separate ones for biosamples etc. Or declare that the biosample (or individual) is the default scope.
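
To make the count vs. code_matches distinction from point 2 concrete, here is a small Python sketch. It assumes a term hierarchy given as a parent-to-children mapping and per-code numbers of direct biosample assignments. The codes EX:parent, EX:child_a and EX:child_b, the numbers, and the helper name `term_counts` are all invented for this example.

```python
def term_counts(code, children, direct_matches):
    """Return code_matches (exact code only) and count (code plus all descendants)."""
    seen = set()
    stack = [code]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against terms reachable via several parents
        seen.add(current)
        stack.extend(children.get(current, []))
    return {
        "id": code,
        "code_matches": direct_matches.get(code, 0),
        "count": sum(direct_matches.get(c, 0) for c in seen),
    }


# Hypothetical hierarchy and direct assignment counts.
children = {"EX:parent": ["EX:child_a", "EX:child_b"]}
direct_matches = {"EX:parent": 3, "EX:child_a": 5, "EX:child_b": 2}

parent_entry = term_counts("EX:parent", children, direct_matches)
leaf_entry = term_counts("EX:child_a", children, direct_matches)
```

For a term without children the two values coincide, so code_matches only carries extra information for non-leaf terms.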

Also: Declare filtering_terms per dataset?

mbaudis commented 3 years ago

I will close this & open a new issue regarding the response format.