ga4gh-beacon / beacon-v2

Unified repository for the GA4GH Beacon v2 API standard
Creative Commons Zero v1.0 Universal

requests/filteringTerms.yaml too forgiving for request validation #56

Open gsfk opened 1 year ago

gsfk commented 1 year ago

I've been using BeaconRequestBody.yaml (and its JSON cousin) for request validation, but the spec for filtering terms is too forgiving, and lets even the most obviously mangled filters through. The issue is the combination of anyOf, the overlaps between the different filter types, and a forgiving attitude toward additional properties.

For example, this filter passes, even though similarity is required to be a string that's one of "exact", "high", "medium", or "low":

  {
    "id": "some_id",
    "similarity": {"I snuck": "an object in here"}
  }

Some other examples are here: https://www.jsonschemavalidator.net/s/pFQY4Xno

This happens because:

- anyOf needs only one of the filter schemas to match;
- the filter types overlap, since an object with just an `id` satisfies any of them;
- additional properties are not forbidden, so the bogus `similarity` value is simply ignored by whichever filter type the object falls back to.

The obvious sledgehammer solution is to forbid extra properties in filters; I don't know enough of the history here to know whether that's too harsh a solution. Is CustomFilter meant to be more open?
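For illustration, here is a stripped-down sketch of the pattern (hypothetical, simplified YAML, not the literal contents of requests/filteringTerms.yaml). The mangled filter above fails OntologyFilter because of the `similarity` enum, but still matches the wide-open CustomFilter branch of the anyOf:

```yaml
# Simplified sketch of the current validation pattern (hypothetical names).
FilteringTerm:
  anyOf:
    - $ref: '#/definitions/OntologyFilter'
    - $ref: '#/definitions/AlphanumericFilter'
    - $ref: '#/definitions/CustomFilter'

definitions:
  OntologyFilter:
    type: object
    required: [id]
    properties:
      id:
        type: string
      similarity:
        type: string
        enum: [exact, high, medium, low]
    # The bad "similarity" object fails here...

  CustomFilter:
    type: object
    required: [id]
    properties:
      id:
        type: string
    # ...but additionalProperties is unset (defaults to allowed),
    # so the same object validates here, and the anyOf passes.
```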

Here is a solution that forbids all extra properties in filters.
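In sketch form, the change amounts to setting `additionalProperties: false` on each filter definition (property names here follow my reading of the spec; treat this as illustrative, not the exact patch):

```yaml
# Sketch: closed OntologyFilter, so unknown or mistyped keys are rejected.
OntologyFilter:
  type: object
  required: [id]
  additionalProperties: false
  properties:
    id:
      type: string
    similarity:
      type: string
      enum: [exact, high, medium, low]
    includeDescendantTerms:
      type: boolean
    scope:
      type: string
```

With the same treatment applied to AlphanumericFilter and CustomFilter, the mangled example above fails every branch of the anyOf instead of slipping through.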

Here, if needed, is a more conservative solution that allows extra properties in CustomFilter but requires them to be objects. That should cut down on the confusion between a CustomFilter extra property and a malformed field from another filter type, since those fields are all strings or booleans.
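A sketch of that more conservative option (again hypothetical, simplified YAML):

```yaml
# Sketch: CustomFilter stays extensible, but extra properties must be
# objects, so a mistyped string/boolean field from OntologyFilter or
# AlphanumericFilter can no longer validate here by accident.
CustomFilter:
  type: object
  required: [id]
  properties:
    id:
      type: string
  additionalProperties:
    type: object
```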

mbaudis commented 1 year ago

The general problem with filters is that they are over-designed & under-engineered; but I'm not sure whether this needs to be solved by engineering vs. by documentation? IMO we can make some improvements (e.g. a format check for ontology IDs, your change to the additionalProperties use ...) but with "custom filters" opening the door wide it will always be ... complicated.

One possibility would be to restrict public filter use to some well-defined designs and verify against those... But overall I think there are many cases in Beacon where the solution won't come from "verify against JSON Schema".

jrambla commented 1 year ago

The approach of adding "additionalProperties: false" to the definitions seems fine to me, so I'm in favor of adding it to them.

jrambla commented 1 year ago

However, I'm in doubt about the CustomFilter stuff. My guess is that it's there for historical reasons: we started with just ontology terms, nothing else. That was too restrictive, since you were limited to terms defined in ontologies, which wouldn't apply to custom lists of items... therefore we added the custom filter. Alphanumeric filters came somewhat later... and they make sense, but I also believe they make CustomFilters no longer necessary, as we can express the example:

"demographic.ethnicity:asian"

as

  {
    "id": "demographic.ethnicity",
    "operator": "=",
    "value": "asian"
  }

Are we aligned?

mbaudis commented 1 year ago

I'm for the removal of custom filters from the public API, or keeping them only as a placeholder (i.e. defined just as an object, since they shouldn't be thought of as something anybody needs to understand), with a note about "internal use".

mbaudis commented 1 year ago

A solution for stricter typing is the use of filterType, e.g. filterType: alphanumeric as a required parameter. VRS does something similar with if/then conditions (though it gets ... complex). @gsfk do you want to look into that (or @tb143)?
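A rough sketch of what a `filterType` discriminator could look like (names and values hypothetical; the if/then construct requires JSON Schema draft-07 or later):

```yaml
# Sketch: a required discriminator routes each filter to exactly one
# sub-schema, instead of letting anyOf fall back to the loosest match.
FilteringTerm:
  type: object
  required: [filterType, id]
  properties:
    filterType:
      type: string
      enum: [ontology, alphanumeric, custom]
  allOf:
    - if:
        properties:
          filterType:
            const: ontology
      then:
        $ref: '#/definitions/OntologyFilter'
    - if:
        properties:
          filterType:
            const: alphanumeric
      then:
        $ref: '#/definitions/AlphanumericFilter'
    - if:
        properties:
          filterType:
            const: custom
      then:
        $ref: '#/definitions/CustomFilter'
```

With a closed (additionalProperties: false) sub-schema per branch, a malformed filter fails its own branch outright rather than validating as some other filter type.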

gsfk commented 1 year ago

I had a quick peek at the VRS schema and didn't see any conditionals, although perhaps I'm looking in the wrong place.

Eliminating custom queries (or forcing them to be objects) both sound fine, although that only solves part of the problem: malformed OntologyFilters are no longer accepted, but bad AlphanumericFilters still get through.

Without significantly complicating the schema, the easiest solutions I see are: