Open gsfk opened 1 year ago
The general problem with filters is that they are over-designed & under-engineered; but I'm not sure if it is necessary to solve by engineering vs. by documentation? IMO we can do some improvements (e.g. format check for ontology id's, your change of the additionalProperties use ...) but with "custom filters" opening the door wide it will always be ... complicated.
One possibility would be to restrict public filter use to some well-defined designs and verify against those... But overall I think there are many cases in Beacon where the solution won't come from "verify against JSON Schema".
The approach of adding "additionalProperties: false" to the definitions seems fine to me. So I'll be in favor of adding that to them.
However, I'm in doubt with the CustomFilter stuff. My guess is that it is there for historical reasons: we started just with ontology terms, nothing else. It was too restrictive as you are limited to terms defined in ontologies, which would be not applicable to custom lists of items... therefore we add the custom filter. Alphanumeric filters were added somewhat later... and they make sense, but I also believe that makes CustomFilters no longer necessary as we can express the example:
"demographic.ethnicity:asian"
as
{
"id": "demographic.ethnicity",
"operator": "=",
"value": "asian"
}
Are we aligned?
I'm for the removal of custom filters from the public API, or just do them as placeholder object (i.e. only definition as object
since they should not been though of something anybody needs to understand, and make a note about "internal use".
A solution for stricter typing is the use of filterType
, e.g. filterType: alphanumeric
as a required parameter. VRS is doing something similar then with if
conditions (though it gets ... complex). @gsfk you want to look into that (or @tb143)?
I had a quick peek at the VRS schema and didn't see any conditionals, although perhaps I'm looking in the wrong place.
Eliminating custom queries (or forcing them to be an object) both sound fine, although it only solves part of the problem, malformed OntologyFilters are no longer accepted, but bad AlphanumericFilters still get through.
without significantly complicating the schema, the easiest solutions I see are:
filterType
field, or simply vary the name of the id
property across definitions
I've been using
BeaconRequestBody.yaml
(and its json cousin) for request validation, but the spec for filtering terms is too forgiving, and lets even the most obviously mangled filters through. The issue is with the use ofanyOf
, the overlaps between the different filter types, and a forgiving attitude toward additional properties.For example, this filter passes, even though similarity is required to be a string that's one of "exact", "high", "medium", or "low":
Some other examples are here: https://www.jsonschemavalidator.net/s/pFQY4Xno
This happens because:
OntologyFilter
,AlphanumericFilter
, orCustomFilter
,AlphanumericFilter
can often pass asOntologyFilter
CustomFilter
, any malformed values other than "id" or "scope" are considered extra properties with no restrictions on them.The obvious sledgehammer solution is to forbid extra properties in filters, I don't know enough of the history here to know if that's too harsh of a solution. Is CustomFilter meant to be more open?
Here is a solution that forbids all extra properties in filters.
Here, if needed, is a more conservative solution that allows extra properties in CustomFilter but requires them to be objects, which should cut down on the amount of confusion between a CustomFilter extra property and a malformed field in another filter type, all of which are string or boolean.