Open danielballan opened 2 years ago
Interesting arguments that JMESPath is the more robust choice: https://github.com/OAI/OpenAPI-Specification/discussions/2556#discussioncomment-678802
I think JMESPath is the clear winner. Closing, but open to arguments if anyone wants to revive this in the future.
Postgres implements JSONPath, and it looks like SQLite followed suit. If we want predicate push-down, that is the language we will need.
However, the points raised in the linked post above stand. JMESPath is much better specified. The question is whether to expose JMESPath in the HTTP API and convert or to expose JSONPath.
As of September it looks like JSONPath is on track to be formally specified. https://datatracker.ietf.org/wg/jsonpath/about/
The RFC is dormant, but there is nonetheless an actively maintained Python implementation of the proposal.
https://pypi.org/project/jsonpath-ng/
For use cases where we want to fish just a couple fields (e.g. some columns in a table of search results) out of some sprawling JSON metadata, we can get large performance wins from predicate push-down.
SQLite supports the basics: https://www.sqlite.org/json1.html#jex
And Postgres supports quite a lot, perhaps more than we would need to expose, at least at first: https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-PATH
Latest thinking on this, per chat with @pbeaucage:
select_metadata
and our support JMESPath, as JMESPath does too much: too complicated, and maybe exposes too much power to clients to put load on the serverselect
to /search
and /metadata
, accepting JSONPath. (Adding it to /search
is the important one, since the savings on a whole _page of results is the most significant.) The parameter should be typed List[str]
, as in ?select=start.uid&select=start.scan_id
. The returned metadata
should always be a flat JSON object, keyed like {"start.uid": ..., "start.scan_id": ...}
.
I learned about JSONPath when I noticed that Kubernetes uses it. It looks like an alternative to JMESPath. I am not sure if it is as formally specified or what other tradeoffs there might be, but its use by Kubernetes provides some basis to believe it will stick around. It would be worth understanding how they compare.