Open kevinwilfong opened 1 year ago
CC: @amitkdutta @zacw7 @aditi-pandit
An issue was opened in simdjson recently for supporting JSONPath. That would get us most of the way there; I think Jayway just extended JSONPath with aggregates that can be tacked onto the end of the path.
We will prioritize this feature in simdjson.
Update: This proposal is more ambitious than I first thought.
In our production workload, we are seeing the following JSONPaths that are currently not supported.
I believe we can support (1) and (2) with simple changes to JsonPathTokenizer and support (3) using something like "Tree Walking and JSON Element Types" in https://github.com/simdjson/simdjson/blob/master/doc/basics.md#json-path
See https://github.com/prestodb/presto/issues/22589 for additional context.
@rui-mo @PHILO-HE Folks, do you know if Spark also uses JayWay to implement json-extract functions?
Hi @mbasmanova, Spark's implementation is based on Jackson. Here are some findings.
- "foo" - Jayway allows paths that do not start with a '$'. For these paths, it simply prepends the path with '$.' before compiling.
Spark requires the JSON path to start with "$".
- "$.[0].foo" - While JSONPath allows either dot-notation or bracket-notation, Jayway allows a mix.
Spark also allows a mix.
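To make the two behaviors above concrete, here is a minimal sketch of a lenient tokenizer. It is not Jayway's, Spark's, or Velox's code; the names `normalize`, `tokenize`, and the `lenient` flag are hypothetical, and the grammar covers only plain keys, quoted keys, and array indexes.

```python
import re

# One path step: "[0]", '["key"]', ".key", or the mixed ".[0]" / '.["key"]' forms.
_STEP = re.compile(
    r"""\.?\[(\d+)\]                  # [0] or .[0]
      | \.?\["([^"\\]*)"\]            # ["key"] or .["key"]
      | \.([A-Za-z_][A-Za-z0-9_]*)    # .key
    """,
    re.VERBOSE,
)


def normalize(path: str) -> str:
    """Prepend '$.' when the path does not start with '$' (the lenient behavior)."""
    return path if path.startswith("$") else "$." + path


def tokenize(path: str, lenient: bool = True) -> list:
    """Split a path into keys (str) and array indexes (int).

    lenient=True accepts paths without a leading '$';
    lenient=False rejects them, as a strict parser would.
    """
    if lenient:
        path = normalize(path)
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path!r}")
    pos, tokens = 1, []
    while pos < len(path):
        m = _STEP.match(path, pos)
        if m is None:
            raise ValueError(f"cannot parse path at offset {pos}: {path!r}")
        index, quoted, name = m.groups()
        if index is not None:
            tokens.append(int(index))
        elif quoted is not None:
            tokens.append(quoted)
        else:
            tokens.append(name)
        pos = m.end()
    return tokens
```

With this sketch, `tokenize("foo")` and `tokenize("$.foo")` yield the same tokens, `tokenize("$.[0].foo")` accepts the mixed notation, and `tokenize("foo", lenient=False)` raises `ValueError`.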
@PHILO-HE Thank you for clarifying.
Description
The functions json_extract, json_extract_scalar, and json_size in Presto use the Jayway library to parse JSON paths and handle extraction: https://github.com/json-path/JsonPath
Velox's versions of these functions use a JSON path tokenizer based on Presto's JsonPathTokenizer. JsonPathTokenizer supports a much simpler syntax but is likely faster; Presto uses it when possible and falls back to Jayway when JsonPathTokenizer cannot parse the path.
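The fast-path-with-fallback arrangement can be sketched roughly as follows. This is an illustration only, not Presto's or Velox's implementation: `simple_tokenize`, `walk`, `json_extract`, and the `full_engine` callback (standing in for a full JSONPath engine such as Jayway) are hypothetical names, and the simple grammar is assumed to be just `.key` and `[index]`.

```python
import json
import re

# Assumed "simple" subset: .key and [index] steps only.
_SIMPLE_STEP = re.compile(r"\.([A-Za-z_]\w*)|\[(\d+)\]")


def simple_tokenize(path):
    """Return a token list, or None if the path is outside the simple subset."""
    if not path.startswith("$"):
        return None
    pos, tokens = 1, []
    while pos < len(path):
        m = _SIMPLE_STEP.match(path, pos)
        if m is None:
            return None  # filters, wildcards, etc. are not handled here
        key, index = m.groups()
        tokens.append(key if key is not None else int(index))
        pos = m.end()
    return tokens


def walk(doc, tokens):
    """Follow the tokens through parsed JSON; return None on a missing step."""
    for t in tokens:
        if isinstance(t, int) and isinstance(doc, list) and t < len(doc):
            doc = doc[t]
        elif isinstance(t, str) and isinstance(doc, dict) and t in doc:
            doc = doc[t]
        else:
            return None
    return doc


def json_extract(json_text, path, full_engine):
    """Use the simple tokenizer when it can parse the path, else fall back."""
    doc = json.loads(json_text)
    tokens = simple_tokenize(path)
    if tokens is not None:
        return walk(doc, tokens)
    return full_engine(doc, path)  # slower, more expressive engine
```

A path such as `$.a[1]` stays on the fast path, while one containing a filter expression such as `$.a[?(@ > 10)]` is handed to `full_engine` unchanged.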
Jayway's parser is quite extensive, supporting various operators, aggregates, regexes, and filters.
It would be great if the Velox JSON path parser could support Jayway's syntax.
https://github.com/facebookincubator/velox/blob/e2ee0cad24d5407146d2da08b68c6701ee86e9da/velox/functions/prestosql/json/JsonPathTokenizer.h