Closed yashmayya closed 2 weeks ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 62.16%. Comparing base (
59551e4
) to head (56f0747
). Report is 413 commits behind head on master.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I don't fully follow the example in the description. If the value is already JSON string, user shouldn't call JSON_FORMAT
again
I'm not sure if this is the correct behavior. JSON_FORMAT should be able to convert any object into JSON. If the value itself is abc, then the proper JSON version should be "abc".
That makes sense, I hadn't considered that.
If user wants to generate another level of json over a JSON string, we should still allow that, and the new JSON is also valid
Are there any valid use cases for that?
I don't fully follow the example in the description. If the value is already JSON string, user shouldn't call JSON_FORMAT again
Yeah, I agree in principle but this leads to the confusing behavior that I tried to document in the PR description. Let me try again below.
data
(JSON) and alias
(STRING). alias
is a derived column using the following ingestion transform function: JSON_PATH_STRING(JSON_FORMAT(data), '$.alias')
.{"data":{"alias":"student1","age":24,"name":{"full.name":"Peter1","nick.name":"Pete1"}}}
.alias
column being set to student1
.JSON_FORMAT
function is called on the Map
value representing data
during ingestion, which results in the JSON string and that works fine with the JSON_PATH_STRING
function.data
(JSON). A sample value that has already been ingested is {"data":{"alias":"student1","age":24,"name":{"full.name":"Peter1","nick.name":"Pete1"}}}
.alias
with the same logic as the above scenario. Now, the user would expect that the same ingestion transform function should work here as well with segment reload.JSON_PATH_STRING(JSON_FORMAT(data), '$.alias')
(and also updates the schema to add the new alias
column), the segment reload fails with an NPE here.JSON_FORMAT
function in this case is a String (rather than a Map
as in the above scenario) since JSON columns are stored as strings internally. This results in an escaped string value that is treated as a literal JSON string value by the JSON_PATH_STRING
function thus resulting in an unexpected null
value being returned.Is this working as expected and documented somewhere? Or should we solve this issue in a different way to what this PR was attempting?
In the given example, the problem is actually from the recursive call of data = JSON_FORMAT(data)
. I don't think Pinot allows this (maybe I was wrong).
It should work as expected if we configure the ingestion transform to be:
data_json = JSON_FORMAT(data)
alias = JSON_PATH_STRING(data_json, '$,alias')
{"data":{"alias":"student1","age":24,"name":{"full.name":"Peter1","nick.name":"Pete1"}}}
. Calling theJSON_FORMAT
function on this will result in the string value{"data":{"alias":"student1","age":24,"name":{"full.name":"Peter1","nick.name":"Pete1"}}}
.JSON_FORMAT
function again on this string value will result in the escaped string -"{\"data\":{\"alias\":\"student1\",\"age\":24,\"name\":{\"full.name\":\"Peter1\",\"nick.name\":\"Pete1\"}}}"
. Passing it through the function again will result in a doubly escaped string -"\"{\\\"data\\\":{\\\"alias\\\":\\\"student1\\\",\\\"age\\\":24,\\\"name\\\":{\\\"full.name\\\":\\\"Peter1\\\",\\\"nick.name\\\":\\\"Pete1\\\"}}}\""
.data
JSON column with the row -{"data":{"alias":"student1","age":24,"name":{"full.name":"Peter1","nick.name":"Pete1"}}}
.alias
using the ingestion transform functionJSON_PATH_STRING(JSON_FORMAT(data), '$.alias')
(via segment reload). Since JSON columns are stored as strings internally, callingJSON_FORMAT
on its values will lead to escaped strings. This leads to an NPE here because theJSON_PATH_STRING
function returns anull
as it will treat the escaped string argument as a literal JSON string value. This minor patch fixes this issue by avoiding re-stringifying strings inJsonUtils::objectToString
.JSON_FORMAT
function in this case would be the rawHashMap
that hasn't been converted to a string yet.bugfix