Closed kenXengineering closed 6 years ago
Hmm. It looks like the nested JSON string has the quotes removed from the field name, causing a json parse exception. Does this work on KSQL 4.1. If you must use docker, could you try reproducing with cp-ksql-server:5.0.0-beta30?
Thanks, Apurva
@apurvam Just tried it with the Docker image cp-ksql-server:5.0.0-beta30 and it is working.
ksql> select TitleData, EXTRACTJSONFIELD(TitleData, '$.TitleId') from ingestion_stream limit 1;
{"TitleId":"db5d229d-19b5-49c6-a0b3-80fa19caa054","LookupKey":"CENT","Machine":"CHICAGO","Workspace":"TGD21","Active":true,"Collect":true,"Tupelo":0,"IpSuppression":""} | db5d229d-19b5-49c6-a0b3-80fa19caa054
I used the same create stream
query as before. Looks like the current version of ksql is dropping the quote character when parsing the json message into a varchar column.
cc @hjafarpour
This might be an issue with the new json converter.
I also wanted to note that using STRUCT
worked on the latest beta version. I downloaded the June preview release and redid my streams using STRUCT
this time, and had no issue. Was able to do some awesome things with KSQL, really liking it so far!
@chosenken Now that we have support for STRUCT
you should use struct instead of EXTRACTJSONFIELD
. Our new Serde now reads the nested JSON fields as struct and when it's converted into string it removes the double quotes from the field names which results in parsing error in EXTRACTJSONFIELD
. You can now only use EXTRACTJSONFIELD
if the fields is indeed string and the content of the string is a valid JSON.
Do we have docs which explain when to use EXTRACTJSONFIELD
vs when to use STRUCT
?
Might be worth linking to here and then closing this out.
I am parking this here for Future-Me. I am using the KSQL-CLI in a similar situation. You can see the details in my StackOverflow question. The symptom of this in the CLI is the return of a null column value:
ksql> SELECT EXTRACTJSONFIELD(obj, '$.owner.org') AS org FROM objs1;
null
@hjafarpour I think this degrades the usefulness of KSQL for a use-case where the JSON format of each message is not consistent. I know this limitation exists in other parts of the Confluent suite (Kafka Connect, for instance, hates this inconsistency).
In my case the deep structures of the JSON messages will be different from message to message. I am looking to use the field lookup function to cherry-pick messages that do contain a specific field. I do not see a way to accomplish this with STRUCT in a straight-forward way because the CREATE STREAM would need to layout every possible field name.
STRUCT also becomes a bit onerous for deeply nested messages because of the need for nesting STRUCTs in the CREATE STREAM syntax. Am I missing an approach?
@tpanagos I agree that the use cases you mentioned are valid and we should be able to handle them in KSQL. SO here is a fix that hopefully will make it easier to use KSQL: #1962
I am having an issue getting fields out of a JSON Object that is stored as a VARCHAR column in my stream. When I use the EXTRACTJSONFIELD UDF on the column, I get the following Java Expections:
I created the stream as follows:
create stream ingestion_stream (v varchar, m varchar, sync varchar, DateStamp varchar, AccessKey varchar, TitleData varchar, ErrorMessage varchar, ErrorStack varchar, IsBot varchar, DateCreated varchar) with (KAFKA_TOPIC='api_ingestion', VALUE_FORMAT='json');
The
TitleData
field is a JSON Object. Example Data:Output from a query using EXTRACTJSONFIELD:
And for good measure, a describe on the stream:
I am running KSQL via Docker using the
confluentinc/cp-ksql-server:5.0.0-beta180702222458
image. Running against a Kafak 1.1.0 cluster.Please let me know if you need anything else, thanks!