confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io
Other
5.88k stars 1.04k forks source link

KSQL stream is not working JSON data with nested fields #1203

Closed srinivasreddyv2 closed 6 years ago

srinivasreddyv2 commented 6 years ago

I have a JSON record like below.

{ "venue": { "venue_name": "HATCH", "lon": -71.18291, "lat": 42.36667, "venue_id": 22491322 }, "visibility": "public", "response": "yes", "guests": 0, "member": { "member_id": 237655942, "member_name": "Nts" }, "rsvp_id": 1724941595, "mtime": 1524620970613, "event": { "event_name": "Intro to Soldering", "event_id": "250106100", "time": 1526853600000, "event_url": "https://www.meetup.com/Makers-of-HATCH-Makerspace/events/250106100/" }, "group": { "group_topics": [ { "urlkey": "quilting", "topic_name": "Quilting" }, { "urlkey": "robotics", "topic_name": "Robotics" }, { "urlkey": "sewing", "topic_name": "Sewing" }, { "urlkey": "edtech", "topic_name": "Education & Technology" }, { "urlkey": "craftswap", "topic_name": "Crafts" }, { "urlkey": "diy", "topic_name": "DIY (Do It Yourself)" }, { "urlkey": "hacking", "topic_name": "Hacking" }, { "urlkey": "3d-modeling", "topic_name": "3D Modeling" }, { "urlkey": "tools", "topic_name": "Tools" }, { "urlkey": "arduino", "topic_name": "Arduino" }, { "urlkey": "makers", "topic_name": "Makers" }, { "urlkey": "makerspaces", "topic_name": "Makerspaces" }, { "urlkey": "3d-printing", "topic_name": "3D Printing" }, { "urlkey": "laser-cutting", "topic_name": "Laser Cutting" }, { "urlkey": "scrapbook-die-cutting-machines", "topic_name": "Scrapbook die cutting machines." } ], "group_city": "Watertown", "group_country": "us", "group_id": 18457932, "group_name": "Makers of HATCH Makerspace", "group_lon": -71.18, "group_urlname": "Makers-of-HATCH-Makerspace", "group_state": "MA", "group_lat": 42.37 } }

This data has been loaded into a Kafka topic.

I have created a stream in KSQL as below.

CREATE STREAM meetup_rsvp_raw ( Venue varchar, Visibility varchar, Response varchar, Guests integer, Member varchar, rsvp_id bigint, mtime bigint, event varchar, group_info varchar ) WITH (KAFKA_TOPIC='meetup-rsvp', VALUE_FORMAT='JSON');

I see null in group info field.

ksql> select * from meetup_rsvp_raw limit 5; 1524624181126 | null | {"venue_name":"Houghton's Pond - Blue Hills","lon":-71.09453,"lat":42.208187,"venue_id":1506300} | public | yes | 0 | {"member_id":159617162,"photo":"https://secure.meetupstatic.com/photos/member/7/2/b/c/thumb_215729372.jpeg","member_name":"Tena Kerns"} | 1724949934 | 1524623875376 | {"event_name":"Blue Hills Buck Hill - Easy Pace / Moderate hike","event_id":"250084062","time":1525010400000,"event_url":"https://www.meetup.com/HikeBikeSocialClub/events/250084062/"} | null 1524624181126 | null | {"venue_name":"Community Wholeness Centre CWC","lon":-79.69191,"lat":44.38976,"venue_id":19966962} | public | no | 0 | {"member_id":222279178,"photo":"https://secure.meetupstatic.com/photos/member/d/3/f/c/thumb_273714268.jpeg","member_name":"Natalie Roy"} | 1724949935 | 1524623875430 | {"event_name":"Karate Class - Ken Shin Budo Kai","event_id":"kbsjtmyxgbnc","time":1525129200000,"event_url":"https://www.meetup.com/CWCBarrie/events/250120204/"} | null 1524624181126 | null | null | public | no | 0 | {"member_id":182695992,"photo":"https://secure.meetupstatic.com/photos/member/5/9/d/7/thumb_258742999.jpeg","member_name":"Tank"} | 1724775217 | 1524623875653 | {"event_name":"Poconos Weekend - Train Ride, Biking, ATVs, Skeet Shooting, & BBQ @lovely house!","event_id":"249488631","time":1524870000000,"event_url":"https://www.meetup.com/The-New-York-Cantonese-Club-NYHKCC/events/249488631/"} | null 1524624181126 | null | {"venue_name":"WeWork Penn Station","lon":-73.993378,"lat":40.754013,"venue_id":25715869} | public | yes | 0 | {"member_id":195645524,"photo":"https://secure.meetupstatic.com/photos/member/4/b/1/b/thumb_270679227.jpeg","member_name":"Pooja Saini"} | 1724950473 | 1524624172000 | {"event_name":"Mind-controlling the Web with Brain-Computer Interfaces","event_id":"249000586","time":1526508000000,"event_url":"https://www.meetup.com/cloudnyc/events/249000586/"} | null 1524624181126 | null | {"venue_name":"European Health Care Centre/ Dr. Karen Hudes's Office","lon":-79.419518,"lat":43.759827,"venue_id":25719451} | public | yes | 0 | {"member_id":235028400,"photo":"https://secure.meetupstatic.com/photos/member/6/3/f/f/thumb_270685599.jpeg","member_name":"Brenda"} | 1724950487 | 1524624179006 | {"event_name":"Books and Bevs","event_id":"rnmgnpyxmbhc","time":1537916400000,"event_url":"https://www.meetup.com/meetup-group-wwitSSIX/events/250121610/"} | null

any solution?

rodesai commented 6 years ago

I don't see a group_info field in the json you pasted.

srinivasreddyv2 commented 6 years ago

The field name is "group" in the json but Kafka does not allow me to name any field with name "group" as it is a keyword. So I named it as group_info.

srinivasreddyv2 commented 6 years ago

Finally I was able to create the field "group" with same name in KSQL by enclosing the name in quotes and uppercase.

richabanker commented 5 years ago

How about if I have a nested field that's a keyword, for ex: a json object like: {"host":{"string":"myhost1"}}. Here "string" is a keyword and enclosing it in quotes and uppercase doesn't seem to work.

mikwin1 commented 5 years ago

I ran into a similar situation and attempted a work around by creating another stream from select only that one field (in your case the "group" field"). Basically, creating a new stream to essentially "drill down" the nested fields. Not really happy with this work around...research a better way. -Mike