Avro struct support doesn't work with twitter schema #1536

rmoff commented 6 years ago

KSQL 5.0.0-SNAPSHOT (build 50)

With a view to updating this blog to use the new nested Avro support, I tried this out but it fails:

ksql> create stream twitter with (kafka_topic='twitter_avro_01',value_format='avro');
 Unable to verify the AVRO schema is compatible with KSQL. Map key must be of type STRING

Kafka Connect is the source of the data, using @jcustenborder's Twitter source connector.

Kafka Connect config:

{
  "name": "twitter_source_avro_01",
  "config": {
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://localhost:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "kafka.delete.topic": "twitter_deletes_avro_01",
    "twitter.oauth.consumerKey": "XXXX",
    "twitter.oauth.consumerSecret": "XXXX",
    "twitter.oauth.accessToken": "XXXX",
    "twitter.oauth.accessTokenSecret": "XXXX",
    "kafka.status.topic": "twitter_avro_01",
    "connector.class": "com.github.jcustenborder.kafka.connect.twitter.TwitterSourceConnector",
    "process.deletes": true,
    "filter.keywords": "never,gonna,give,you,up"
  }
}

Sample message:

ksql> print 'twitter_avro_01';
05/07/18 11:12:51 BST, ��������, {"CreatedAt": 1530785571000, "Id": 1014814356629086208, "Text": "@cyberomin Did Amazon start AWS because they had excess infra, or did they build the extra infra to accommodate their AWS plan?\n\nAWS wasn't a happenstance, it was well planned.", "Source": "<a href=\"\" rel=\"nofollow\">Twitter for Android</a>", "Truncated": true, "InReplyToStatusId": 1014745992007122944, "InReplyToUserId": 58825393, "InReplyToScreenName": "cyberomin", "GeoLocation": null, "Place": null, "Favorited": false, "Retweeted": false, "FavoriteCount": 0, "User": {"Id": 886892915196387328, "Name": "JJ Sankara", "ScreenName": "uberJJ", "Location": "Nigeria", "Description": "Raconteur.", "ContributorsEnabled": false, "ProfileImageURL": "", "BiggerProfileImageURL": "", "MiniProfileImageURL": "", "OriginalProfileImageURL": "", "ProfileImageURLHttps": "", "BiggerProfileImageURLHttps": "", "MiniProfileImageURLHttps": "", "OriginalProfileImageURLHttps": "", "DefaultProfileImage": false, "URL": "", "Protected": false, "FollowersCount": 542, "ProfileBackgroundColor": "000000", "ProfileTextColor": "000000", "ProfileLinkColor": "000000", "ProfileSidebarFillColor": "000000", "ProfileSidebarBorderColor": "000000", "ProfileUseBackgroundImage": false, "DefaultProfile": false, "ShowAllInlineMedia": false, "FriendsCount": 272, "CreatedAt": 1500286723000, "FavouritesCount": 578, "UtcOffset": -1, "TimeZone": null, "ProfileBackgroundImageURL": "", "ProfileBackgroundImageUrlHttps": "", "ProfileBannerURL": "", "ProfileBannerRetinaURL": "", "ProfileBannerIPadURL": "", "ProfileBannerIPadRetinaURL": "", "ProfileBannerMobileURL": "", "ProfileBannerMobileRetinaURL": "", "ProfileBackgroundTiled": false, "Lang": "en", "StatusesCount": 3912, "GeoEnabled": false, "Verified": false, "Translator": false, "ListedCount": 2, "FollowRequestSent": false, "WithheldInCountries": []}, "Retweet": false, "Contributors": [], "RetweetCount": 0, "RetweetedByMe": false, "CurrentUserRetweetId": -1, 
"PossiblySensitive": false, "Lang": "en", "WithheldInCountries": [], "HashtagEntities": [], "UserMentionEntities": [{"Name": "Celestine Omin", "Id": 58825393, "Text": "cyberomin", "ScreenName": "cyberomin", "Start": 0, "End": 10}], "MediaEntities": [], "SymbolEntities": [], "URLEntities": []}

Value Schema:

rmoff commented 6 years ago

I've tried this too with a STRING key ("key.converter": ""), and I get the same error from KSQL.

{
  "name": "twitter_source_avro_02",
  "config": {
    "key.converter": "",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "kafka.delete.topic": "twitter_deletes_avro_02",
    "twitter.oauth.consumerKey": "XXXX",
    "twitter.oauth.consumerSecret": "XXXX",
    "twitter.oauth.accessToken": "XXXX",
    "twitter.oauth.accessTokenSecret": "XXXX",
    "kafka.status.topic": "twitter_avro_02",
    "connector.class": "com.github.jcustenborder.kafka.connect.twitter.TwitterSourceConnector",
    "process.deletes": true,
    "filter.keywords": "rickastley,rmoff,ksql,confluent,jaykreps,gwenshap,apachekafka,nehanarkhede,kafka streams,kafka connect,kafkasummit,kafka,bacon,aws,ilkley"
  }
}

Sample message (note STRING key):

ksql> print 'twitter_avro_02' from beginning;
06/07/18 10:19:46 BST, Struct{Id=1015163384315240448}, {"CreatedAt": 1530868786000, "Id": 1015163384315240448, "Text": "Will be doing lamb burgers with feta and tzaziki\nCrumbed chicken burgers with Asian pickles \nAnd will revive the beef burger with the bacon, blue cheese, bourbon poached pears", "Source": "<a href=\"\" rel=\"nofollow\">Twitter for Android</a>", "Truncated": true, "InReplyToStatusId": 1015163372000759808, "InReplyToUserId": 2586931947, "InReplyToScreenName": "cheftakura", "GeoLocation": null, "Place": null, "Favorited": false, "Retweeted": false, "FavoriteCount": 0, "User": {"Id": 2586931947, "Name": "Hotelier ��", "ScreenName": "cheftakura", "Location": "Bvumba and Mutare ", "Description": "Wine, food, cricket, economics, music. Probably in that order", "ContributorsEnabled": false, "ProfileImageURL": "", "BiggerProfileImageURL": "", "MiniProfileImageURL": "", "OriginalProfileImageURL": "", "ProfileImageURLHttps": "", "BiggerProfileImageURLHttps": "", "MiniProfileImageURLHttps": "", "OriginalProfileImageURLHttps": "", "DefaultProfileImage": false, "URL": null, "Protected": false, "FollowersCount": 1990, "ProfileBackgroundColor": "8B542B", "ProfileTextColor": "333333", "ProfileLinkColor": "9D582E", "ProfileSidebarFillColor": "EADEAA", "ProfileSidebarBorderColor": "D9B17E", "ProfileUseBackgroundImage": true, "DefaultProfile": false, "ShowAllInlineMedia": false, "FriendsCount": 1503, "CreatedAt": 1402058406000, "FavouritesCount": 57899, "UtcOffset": -1, "TimeZone": null, "ProfileBackgroundImageURL": "", "ProfileBackgroundImageUrlHttps": "", "ProfileBannerURL": "", "ProfileBannerRetinaURL": "", "ProfileBannerIPadURL": "", "ProfileBannerIPadRetinaURL": "", "ProfileBannerMobileURL": "", "ProfileBannerMobileRetinaURL": "", "ProfileBackgroundTiled": false, "Lang": "en", "StatusesCount": 13593, "GeoEnabled": false, "Verified": false, "Translator": false, "ListedCount": 18, "FollowRequestSent": false, "WithheldInCountries": []}, "Retweet": false, 
"Contributors": [], "RetweetCount": 0, "RetweetedByMe": false, "CurrentUserRetweetId": -1, "PossiblySensitive": false, "Lang": "en", "WithheldInCountries": [], "HashtagEntities": [], "UserMentionEntities": [], "MediaEntities": [], "SymbolEntities": [], "URLEntities": []}

STREAM still fails to create:

ksql> create stream twitter with (kafka_topic='twitter_avro_02',value_format='avro');
 Unable to verify the AVRO schema is compatible with KSQL. Map key must be of type STRING

rmoff commented 6 years ago

Internal note from @rodesai:

This is complaining because the schema has a map with non-string keys. Currently KSQL only handles strings as map keys. Once the schema inference test PR gets merged, this should start working.
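
To make the "map with non-string keys" condition concrete, here is a minimal Python sketch that walks a schema and reports every map whose key type is not a string. It is not KSQL's actual check: the dict layout (`type`, `key`, `value`, `items`, `fields`) is a simplified, hypothetical stand-in for a Kafka Connect schema, and the field names in `EXAMPLE_SCHEMA` are invented for illustration, not taken from the Twitter connector's schema.

```python
from typing import Iterator

# Hypothetical, simplified schema layout (an assumption for this sketch, not
# the Schema Registry or Kafka Connect wire format). Field names are invented.
EXAMPLE_SCHEMA = {
    "type": "struct",
    "fields": {
        "Id": {"type": "int64"},
        "Labels": {"type": "map", "key": {"type": "string"}, "value": {"type": "string"}},
        "Nested": {
            "type": "struct",
            "fields": {
                "ById": {"type": "map", "key": {"type": "int64"}, "value": {"type": "string"}},
            },
        },
        "Items": {
            "type": "array",
            "items": {"type": "map", "key": {"type": "int32"}, "value": {"type": "boolean"}},
        },
    },
}


def non_string_map_keys(schema: dict, path: str = "$") -> Iterator[str]:
    """Yield the path of every map in the schema whose key type is not 'string'."""
    kind = schema.get("type")
    if kind == "map":
        if schema["key"].get("type") != "string":
            yield path
        # Map values can themselves contain maps, so keep walking.
        yield from non_string_map_keys(schema["value"], path + ".<value>")
    elif kind == "array":
        yield from non_string_map_keys(schema["items"], path + "[]")
    elif kind == "struct":
        for name, field in schema.get("fields", {}).items():
            yield from non_string_map_keys(field, f"{path}.{name}")


if __name__ == "__main__":
    for offending in non_string_map_keys(EXAMPLE_SCHEMA):
        print("map with non-string key at", offending)
```

Running a check like this against the real value schema would show which field triggers the "Map key must be of type STRING" message.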

rmoff commented 6 years ago

@rodesai when's the schema inference test PR due?

To clarify, I'm trying to get this blog—which currently uses JSON only and VARCHAR/EXTRACTJSONFIELD to navigate the schema—to use Avro natively instead.

rmoff commented 6 years ago

New error, with latest build (rc4):

ksql> create stream tweets with (kafka_topic='twitter_avro',value_format='avro');

Sample message:

{"CreatedAt": 1532985026000, "Id": 1024039540556742656, "Text": "RT @awscloud: Have you scheduled your AWS Certified Cloud Practitioner exam? Validate your skills in cloud fundamentals with an industry-re\u2026", "Source": "<a href=\"\" rel=\"nofollow\">GaggleAMP</a>", "Truncated": false, "InReplyToStatusId": -1, "InReplyToUserId": -1, "InReplyToScreenName": null, "GeoLocation": null, "Place": null, "Favorited": false, "Retweeted": false, "FavoriteCount": 0, "User": {"Id": 3069390827, "Name": "Stefan Letz", "ScreenName": "stefletz", "Location": "Seattle, WA", "Description": "I work for @AWSCloud and my opinions are my own.", "ContributorsEnabled": false, "ProfileImageURL": "", "BiggerProfileImageURL": "", "MiniProfileImageURL": "", "OriginalProfileImageURL": "", "ProfileImageURLHttps": "", "BiggerProfileImageURLHttps": "", "MiniProfileImageURLHttps": "", "OriginalProfileImageURLHttps": "", "DefaultProfileImage": false, "URL": null, "Protected": false, "FollowersCount": 32, "ProfileBackgroundColor": "C0DEED", "ProfileTextColor": "333333", "ProfileLinkColor": "1DA1F2", "ProfileSidebarFillColor": "DDEEF6", "ProfileSidebarBorderColor": "C0DEED", "ProfileUseBackgroundImage": true, "DefaultProfile": true, "ShowAllInlineMedia": false, "FriendsCount": 61, "CreatedAt": 1425453770000, "FavouritesCount": 624, "UtcOffset": -1, "TimeZone": null, "ProfileBackgroundImageURL": "", "ProfileBackgroundImageUrlHttps": "", "ProfileBannerURL": null, "ProfileBannerRetinaURL": null, "ProfileBannerIPadURL": null, "ProfileBannerIPadRetinaURL": null, "ProfileBannerMobileURL": null, "ProfileBannerMobileRetinaURL": null, "ProfileBackgroundTiled": false, "Lang": "en", "StatusesCount": 986, "GeoEnabled": false, "Verified": false, "Translator": false, "ListedCount": 4, "FollowRequestSent": false, "WithheldInCountries": []}, "Retweet": true, "Contributors": [], "RetweetCount": 0, "RetweetedByMe": false, "CurrentUserRetweetId": -1, "PossiblySensitive": false, "Lang": "en", "WithheldInCountries": 
[], "HashtagEntities": [], "UserMentionEntities": [{"Name": "Amazon Web Services", "Id": 66780587, "Text": "awscloud", "ScreenName": "awscloud", "Start": 3, "End": 12}], "MediaEntities": [], "SymbolEntities": [], "URLEntities": []}


rmoff commented 6 years ago

Is it the END column tripping things up here?

rodesai commented 6 years ago

Yeah it looks like the parser doesn't like it when a column has that name:

ksql> create stream foo (end bigint) with (kafka_topic='users', value_format='json');
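
A quick way to find which fields in a message might trip the parser is to scan the field names for suspect words. The Python sketch below does that for a trimmed copy of the sample message above. `SUSPECT_WORDS` is an assumption: it contains only `END`, the one name identified in this thread, and is not KSQL's real reserved-word list.

```python
import json
from typing import Any, Iterator, Set

# Assumption: only the name discussed in this thread; extend as needed.
SUSPECT_WORDS: Set[str] = {"END"}

# Trimmed from the sample message posted earlier in this issue.
SAMPLE = json.loads(
    '{"Id": 1024039540556742656, "UserMentionEntities": [{"Name": "Amazon Web Services", '
    '"Id": 66780587, "Text": "awscloud", "ScreenName": "awscloud", "Start": 3, "End": 12}], '
    '"URLEntities": []}'
)


def colliding_fields(value: Any, reserved: Set[str] = SUSPECT_WORDS, path: str = "$") -> Iterator[str]:
    """Yield the path of every field whose name (case-insensitive) is in `reserved`."""
    if isinstance(value, dict):
        for name, child in value.items():
            child_path = f"{path}.{name}"
            if name.upper() in reserved:
                yield child_path
            yield from colliding_fields(child, reserved, child_path)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from colliding_fields(child, reserved, f"{path}[{index}]")


if __name__ == "__main__":
    for field_path in colliding_fields(SAMPLE):
        print("suspect field name at", field_path)
```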

allenansari174 commented 5 years ago

I was trying to follow the blog step by step, but when I read the data from KSQL all fields are null (no name, no id, no text). Any idea what I need to do? @rmoff thanks