Fokko / divolte-kafka-druid-superset

A proof of concept using Divolte, Kafka, Druid and Superset

Parse Divolte Avro #10

Closed MrMoronIV closed 5 years ago

MrMoronIV commented 5 years ago

I'm partially following your blog post and am now trying to parse a Divolte Kafka stream, but Druid complains that it cannot use the Avro parser. I'm trying to set up a production environment for the stack you described, but without Docker images.

My setup, all on separate instances. Divolte Collectors -> Kafka Cluster -> Druid

I can read Kafka messages in Druid when I add a datasource (the consumer works), but I cannot seem to find a way to parse them. How and where would I define the Avro schema for the Divolte messages (it's the default one as of now)? I tried your spec file but I get:

```
Error: HTML Error: java.lang.IllegalArgumentException: Could not resolve type id 'avro' into a subtype of [simple type, class org.apache.druid.data.input.impl.ParseSpec]: known type ids = [ParseSpec, csv, javascript, json, jsonLowercase, regex, timeAndDims, tsv] at [Source: N/A; line: -1, column: -1] (through reference chain: org.apache.druid.data.input.impl.StringInputRowParser["parseSpec"])
```

I can't seem to find any documentation anywhere about this step aside from your blog post. Am I missing something? Divolte is on the default setup; does it need changes as well, maybe? How do I get Divolte to play nicely with Druid?
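For context, the spec step in question is the `parser` section of the Kafka supervisor spec, where the Avro extension registers the `avro_stream` parser and the `avro` parse format. A rough sketch of that section for Druid of this era is below; the inline schema is trimmed to a placeholder, and the dimension names (`remoteHost`, `location`) are illustrative picks from Divolte's default schema, not a complete list:

```json
{
  "type": "avro_stream",
  "avroBytesDecoder": {
    "type": "schema_inline",
    "schema": "... paste the Divolte Avro schema here ..."
  },
  "parseSpec": {
    "format": "avro",
    "timestampSpec": { "column": "timestamp", "format": "auto" },
    "dimensionsSpec": { "dimensions": ["remoteHost", "location"] }
  }
}
```

The error above means Jackson cannot resolve `avro` as a parse-spec type at all, which points to the extension that provides it not being loaded rather than to a problem with the spec itself.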

MrMoronIV commented 5 years ago

It turns out the Avro extension (`druid-avro-extensions`) has to be loaded in the Druid config.
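Concretely, that means adding `druid-avro-extensions` to the extension load list in `common.runtime.properties` on every Druid node and restarting the services. A minimal sketch, assuming the Kafka indexing service is also in use (your exact list will differ):

```properties
# common.runtime.properties on each Druid node
druid.extensions.loadList=["druid-avro-extensions", "druid-kafka-indexing-service"]
```

Once the extension is loaded, the `avro` parse-spec type id from the error message resolves and the supervisor spec is accepted.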