Open ideasculptor opened 3 years ago
Having investigated transformations, I find myself wondering why the `kafkaData` feature is even included in the connector, since it can be handled much more flexibly via transformations. The easier fix is probably to remove it and update the documentation to explain how to do the same thing via a transformation. The same goes for setting up message-timestamp-based partitioning, since the message timestamp must be included in the value via a transformation in order to use it. That one really needs some documentation.
> Even though `includeKafkaData` is false, it still attempted to create a table with the kafkadata field
This is not a supported property of the connector. I'll try to update the docs to remove it, as it appears nowhere in the code base (at least, not that I've been able to find).
> Basic sanity checking of inputs doesn't seem unreasonable
Agreed that preflight validation to ensure the property isn't given an empty value, or one that can't be used as a BigQuery field name (when sanitization is disabled), would be useful; however...
> especially since control center will not allow you to unset a value of a config if it was included in an uploaded config, even if the value was an empty string.
...this is an issue with the UI you're working with, not with the connector.
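That said, the empty-string case mentioned above could be normalized on the connector side. A minimal sketch of that idea (the class and method names here are purely illustrative, not the connector's actual code):

```java
// Hypothetical sketch: treat an empty or blank config value as unset (null),
// so a UI that forces an empty string through behaves the same as omitting
// the property entirely.
public class ConfigNormalizer {
    // Returns null when the value is null or blank; otherwise the trimmed value.
    public static String emptyToNull(String value) {
        if (value == null || value.trim().isEmpty()) {
            return null;
        }
        return value.trim();
    }

    public static void main(String[] args) {
        System.out.println(emptyToNull("") == null);                        // true
        System.out.println(emptyToNull("   ") == null);                     // true
        System.out.println("kafkaData".equals(emptyToNull(" kafkaData "))); // true
    }
}
```

Applied at config-parsing time, this would make an empty `kafkaDataFieldName` equivalent to not setting it at all, which is what the original report asks for.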
> I find myself wondering why the kafkaData thing is even included in the connector since it can be handled much more flexibly via transformations.
That's a fair point. We can probably do this for the 3.0 release and make life easier for users and maintainers of the connector alike. The one thing that gives me pause is that wall clock time is included in the Kafka data field, which may be a little trickier to reproduce since, AFAIK, Connect doesn't ship with an SMT that does that out of the box. But I don't know how useful that would really be, and it'd probably be worth the tradeoff to remove it along with the rest of the `kafkaDataFieldName` functionality.
> Same goes for setting up message timestamp based partitioning, since the message timestamp must be included in the value via a transformation in order to use it.
This is actually incorrect; message timestamp partitioning works fine without the timestamp being included in the record value, as long as these conditions are met:

- the table uses `DAY` time partitioning
- `bigQueryMessageTimePartitioning` is set to `true`
- `bigQueryPartitionDecorator` is set to `true`
I believe this is a valid feature of the connector, as it allows users to target partitions based on Kafka record timestamps without having to create column-partitioned tables (which may be difficult if multiple sources are writing to the table and ingestion-time partitioning is desirable for most of them).
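To make those conditions concrete, a connector config along these lines should write each record into the daily partition matching its Kafka timestamp. This is a sketch only: the connector name, topic, project, and dataset values are placeholders, and the exact set of required properties may differ by connector version.

```json
{
  "name": "bigquery-sink",
  "config": {
    "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
    "topics": "events",
    "project": "my-gcp-project",
    "defaultDataset": "my_dataset",
    "bigQueryMessageTimePartitioning": "true",
    "bigQueryPartitionDecorator": "true"
  }
}
```

With both flags enabled, the connector appends a `$YYYYMMDD` partition decorator derived from the record timestamp rather than relying on ingestion time, which is why no timestamp column is needed in the value.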
If anyone else is wondering about this:
The `includeKafkaData` config option was removed entirely some time ago: https://github.com/confluentinc/kafka-connect-bigquery/commit/5dbec3327deeb305d50185fc1896b91a9f694b2a

Setting `kafkaDataFieldName` appears to be the replacement.
The Confluent documentation at https://docs.confluent.io/kafka-connectors/bigquery/current/kafka_connect_bigquery_config.html is definitely not up-to-date.
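For anyone migrating off the removed option, a minimal sketch of the replacement property (the field name and topic here are just examples):

```json
{
  "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
  "topics": "events",
  "kafkaDataFieldName": "kafkaData"
}
```

When `kafkaDataFieldName` is set, the connector adds a struct under that name containing the record's Kafka metadata; leaving the property unset disables the behavior entirely.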
> I find myself wondering why the kafkaData thing is even included in the connector since it can be handled much more flexibly via transformations.
Also, it would seem that the standard transformation to look for is `InsertField` (https://docs.confluent.io/platform/current/connect/transforms/insertfield.html), which could be used instead of `includeKafkaData`.
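For illustration, an `InsertField$Value` SMT configured like this would add the record's topic, partition, offset, and timestamp to the value (the transform alias and target field names are arbitrary choices, not fixed names):

```json
{
  "transforms": "insertKafkaMetadata",
  "transforms.insertKafkaMetadata.type": "org.apache.kafka.connect.transforms.InsertField$Value",
  "transforms.insertKafkaMetadata.topic.field": "kafka_topic",
  "transforms.insertKafkaMetadata.partition.field": "kafka_partition",
  "transforms.insertKafkaMetadata.offset.field": "kafka_offset",
  "transforms.insertKafkaMetadata.timestamp.field": "kafka_timestamp"
}
```

Note that `InsertField` covers the record's own metadata but, as mentioned earlier in this thread, has no option for wall clock insert time, which is the one piece of the old Kafka data field it can't reproduce.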
I sent the following config:
Even though `includeKafkaData` is false, it still attempted to create a table with the kafkadata field, setting the name to an empty string. Basic sanity checking of inputs doesn't seem unreasonable, especially since control center will not allow you to unset a config value if it was included in an uploaded config, even if the value was an empty string. It passes the empty string along no matter what you do in the UI. So maybe the connector should recognize that an empty string is the same thing as null?