confluentinc / kafka-connect-hdfs

Kafka Connect HDFS connector

Avro Schema w/Hive integration #145

Open rnpridgeon opened 7 years ago

rnpridgeon commented 7 years ago

Due to a known limitation in Hive, schema literals can only be saved to the SerDe properties field if they are shorter than 4000 characters. The biggest issue with this limitation is that table creation does not fail. Instead, the schema is silently truncated and the rest of the operation succeeds as normal.
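For context, a minimal sketch of the kind of DDL the Hive integration produces (table name, location, and schema JSON here are made up, and the full literal is elided):

```sql
-- Hypothetical example of a connector-created Hive table: the full Avro schema
-- is embedded as a string in the SerDe properties. The metastore column backing
-- these properties is typically VARCHAR(4000), so a longer literal is silently
-- cut off at that length.
CREATE EXTERNAL TABLE example_topic
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
  'avro.schema.literal' = '{"type":"record","name":"ExampleRecord","fields":[ ... ]}'  -- JSON elided
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/topics/example_topic';
```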

https://issues.apache.org/jira/browse/HIVE-9815
https://issues.apache.org/jira/browse/HIVE-12274
https://issues.apache.org/jira/browse/HIVE-12299

The workaround is to store the schema definition in a separate file and set the appropriate table property (avro.schema.url). Alternatively, you could redefine the data types within the Hive schema itself, but that seems like overkill.
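A minimal sketch of that workaround, assuming a hypothetical table name and HDFS paths:

```sql
-- Workaround sketch: keep the full schema in an .avsc file on HDFS and point
-- the table at it via avro.schema.url, which is not subject to the 4000-char
-- metastore limit. Paths and names here are illustrative only.
CREATE EXTERNAL TABLE example_topic
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
  'avro.schema.url' = 'hdfs://namenode:8020/schemas/example_topic.avsc'
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/topics/example_topic';
```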

Given that this is a known issue, and that Avro schemas are quite often in excess of 4000 characters, the connector should handle this more gracefully. When Hive integration is enabled, the schema should be written to a separate file and the appropriate table property (avro.schema.url) set.

Thanks, Ryan

Vincent-Zeng commented 4 years ago

Hi. With hive.integration=true, how can kafka-connect-sink use avro.schema.url instead of avro.schema.literal? Or do I need to alter the table manually in Hive?
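A hedged sketch of the manual route, assuming a hypothetical table and schema path. If memory serves, Hive's AvroSerDe prefers avro.schema.literal when both properties are present, so the literal has to be neutralized, e.g. with the special value none:

```sql
-- Hypothetical manual fix for an existing connector-created table: switch from
-- the (possibly truncated) inline literal to a schema file on HDFS. AvroSerDe
-- reads avro.schema.url only when avro.schema.literal is absent or set to the
-- special value 'none'.
ALTER TABLE example_topic SET SERDEPROPERTIES (
  'avro.schema.literal' = 'none',
  'avro.schema.url'     = 'hdfs://namenode:8020/schemas/example_topic.avsc'
);
```

Caveat: the connector may rewrite these SerDe properties the next time it updates the table (e.g. on a schema change), so a manual edit like this may not stick.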