confluentinc / kafka-connect-storage-common

Shared software among connectors that target distributed filesystems and cloud storage.

FieldPartitioner with lowercase field names #78

Open rajilion opened 6 years ago

rajilion commented 6 years ago

Hi.

I am using the HDFS sink connector to move data from Kafka into Hadoop, and I am using FieldPartitioner with a partition field name. Kafka Connect creates the partition field name in uppercase, since the field name coming from the source is uppercase. The problem is that when I create a Hive table over the data, Hive cannot recognize the uppercase partitions, because all field names in Hive are converted to lowercase when stored in the metastore. I don't want to use the Hive sink because the streaming properties are not enabled in hive-site.xml and I don't want my table to be in ORC format. Is there any way I could change my field names to lowercase, or is there a way to create a custom field partitioner?
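
For reference, a custom field partitioner along the lines asked about could look roughly like the sketch below. This is a minimal sketch, assuming the io.confluent.connect.storage.partitioner.FieldPartitioner API and a single partition field; the class name LowercaseFieldPartitioner is hypothetical.

```java
import java.util.Locale;

import org.apache.kafka.connect.sink.SinkRecord;

import io.confluent.connect.storage.partitioner.FieldPartitioner;

/**
 * Hypothetical partitioner that reuses FieldPartitioner but lowercases the
 * field name in the encoded partition (e.g. "MY_FIELD=42" -> "my_field=42")
 * so the directory names line up with Hive's lowercase metastore columns.
 */
public class LowercaseFieldPartitioner<T> extends FieldPartitioner<T> {

  @Override
  public String encodePartition(SinkRecord sinkRecord) {
    // FieldPartitioner encodes the partition as "<FIELD_NAME>=<value>".
    String encoded = super.encodePartition(sinkRecord);
    int eq = encoded.indexOf('=');
    if (eq <= 0) {
      return encoded; // unexpected shape; leave untouched
    }
    // Lowercase only the field name, keep the value as-is.
    return encoded.substring(0, eq).toLowerCase(Locale.ROOT) + encoded.substring(eq);
  }
}
```

The class would then be packaged onto the connector's classpath and referenced via partitioner.class. With multiple partition fields the encoded path contains several name=value segments joined by the directory delimiter, so each segment's field name would need to be lowercased individually.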

OneCricketeer commented 6 years ago

I don't want to use the Hive sink because the streaming properties are not enabled in hive-site.xml and I don't want my table to be in ORC format

Tables won't automatically be ORC (it's not even a supported format for this project yet), and streaming doesn't need to be enabled for Kafka Connect to work.

Regarding lowercasing, an SMT (Single Message Transform) can be attempted.
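
For example, the built-in ReplaceField transform can rename specific fields to lowercase before they reach the partitioner. A sketch of the relevant part of an HDFS sink configuration; the field names are hypothetical, and the partitioner.class path may differ by connector version:

```properties
# Rename the uppercase source fields to lowercase before partitioning.
transforms=lowercase
transforms.lowercase.type=org.apache.kafka.connect.transforms.ReplaceField$Value
transforms.lowercase.renames=MY_FIELD:my_field,OTHER_FIELD:other_field

# Partition on the renamed (lowercase) field.
partitioner.class=io.confluent.connect.storage.partitioner.FieldPartitioner
partition.field.name=my_field
```

As far as I know there is no stock transform that lowercases every field name, so the renames have to be listed explicitly (or a custom SMT written).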

rajilion commented 6 years ago

I did this in a different way, by using the query option of the JDBC source connector with a SELECT statement that aliases the partition columns to lowercase (a sketch of that kind of configuration follows the link below). And for streaming, as per Hive, the table has to be in ORC:

https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest
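
A sketch of that kind of JDBC source configuration; the table, column, and topic names are hypothetical:

```properties
# Alias the partition column to lowercase in the query so the field name
# arriving at the sink is already lowercase.
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
mode=incrementing
incrementing.column.name=ID
topic.prefix=my_topic
query=SELECT ID, MY_FIELD AS my_field, PAYLOAD FROM MY_TABLE
```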

OneCricketeer commented 6 years ago

for streaming, as per Hive, the table has to be in ORC

Correct, but the only output formats currently supported by HDFS Connect are Avro, JSON, Parquet, and string:

https://github.com/confluentinc/kafka-connect-hdfs/tree/master/src/main/java/io/confluent/connect/hdfs
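
For instance, the output format is selected with format.class in the sink configuration; which format classes are available depends on the connector version, so treat this as a sketch:

```properties
# Write Parquet files; other formats in kafka-connect-hdfs include
#   io.confluent.connect.hdfs.avro.AvroFormat
#   io.confluent.connect.hdfs.json.JsonFormat
#   io.confluent.connect.hdfs.string.StringFormat
format.class=io.confluent.connect.hdfs.parquet.ParquetFormat
```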