Open rkravinderkumar05 opened 2 years ago
If I just change my Kafka Connect config from `io.confluent.connect.s3.format.parquet.ParquetFormat` to `io.confluent.connect.s3.format.json.JsonFormat`, the same thing works like a charm, but for my use case I need Parquet.
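For reference, the swap amounts to changing a single connector property; a minimal sketch in `.properties` form, assuming everything else in the sink config stays the same:

```properties
# Fails for this use case: records written as Parquet
format.class=io.confluent.connect.s3.format.parquet.ParquetFormat

# Works, but JSON output is not what I need
#format.class=io.confluent.connect.s3.format.json.JsonFormat
```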
@rkravinderkumar05 did you find any solution to this problem?
We are currently facing the same issue. We noticed that all the arrays are labeled `array` in the generated Parquet schema (we inspected the output with `parq`, testing with a single array inside the schema whose name should be `cpu_cpus`).

Can someone help us solve this issue? We are not able to understand where the `array` name is inserted (our idea is to use the field name with `array` as a suffix, e.g. `cpu_cpus_array`).

Thanks!
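For illustration only, a schema written with the legacy parquet-avro list encoding comes out roughly like this; the element fields below are placeholders based on our `cpu_cpus` test, and the relevant part is that the repeated inner group is always named `array`:

```
message record {
  optional group cpu_cpus (LIST) {
    repeated group array {
      optional binary name (UTF8);
      optional double usage;
    }
  }
}
```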
Any news?
I was working on it for my previous employer; I don't really remember if I was able to fix it. But you could use JSON storage, or start using Schema Registry as well, it makes life easier.
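If Schema Registry is an option, the converter side would look roughly like this (a sketch only; the registry URL is a placeholder and this assumes the Confluent Avro converter):

```properties
# Hypothetical converter settings when a Schema Registry is available
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081
```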
Thanks @rkravinderkumar05 for your response. We need to store the data as Parquet files, so we need a schema and we cannot use plain JSON instead of JSON Schema.
If you have some old code (or just some old notes), my team and I can try to fix it and share it with the community.
Let me know! Thanks
I don't have a schema registry, so instead I send the schema with the Kafka event in order to use the field partitioner. If I use the JSON writer, everything works fine. Even with the Parquet writer it works fine if I have just one array-of-objects field. But when I have two arrays and use the Parquet writer, I get this error:
Kafka event:
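A hypothetical sketch of such an event, with the schema embedded in the Connect JSON envelope (`schemas.enable=true`) and two array-of-objects fields; the field names `cpus` and `disks` are made up for the example:

```json
{
  "schema": {
    "type": "struct",
    "fields": [
      { "field": "host", "type": "string" },
      {
        "field": "cpus",
        "type": "array",
        "items": {
          "type": "struct",
          "fields": [{ "field": "usage", "type": "double" }]
        }
      },
      {
        "field": "disks",
        "type": "array",
        "items": {
          "type": "struct",
          "fields": [{ "field": "free_bytes", "type": "int64" }]
        }
      }
    ]
  },
  "payload": {
    "host": "node-1",
    "cpus": [{ "usage": 0.42 }],
    "disks": [{ "free_bytes": 1073741824 }]
  }
}
```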
Kafka S3 connector config:
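A rough sketch of the kind of sink config being described here, combining `ParquetFormat` with the field partitioner and schema-carrying JSON; topic, bucket, region, and the partition field are placeholders:

```properties
# Hypothetical S3 sink connector config (values are placeholders)
connector.class=io.confluent.connect.s3.S3SinkConnector
topics=my-topic
s3.bucket.name=my-bucket
s3.region=us-east-1
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.parquet.ParquetFormat
partitioner.class=io.confluent.connect.storage.partitioner.FieldPartitioner
partition.field.name=host
flush.size=1000
# Schemas are embedded in each JSON event instead of using a registry
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true
```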