FelixKJose opened this issue 5 years ago
Hi Felix, you can store the object in S3 in Avro format. Later you can extract the schema from the Avro object stored in the S3 bucket and create a table in Athena using the schema you obtained to query over the data. Regards, Abhishek Sahani
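As a sketch of this suggestion: once the Avro schema is extracted, an Athena table can be declared over the files the connector wrote. The table name, S3 path, and columns below are placeholders I made up for illustration, not anything from this thread; the extracted schema would be supplied via `avro.schema.literal`:

```sql
-- Hypothetical Athena table over Avro files written by the S3 connector.
-- Bucket path, table name, and fields are assumptions for illustration.
CREATE EXTERNAL TABLE my_events (
  app_id string,
  created_user string,
  created_date string
)
STORED AS AVRO
LOCATION 's3://my-bucket/topics/my-topic/'
TBLPROPERTIES (
  'avro.schema.literal'='{
    "type":"record","name":"MyEvent",
    "fields":[
      {"name":"app_id","type":"string"},
      {"name":"created_user","type":"string"},
      {"name":"created_date","type":"string"}
    ]}'
);
```

After that, the data can be queried with plain SQL in Athena without any application pulling whole objects.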
On Fri, Aug 2, 2019 at 8:27 PM FelixKJose wrote:
I have a requirement to persist object metadata along with the object, so that later we can use it in Amazon Athena for queries, and so applications can pull only the metadata instead of the entire object. Is there any support in the connector to persist the metadata (which the AWS S3 SDK supports)? I have seen great provisions for dynamically creating the S3 object key by deriving it from object fields, but I couldn't find a way to derive the metadata and persist it along with the object.
Thank you Abhishek. But if we have a web application that needs just the metadata (e.g. created user, created date, company id) instead of the entire object, that is not possible unless I persist the metadata. The AWS SDK supports this:

```java
PutObjectRequest putObjectRequest = new PutObjectRequest(
        container, key, new ByteArrayInputStream(payload), objectMetaData);
amazonS3.putObject(putObjectRequest);
```
S3 also provides a way to retrieve just the metadata, using `AmazonS3.getObjectMetadata(bucket, key)`.
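To make the put/get round trip concrete, here is a minimal sketch using the AWS SDK for Java v1. The bucket, key, and metadata keys (`app-id`, `created-user`) are placeholders I invented for illustration; the connector does not currently do any of this:

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class UserMetadataExample {

    // Build user-defined metadata; S3 stores these keys under an
    // "x-amz-meta-" prefix on the object.
    public static ObjectMetadata buildMetadata(String appId, String user) {
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.addUserMetadata("app-id", appId);
        metadata.addUserMetadata("created-user", user);
        return metadata;
    }

    public static void main(String[] args) {
        byte[] payload = "record-bytes".getBytes(StandardCharsets.UTF_8);
        ObjectMetadata metadata = buildMetadata("billing-app", "felix");
        metadata.setContentLength(payload.length);

        // "my-bucket" / "my-key" are placeholders for this sketch.
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        s3.putObject(new PutObjectRequest("my-bucket", "my-key",
                new ByteArrayInputStream(payload), metadata));

        // Later, a HEAD request fetches only the metadata, without
        // downloading the object body.
        ObjectMetadata fetched = s3.getObjectMetadata("my-bucket", "my-key");
        System.out.println(fetched.getUserMetadata());
    }
}
```

This is exactly the kind of per-record user-defined metadata the thread is asking the connector to expose as configuration.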
Can someone please give me an answer for this?
If your question is about the S3 connector, that repo is here - https://github.com/confluentinc/kafka-connect-storage-cloud
It's not clear what metadata you would expect a Kafka connector to add other than what it generically knows about (topic name, partition, and offset)
It seems the only metadata that is added, though, is the SSE algorithm: https://github.com/confluentinc/kafka-connect-storage-cloud/blob/master/kafka-connect-s3/src/main/java/io/confluent/connect/s3/storage/S3OutputStream.java#L180-L193
Yes, I was asking whether I could put some more custom meta information alongside the SSE algorithm, for example appId, user name, etc. Could the Kafka publisher publish some metadata along with the message, so that metadata can be stored along with the S3 object?
Object metadata reference from AWS S3: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/add-object-metadata.html; specifically, I am talking about the user-defined metadata described there.
Sure, it could, but it currently does not allow that to be configurable, and that should be an issue for a different repo: https://github.com/confluentinc/kafka-connect-storage-cloud