confluentinc / kafka-connect-storage-common

Shared software among connectors that target distributed filesystems and cloud storage.

How could I persist Object Meta Data along with Object into S3 using the Connector #109

Open FelixKJose opened 5 years ago

FelixKJose commented 5 years ago

I have a requirement to persist object metadata along with the object, so that we can later use it in Amazon Athena for queries and so applications can pull only the metadata instead of the entire object. Does the connector support persisting this metadata (which the AWS S3 SDK supports)? I have seen great provisions for dynamically creating the S3 object key by deriving it from object fields etc., but I couldn't find a way to derive metadata and persist it along with the object.

abhisheksahani commented 5 years ago

Hi Felix, you can store the object in S3 in Avro format; later you can extract the schema from the Avro object stored in the S3 bucket and create a table in Athena using the schema you obtained to query over the data. Regards, Abhishek Sahani

FelixKJose commented 5 years ago

Thank you, Abhishek. But if we have a web application that only needs the metadata instead of the entire object, that is not possible unless I persist the metadata (e.g. created user, created date, company id, etc.). The AWS SDK supports this:

    PutObjectRequest putObjectRequest =
        new PutObjectRequest(container, key, new ByteArrayInputStream(payload), objectMetaData);
    amazonS3.putObject(putObjectRequest);

S3 also provides a way to retrieve just the metadata, using AmazonS3.getObjectMetadata(bucket, key).
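For illustration, here is a minimal sketch of both sides of this with the AWS SDK for Java v1: attaching user-defined metadata on upload, then fetching only the metadata without downloading the object body. The bucket name, key, and metadata keys are made-up examples, not anything the connector produces today.

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.ObjectMetadata;
    import com.amazonaws.services.s3.model.PutObjectRequest;
    import java.io.ByteArrayInputStream;

    public class UserMetadataSketch {
      public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        byte[] payload = "example payload".getBytes();

        // User-defined metadata is stored with the object as x-amz-meta-* headers.
        ObjectMetadata meta = new ObjectMetadata();
        meta.setContentLength(payload.length);
        meta.addUserMetadata("created-user", "felix");  // example key/value
        meta.addUserMetadata("company-id", "1234");     // example key/value

        s3.putObject(new PutObjectRequest(
            "example-bucket", "example/key.bin", new ByteArrayInputStream(payload), meta));

        // Fetch only the metadata; the object body is not transferred.
        ObjectMetadata head = s3.getObjectMetadata("example-bucket", "example/key.bin");
        System.out.println(head.getUserMetadata());  // {created-user=felix, company-id=1234}
      }
    }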

FelixKJose commented 5 years ago

Can someone please give me an answer for this?

OneCricketeer commented 5 years ago

If your question is about the S3 connector, that repo is here - https://github.com/confluentinc/kafka-connect-storage-cloud

It's not clear what metadata you would expect a Kafka connector to add other than what it generically knows about (topic name, partition, and offset)

Seems the only metadata that is added, though, is the SSE algorithm: https://github.com/confluentinc/kafka-connect-storage-cloud/blob/master/kafka-connect-s3/src/main/java/io/confluent/connect/s3/storage/S3OutputStream.java#L180-L193
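For context, a rough paraphrase of the kind of logic in those linked lines, assuming the AWS SDK v1 ObjectMetadata API (method and field names here are approximate, not copied from the source): the connector builds a fresh ObjectMetadata per upload and sets only the SSE algorithm on it when one is configured.

    // Paraphrased sketch of the linked S3OutputStream logic; names are approximate.
    // `ssea` stands for the connector's configured SSE algorithm setting.
    private ObjectMetadata newObjectMetadata() {
      ObjectMetadata meta = new ObjectMetadata();
      if (ssea != null && !ssea.isEmpty()) {
        meta.setSSEAlgorithm(ssea);  // e.g. "AES256" when SSE is configured
      }
      // No user-defined metadata is added anywhere in this path.
      return meta;
    }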

FelixKJose commented 5 years ago

Yes, I was asking whether I could put some more custom meta information along with the SSEAlgorithm, for example appId, user name, etc. Could the Kafka publisher publish some metadata along with the message, so that this metadata gets stored along with the S3 object?

Object metadata reference from AWS S3: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/add-object-metadata.html. In that document I am talking specifically about user-defined metadata.
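Purely as a hypothetical illustration of what is being asked for here (the connector does not support this today), one could imagine copying producer-set record headers into the S3 user-defined metadata when the object is written; the class and method names below are invented for this sketch.

    // Hypothetical sketch only -- not part of kafka-connect-storage-cloud.
    import com.amazonaws.services.s3.model.ObjectMetadata;
    import org.apache.kafka.connect.header.Header;
    import org.apache.kafka.connect.sink.SinkRecord;

    public class HeaderMetadataSketch {
      // Copy record headers (e.g. appId, userName set by the producer) into
      // S3 user-defined metadata, which S3 exposes as x-amz-meta-* headers.
      static ObjectMetadata userMetadataFrom(SinkRecord record) {
        ObjectMetadata meta = new ObjectMetadata();
        for (Header header : record.headers()) {
          if (header.value() != null) {
            meta.addUserMetadata(header.key(), header.value().toString());
          }
        }
        return meta;
      }
    }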

OneCricketeer commented 5 years ago

Sure, it could, but the connector currently does not allow that to be configurable, and that should be an issue for a different repo: https://github.com/confluentinc/kafka-connect-storage-cloud