Streaming binary data to S3

dizcza commented 1 year ago

I couldn't find a forum or a GitHub Discussions page for this project, so I'm asking here.

I'd like to upload custom data (1024 Hz sampling rate, 10 bytes per sample) to an S3 bucket. For the purpose of the question, think of it as an audio stream.

One thing that came to mind is WebSockets, but it seems you don't support them, as there are a number of related issues that have not been addressed.

Naively, my search boiled down to the HTTP S3 upload demo, but it sends the data insecurely, without authentication.

Could someone elaborate on how to implement the following two alternatives:

1. Connect a device to the AWS IoT Core and use the secure connection to get an authentication token (JWT) that can be passed to API Gateway to secure the endpoint. Then use your S3 upload demo to upload a binary audio stream piece by piece.
2. Connect a device to the AWS IoT Core and set up a rule to retransmit/republish these base64 payloads to S3. Then send base64-encoded binary data piece by piece. It comes at a cost of ~30% extra payload length (base64) and extra decoding work on the backend, but brings benefits of integration with Lambdas, SageMaker studio, visualization, etc.

Or maybe something else? There are plenty of AWS services and an exponential number of ways to combine them, so I cannot pick the best option for this scenario.

To begin with, I just want to store the binary data from many devices (initially just one) in a database or S3. Later on, I'd like to visualize the stream of incoming data (a real-time graph) on the AWS IoT page. Eventually, I'll apply machine learning on top of that (but that's way beyond the current scope).

Either way, I'll be using an ATECC608A module to secure the communication. The data will be sent from ESP32 boards, if that matters.

A minor offset question: I cannot append incoming data in a file located on an S3 bucket, can I?

paulbartell commented 1 year ago

> Connect a device to the AWS IoT Core and set up a rule to retransmit/republish these base64 payloads to S3. Then send base64-encoded binary data piece by piece. It comes at a cost of ~30% extra payload length (base64) and extra decoding work on the backend, but brings benefits of integration with Lambdas, SageMaker studio, visualization, etc.

@dizcza: Sending the data via MQTT may be the best choice for your application. Batching samples will be particularly important for your use case. In other words, you may want to buffer and send data for 1, 5, or 10 seconds at a time. Keep in mind that the maximum payload size for an MQTT publish is 128 KB on AWS IoT Core and messages are metered in 5 KB increments.
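
As a rough sizing sketch of the batching advice above: the helper names below are hypothetical, the stream figures (1024 Hz, 10 bytes per sample) come from the original question, and the limits are the ones quoted in this comment, with 1 KB assumed to mean 1024 bytes.

```c
#include <stddef.h>

/* Figures from the thread: 1024 samples per second, 10 bytes per sample. */
#define SAMPLE_RATE_HZ      1024u
#define BYTES_PER_SAMPLE    10u
/* Limits quoted above; 1 KB is assumed to mean 1024 bytes here. */
#define MAX_PUBLISH_BYTES   (128u * 1024u)
#define METERING_STEP_BYTES (5u * 1024u)

/* Raw payload size, in bytes, of a batch covering `seconds` of samples. */
size_t batch_bytes(unsigned seconds)
{
    return (size_t)seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE;
}

/* Number of metered messages one publish counts as (rounded up). */
size_t metered_messages(size_t payload_bytes)
{
    return (payload_bytes + METERING_STEP_BYTES - 1u) / METERING_STEP_BYTES;
}

/* Longest whole-second batch that still fits in a single publish. */
unsigned max_batch_seconds(void)
{
    return MAX_PUBLISH_BYTES / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE);
}
```

Under these assumptions, a 1-second batch is 10,240 bytes (2 metered messages), a 10-second batch is 102,400 bytes (20 metered messages), and 12 seconds is the longest whole-second batch that fits in one publish. The sketch counts raw payload only and ignores any base64 or framing overhead.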

The examples in this repository are not optimized for high throughput, so you may need to make some optimizations to the way responses from the server are received to reach your desired throughput.

You might consider Kinesis Data Streams and/or Kinesis Data Firehose and their associated AWS IoT Core rules which allow you to access these APIs via MQTT in a more optimal way without having to base64 encode all of your data.
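
To put the base64 overhead mentioned above in numbers, here is a small sketch using the standard padded base64 length formula. The function names are hypothetical, and the 10,240-byte figure is one second of the stream described in the question.

```c
#include <stddef.h>

/* Length of standard padded base64 output for `n` input bytes:
 * every group of up to 3 input bytes becomes 4 output characters. */
size_t base64_encoded_len(size_t n)
{
    return 4u * ((n + 2u) / 3u);
}

/* Overhead of base64 relative to the raw payload, in tenths of a
 * percent (per mille). Returns 0 for an empty payload. */
size_t base64_overhead_permille(size_t n)
{
    if (n == 0u) {
        return 0u;
    }
    return (base64_encoded_len(n) - n) * 1000u / n;
}
```

For a 10,240-byte batch this gives 13,656 encoded bytes, or roughly 33% overhead, which is slightly above the ~30% estimate in the question.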

A time-series database like Amazon Timestream may also be an option, although it may not be well-suited to high sampling rates.

Regarding the ESP32 platform, consider using the esp-aws-iot collection from Espressif, which includes some of the same libraries as this repository, optimized for the ESP-IDF platform.

To address the questions from your post directly:

> Connect a device to the AWS IoT Core and use the secure connection to get an authentication token (JWT) that can be passed to API Gateway to secure the endpoint. Then use your S3 upload demo to upload a binary audio stream piece by piece.

The best reference for this is the HTTP S3 Download Demo, which demonstrates using the AWS SigV4 library and the AWS IoT Credential Provider MQTT API. You will need to modify it as you see fit to use Upload API calls rather than Download API calls. A similar demo is available in the FreeRTOS repository as well.

> A minor offset question: I cannot append incoming data in a file located on an S3 bucket, can I?

Amazon S3 supports multipart uploads. While this isn't appending to an existing file, it may fit your use case. Please take a look at the CreateMultipartUpload, UploadPart, and CompleteMultipartUpload API calls for more info.
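
One thing to check before choosing multipart uploads is part sizing: S3 requires every part except the last to be at least 5 MiB, and allows at most 10,000 parts per upload. A minimal sketch of what that means for the stream in the question (1024 Hz, 10 bytes per sample); the helper names are hypothetical:

```c
#include <stddef.h>

#define STREAM_BYTES_PER_SEC (1024u * 10u)        /* 1024 Hz x 10 bytes per sample */
#define S3_MIN_PART_BYTES    (5u * 1024u * 1024u) /* 5 MiB minimum, except the last part */
#define S3_MAX_PARTS         10000u               /* maximum parts per multipart upload */

/* Seconds of stream needed to fill one minimum-size part (rounded up). */
unsigned long seconds_per_min_part(void)
{
    return (S3_MIN_PART_BYTES + STREAM_BYTES_PER_SEC - 1u) / STREAM_BYTES_PER_SEC;
}

/* Number of parts needed to upload total_bytes using parts of part_bytes each. */
unsigned long long parts_needed(unsigned long long total_bytes,
                                unsigned long long part_bytes)
{
    return (total_bytes + part_bytes - 1u) / part_bytes;
}
```

At this data rate a minimum-size part holds 512 seconds of samples, and one day of data (884,736,000 bytes) fits in 169 minimum-size parts, well under the 10,000-part limit. The 5 MiB minimum also means something has to accumulate about 8.5 minutes of data before each UploadPart call, which may be a constraint on a memory-limited device.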

dizcza commented 1 year ago

Thanks, Paul, for the detailed answer. You've provided invaluable information in one shot. Interestingly, this is the first time I've encountered the terminology you used (like Kinesis DB, AWS timestream, multi-upload, etc.) in connection with AWS IoT Core, and I found no mention of any of these approaches when searching the Internet specifically for secure streaming of data to AWS IoT. I wonder, hasn't streaming audio and video to AWS IoT become popular enough?

esp-aws-iot is the repo I started with. In fact, I've asked similar questions here and here but have received no response. It looks like they have abandoned the use cases I'm looking for, and their release/beta branch with S3 uploads has never been merged into master. Such a pity.

I'll come back once I study these materials.

paulbartell commented 1 year ago

@dizcza Happy to point you in the right direction.

For future reference, AWS re:Post is a great resource for asking general AWS questions and is monitored by Solutions Architects who have a wider knowledge base. We're happy to support you in whichever place works best for you.

DarshakDev commented 1 year ago

@dizcza: I came across a similar problem a year ago and resolved it with your approach number 1. I hope you have resolved this already; I'm mentioning it just in case you need any help.

dizcza commented 1 year ago

@DarshakDev thanks for the note. I haven't resolved it yet - I delved into the documentation...

An example of approach 1 would really help - I've found none, only descriptions of how to do it. If your project is publicly available, it would be great to have a look at how it's done.

paulbartell commented 1 year ago

Closing this issue due to inactivity. Feel free to reopen if anything changes.

aws / aws-iot-device-sdk-embedded-C

Streaming binary data to S3 #1841