aws-amplify / aws-sdk-android

AWS SDK for Android. For more information, see our web site:
https://docs.amplify.aws

kinesis sdk to upload batches concurrently #3467

Closed dss99911 closed 7 months ago

dss99911 commented 7 months ago

State your question
My Android application uploads more than 3,000 records at a time, and some users have as many as 80,000 records. The business requirement is to upload these records as quickly as possible. However, when I check the KinesisRecorder code,

it appears to upload all batches (128 records each) sequentially. Uploading 3,000 to 80,000 records therefore means 23 to 625 sequential batch uploads. So I would like to ask whether it is fine to upload the batches concurrently, using threads or coroutines, by customizing the code myself. Is that the proper approach? I'd appreciate any suggestions. Thanks.
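For illustration only, the concurrent approach asked about here could look roughly like the sketch below. The `BatchUploader` interface and `uploadAll` helper are hypothetical stand-ins, not part of the AWS SDK; a real implementation would invoke the SDK's PutRecords path inside `upload`, and would need to handle per-batch failures and retries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentBatchUpload {
    // Hypothetical stand-in for a single PutRecords call; a real
    // implementation would call the Kinesis SDK here.
    interface BatchUploader {
        void upload(List<byte[]> batch);
    }

    // Slices the record list into fixed-size batches and submits each
    // batch to a thread pool so several uploads run in parallel.
    static void uploadAll(List<byte[]> records, int batchSize,
                          int parallelism, BatchUploader uploader)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        for (int i = 0; i < records.size(); i += batchSize) {
            List<byte[]> batch =
                records.subList(i, Math.min(i + batchSize, records.size()));
            pool.submit(() -> uploader.upload(batch));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> records = new ArrayList<>();
        for (int i = 0; i < 3000; i++) records.add(new byte[]{(byte) i});
        AtomicInteger batches = new AtomicInteger();
        uploadAll(records, 128, 4, batch -> batches.incrementAndGet());
        // ceil(3000 / 128) = 24 batches submitted
        System.out.println(batches.get());
    }
}
```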

Which AWS Services are you utilizing? aws-android-sdk-kinesis


tylerjroach commented 7 months ago

It looks like submitAllRecords is public and, from what I can see, does not rely on any private methods, so it should be relatively easy to override submitAllRecords in your code and adapt the behavior to your use case.

From the documentation of PutRecords, a reasonable first step may be to increase the batch size:

Each PutRecords request can support up to 500 records. Each record in the request can be as large as 1 MiB, up to a limit of 5 MiB for the entire request, including partition keys. Each shard can support writes up to 1,000 records per second, up to a maximum data write total of 1 MiB per second.
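As a rough illustration of those limits, batches can be cut on both the 500-record and the 5 MiB boundaries. The helper below is a sketch under that assumption, not the SDK's actual batching logic, and it ignores partition-key bytes for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

public class PutRecordsBatcher {
    static final int MAX_RECORDS_PER_REQUEST = 500;
    static final long MAX_REQUEST_BYTES = 5L * 1024 * 1024; // 5 MiB per request

    // Splits record payloads into batches that respect both PutRecords
    // limits: at most 500 records and at most 5 MiB per request.
    static List<List<byte[]>> batch(List<byte[]> records) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        long currentBytes = 0;
        for (byte[] r : records) {
            if (current.size() == MAX_RECORDS_PER_REQUEST
                    || currentBytes + r.length > MAX_REQUEST_BYTES) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(r);
            currentBytes += r.length;
        }
        if (!current.isEmpty()) batches.add(current);
        return batches;
    }

    public static void main(String[] args) {
        List<byte[]> records = new ArrayList<>();
        for (int i = 0; i < 3000; i++) records.add(new byte[100]);
        // At 100 bytes per record the record count is the binding limit:
        // ceil(3000 / 500) = 6 requests, versus 24 at a batch size of 128.
        System.out.println(batch(records).size());
    }
}
```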

My concern with attempting to upload concurrent batches is that the ordering sequence may be lost.

dss99911 commented 7 months ago

Thanks for the answer. Since each shard can support writes of up to 1,000 records per second, we are considering combining multiple records into one as a first step. After that, if uploading still needs to be faster, I'll consider uploading concurrently, since ordering is not important for our data.
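Combining records as described could be sketched like this. The `aggregate` helper and the newline delimiter are hypothetical choices, not anything from the SDK; the consumer side would need to split packed records on the same delimiter, and the size cap should stay under the 1 MiB per-record Kinesis limit.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RecordAggregator {
    // Packs many small line-delimited records into fewer large ones,
    // staying under a per-record byte cap (Kinesis allows up to 1 MiB).
    static List<String> aggregate(List<String> records, int maxBytes) {
        List<String> out = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        for (String r : records) {
            int added = r.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
            if (buf.length() > 0
                    && buf.toString().getBytes(StandardCharsets.UTF_8).length
                       + added > maxBytes) {
                out.add(buf.toString());
                buf.setLength(0);
            }
            if (buf.length() > 0) buf.append('\n');
            buf.append(r);
        }
        if (buf.length() > 0) out.add(buf.toString());
        return out;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) records.add("event-" + i);
        List<String> packed = aggregate(records, 1024);
        // Far fewer physical records, and every packed record fits the cap.
        boolean allFit = packed.stream().allMatch(
            p -> p.getBytes(StandardCharsets.UTF_8).length <= 1024);
        System.out.println((packed.size() < records.size()) && allFit);
    }
}
```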