NerdWalletOSS / kinesis-python

Low level, multiprocessing based AWS Kinesis producer & consumer library

Producer loop ignores max # of messages per put_records #3

Open itamarla opened 7 years ago

itamarla commented 7 years ago

Looking at the producer's loop, it looks like there is no limit on the number of records per flush; only the accumulated size is checked. With many small messages, a single flush could exceed the 500-record PutRecords limit and the put_records call would fail.
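For illustration, here is a minimal sketch of the kind of count-aware flush condition I mean (the names `MAX_BUFFER_SIZE`, `MAX_RECORDS_PER_PUT`, and `should_flush` are hypothetical, not taken from the library):

```python
MAX_BUFFER_SIZE = 1024 * 1024   # 1 MB size limit the loop already enforces
MAX_RECORDS_PER_PUT = 500       # PutRecords per-request limit, currently unchecked

def should_flush(records, buffer_size):
    """Flush once either API limit would be hit by the next record."""
    return buffer_size >= MAX_BUFFER_SIZE or len(records) >= MAX_RECORDS_PER_PUT
```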

Have I missed something?

Thanks, itamar

borgstrom commented 7 years ago

Hi @itamarla 👋

Thanks for the issue.

I think you're right in that the current implementation is naive and doesn't really match the AWS limits:

Each shard can support up to 1,000 records per second for writes, up to a maximum total data write rate of 1 MB per second (including partition keys). This write limit applies to operations such as PutRecord and PutRecords. -- http://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html

Our producer simply enforces a 1 MB limit per buffer-time cycle (1 second by default), which understates the real budget whenever the stream has more than one shard.

To further complicate things, if the buffer time is changed then the budget needs to scale with it. And finally, we might not be the only producer writing to the stream, in which case we're likely to be throttled if we try to use the full limit.
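As a back-of-the-envelope sketch, the per-cycle budget would scale roughly like this (a hypothetical helper; the `headroom` factor is one way to leave capacity for other producers):

```python
def per_cycle_budget(num_shards, buffer_time_secs, headroom=1.0):
    """Rough write budget for one flush cycle, derived from the per-shard
    limits of 1,000 records/sec and 1 MB/sec. headroom < 1.0 reserves
    capacity for other producers writing to the same stream."""
    max_records = int(1000 * num_shards * buffer_time_secs * headroom)
    max_bytes = int(1024 * 1024 * num_shards * buffer_time_secs * headroom)
    return max_records, max_bytes
```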

For now I'm going to add a simple check that ensures we don't put more than 500 records into a single PutRecords call (the per-request limit) and will spend some time thinking about a more robust solution for the long term.
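Something along these lines (a sketch that calls boto3 directly; the producer's actual internals may differ, and retry handling for partial failures is left out):

```python
import boto3

MAX_RECORDS_PER_PUT = 500  # hard per-request limit for PutRecords

kinesis = boto3.client("kinesis")

def flush(stream_name, records):
    """Send buffered records in chunks that respect the PutRecords limit.

    `records` is a list of dicts like {"Data": b"...", "PartitionKey": "..."}.
    """
    for start in range(0, len(records), MAX_RECORDS_PER_PUT):
        chunk = records[start:start + MAX_RECORDS_PER_PUT]
        resp = kinesis.put_records(StreamName=stream_name, Records=chunk)
        # Partial failures come back in the response, not as exceptions;
        # a robust version would retry entries that carry an ErrorCode.
        if resp.get("FailedRecordCount"):
            pass  # retry/backoff omitted from this sketch
```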