awslabs / amazon-kinesis-producer

Amazon Kinesis Producer Library
Apache License 2.0
398 stars 331 forks source link

Mixed message payloads with multiple instances of KPL #80

Open gcobr opened 7 years ago

gcobr commented 7 years ago

I am seeing the content of some messages being concatenated and delivered to the listeners of a stream as one message instead of two.

I suspect that the fact that my application is creating more than one producer per JVM instance could be causing the problem.

Could there be an unsynchronized race condition between the Java and the native binary codes? Such as two producers reading and writing to the same channel of communication with the native executables?

FastNinja commented 7 years ago

issue appear to be due to the aggregation that KPL perform https://github.com/awslabs/kinesis-aggregation

data is encoded using Google Protocol Buffers, and returned to the calling function for subsequent use. You can then publish to Kinesis and the data is compatible with consumers using the KCL or these Deaggregation modules.

so Lambda consumers do not deaggregate events by default. additional logic is required or KPL needs to switch aggregation OFF.

pfifer commented 7 years ago

Creating more than one KinesisProducer instance will actually create additional native processes. Each of these processes are independent, and don't intermix data.

I'm not sure what you mean by two messages being delivered as one message. Do you have an example?

ptc-dcohen commented 7 years ago

I'm also seeing similar issue. But in our case we are handling only one instance of KinesisProducer. When I handled new instance of Kinesis Producer per each request, the KPL didn't aggreate the records and the consumer got one record each time - which was correct behavior. After chaning the proudcer to be singleton, started to saw the following: The consumer is getting a list of records and each record contains a concatenation of multiple records separated by: "^Z�^H^H^@^Z�^H". The consumer expecting to get list of records and each record should have only his data.

Is it a correct behavior of the KPL ? Is the KPL is using the Kinesis put records or put record API ?
Can you help resolving this issue ?

Thanks, Daniel

FastNinja commented 7 years ago

@ptc-dcohen Have you had a look at the link I provided?

https://github.com/awslabs/kinesis-aggregation

The Kinesis Producer will squeeze multiple events into single using ProtoBuf algorithm unless you specifically configure it to false.

Set the property aggregation to off

Enable aggregation. With aggregation, multiple user records are packed into a single KinesisRecord. If disabled, each user record is sent in its own KinesisRecord.

If your records are small, enabling aggregation will allow you to put many more records than you would otherwise be able to for a shard before getting throttled. Default: true AggregationEnabled = true

from https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer-sample/default_config.properties

Another option will be to desegregate on a consumer side.

ptc-dcohen commented 7 years ago

@FastNinja Thanks, got it. Is there a documentation that explains each property ? it will be very helpful.

FastNinja commented 7 years ago

I think only that: http://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-config.html which leads to that: https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer-sample/default_config.properties

Remember - code is your best documentation. It never lies!

pfifer commented 7 years ago

Are you using the Amazon Kinesis Client to process records, or using something else.

The Amazon Kinesis Client has specialized code to handle records coming from the KPL. If you're doing your own processing you should implement the same behavior.