Open gcobr opened 8 years ago
issue appear to be due to the aggregation that KPL perform https://github.com/awslabs/kinesis-aggregation
data is encoded using Google Protocol Buffers, and returned to the calling function for subsequent use. You can then publish to Kinesis and the data is compatible with consumers using the KCL or these Deaggregation modules.
so Lambda consumers do not deaggregate events by default. additional logic is required or KPL needs to switch aggregation OFF.
Creating more than one KinesisProducer instance will actually create additional native processes. Each of these processes are independent, and don't intermix data.
I'm not sure what you mean by two messages being delivered as one message. Do you have an example?
I'm also seeing similar issue. But in our case we are handling only one instance of KinesisProducer. When I handled new instance of Kinesis Producer per each request, the KPL didn't aggreate the records and the consumer got one record each time - which was correct behavior. After chaning the proudcer to be singleton, started to saw the following: The consumer is getting a list of records and each record contains a concatenation of multiple records separated by: "^Z�^H^H^@^Z�^H". The consumer expecting to get list of records and each record should have only his data.
Is it a correct behavior of the KPL ?
Is the KPL is using the Kinesis put records or put record API ?
Can you help resolving this issue ?
Thanks, Daniel
@ptc-dcohen Have you had a look at the link I provided?
https://github.com/awslabs/kinesis-aggregation
The Kinesis Producer will squeeze multiple events into single using ProtoBuf algorithm unless you specifically configure it to false.
Set the property aggregation
to off
Enable aggregation. With aggregation, multiple user records are packed into a single KinesisRecord. If disabled, each user record is sent in its own KinesisRecord.
If your records are small, enabling aggregation will allow you to put many more records than you would otherwise be able to for a shard before getting throttled. Default: true AggregationEnabled = true
Another option will be to desegregate on a consumer side.
@FastNinja Thanks, got it. Is there a documentation that explains each property ? it will be very helpful.
I think only that: http://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-config.html which leads to that: https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer-sample/default_config.properties
Remember - code is your best documentation. It never lies!
Are you using the Amazon Kinesis Client to process records, or using something else.
The Amazon Kinesis Client has specialized code to handle records coming from the KPL. If you're doing your own processing you should implement the same behavior.
I am seeing the content of some messages being concatenated and delivered to the listeners of a stream as one message instead of two.
I suspect that the fact that my application is creating more than one producer per JVM instance could be causing the problem.
Could there be an unsynchronized race condition between the Java and the native binary codes? Such as two producers reading and writing to the same channel of communication with the native executables?