In some cases, data is batched on the client side before being sent to Kinesis in a single PutRecord request. (For instance, the data blob may be a JSON array containing objects that should each be handled separately. I'm doing something like this, and looking to store each object on its own line in a newline-delimited JSON file on S3.) The connector library supports this by providing both ITransformer and ICollectionTransformer interfaces, and the processRecords method in KinesisConnectorRecordProcessor dispatches each incoming record through whichever one is configured.
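Roughly, the per-record loop in processRecords looks like this (paraphrased from the connector source rather than quoted, so the exact code may differ slightly):

// Paraphrased sketch of the dispatch in KinesisConnectorRecordProcessor.processRecords
// (not the exact library source; error handling omitted).
for (Record record : records) {
    if (transformer instanceof ITransformer) {
        ITransformer<T, U> singleTransformer = (ITransformer<T, U>) transformer;
        filterAndBufferRecord(singleTransformer.toClass(record), record);
    } else if (transformer instanceof ICollectionTransformer) {
        ICollectionTransformer<T, U> collectionTransformer = (ICollectionTransformer<T, U>) transformer;
        // One Kinesis Record can expand into many sub-records, but each sub-record
        // is buffered against the same original Record.
        for (T transformedRecord : collectionTransformer.toClass(record)) {
            filterAndBufferRecord(transformedRecord, record);
        }
    }
}

And filterAndBufferRecord is implemented as: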
private void filterAndBufferRecord(T transformedRecord, Record record) {
    if (filter.keepRecord(transformedRecord)) {
        buffer.consumeRecord(transformedRecord, record.getData().array().length, record.getSequenceNumber());
    }
}
Notice that filterAndBufferRecord uses record.getData().array().length as the size of the record. So when transformer is an ICollectionTransformer implementation, each sub-record is passed to buffer.consumeRecord with the byte size of the whole batch rather than of the individual sub-record. As a result, the byte count the buffer tracks is inflated by roughly a factor of the number of sub-records per batch, so it hits bufferByteSizeLimit and flushes far more often than it should.
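To put numbers on it (the sizes here are hypothetical, purely for illustration): if one PutRecord blob is about 1 MB and contains 1,000 sub-records, the buffer's byte counter grows by roughly 1 GB for that single Kinesis record.

// Hypothetical illustration of the inflated byte accounting; not library code.
public class BufferSizeInflationDemo {
    public static void main(String[] args) {
        int batchBlobBytes = 1_000_000;  // size of one PutRecord blob (a JSON array)
        int subRecordsPerBatch = 1_000;  // objects inside that array

        // What the record processor effectively adds to the buffer's byte count:
        // the whole blob size, once per sub-record.
        long trackedBytes = (long) batchBlobBytes * subRecordsPerBatch;

        // What actually arrived from Kinesis for that record.
        long actualBytes = batchBlobBytes;

        System.out.printf("tracked: %,d bytes, actual: %,d bytes (inflated %dx)%n",
                trackedBytes, actualBytes, trackedBytes / actualBytes);
    }
}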
My workaround for now is just setting bufferByteSizeLimit to Integer.MAX_VALUE, and relying on bufferRecordCountLimit to flush the buffer.
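For reference, the workaround looks something like this when building the configuration programmatically (I'm assuming the property keys bufferByteSizeLimit and bufferRecordCountLimit as used by KinesisConnectorConfiguration and the sample properties files; the record-count value is just an example):

// Sketch of the workaround configuration (property keys assumed from the
// connector samples; record-count value is arbitrary).
Properties props = new Properties();
// Effectively disable the byte-size trigger, since its accounting is inflated
// for ICollectionTransformer batches.
props.setProperty("bufferByteSizeLimit", String.valueOf(Integer.MAX_VALUE));
// Rely on the record-count trigger to flush instead.
props.setProperty("bufferRecordCountLimit", "1000");

KinesisConnectorConfiguration config =
        new KinesisConnectorConfiguration(props, new DefaultAWSCredentialsProviderChain());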