awslabs / amazon-kinesis-producer

Amazon Kinesis Producer Library
Apache License 2.0

Kinesis Producer Library


Introduction

The Amazon Kinesis Producer Library (KPL) performs many tasks common to creating efficient and reliable producers for Amazon Kinesis. By using the KPL, customers do not need to develop the same logic every time they create a new application for data ingestion.

For detailed information and installation instructions, see the article Developing Producer Applications for Amazon Kinesis Using the Amazon Kinesis Producer Library in the Amazon Kinesis Developer Guide.

Back-pressure

Please see this blog post for details about writing efficient and reliable producers using the KPL. The post describes the overhead to expect in the various situations in which you might use the KPL, including back-pressure considerations.

The KPL can consume enough memory to crash itself if it is pushed too many records without time to process them. To guard against this, we ask that every customer implement back-pressure to protect the KPL process. Once the KPL has too many records in its buffer, it spends most of its CPU cycles on record management rather than record processing, which makes the problem worse. The point at which this happens depends heavily on the customer's record sizes, record rates, configuration, and host CPU and memory limits.

When deciding the limits of your KPL instance, please consider your maximum record size, the largest spikes in your request rate, host memory availability, and TTL. If you buffer requests before they reach the KPL, take that buffer into account as well, since it also puts memory pressure on the host system. If the KPL buffer grows too large, the KPL process may be forcibly terminated due to memory exhaustion.
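One rough way to turn these considerations into a number is to bound the in-flight record count by the memory you are willing to give the KPL buffer, and to compare that bound with the worst case implied by your peak rate and TTL. The sketch below is only an illustration. The class `InFlightLimit` and its methods are hypothetical and not part of the KPL, it assumes that TTL means how long a record may stay buffered before it expires, and it ignores any per-record overhead inside the KPL.

```java
/** Hypothetical helper for estimating a back-pressure threshold; not part of the KPL. */
final class InFlightLimit {
    private InFlightLimit() {}

    /**
     * Largest record count whose payloads fit in the given memory budget,
     * assuming every record is as large as the biggest one you expect.
     */
    static long fromMemoryBudget(long memoryBudgetBytes, long maxRecordBytes) {
        if (memoryBudgetBytes < 0 || maxRecordBytes <= 0) {
            throw new IllegalArgumentException("budget must be >= 0 and record size > 0");
        }
        return memoryBudgetBytes / maxRecordBytes;
    }

    /**
     * Worst-case number of records buffered at once if none are delivered:
     * records arriving at the peak rate for one full TTL.
     */
    static long worstCaseBuffered(long peakRecordsPerSecond, long ttlMillis) {
        return Math.multiplyExact(peakRecordsPerSecond, ttlMillis) / 1000;
    }
}
```

With a 512 MiB budget and 1 MiB records, `fromMemoryBudget` gives 512 records. At 5,000 records per second and a 30-second TTL, `worstCaseBuffered` gives 150,000 records, far more than that budget allows, which is the kind of gap back-pressure has to absorb. Treat both figures as starting points for load testing rather than as guarantees.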

Sample Back-pressure implementation:

        // Take the next event from the application's own input queue.
        ClickEvent event = inputQueue.take();
        String partitionKey = event.getSessionId();
        String payload = event.getPayload();
        ByteBuffer data = ByteBuffer.wrap(payload.getBytes("UTF-8"));
        // Back-pressure: wait until the KPL has drained below the in-flight limit.
        while (kpl.getOutstandingRecordsCount() > MAX_RECORDS_IN_FLIGHT) {
            Thread.sleep(SLEEP_BACKOFF_IN_MS);
        }
        recordsPut.getAndIncrement();

        ListenableFuture<UserRecordResult> f =
                kpl.addUserRecord(STREAM_NAME, partitionKey, data);
        Futures.addCallback(f, new FutureCallback<UserRecordResult>() {
          ...
          ...

The sample above is provided as an example implementation. Please take your application and use cases into consideration before applying this logic.
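The wait loop in the sample can also be factored into a small helper that puts an upper bound on how long the caller blocks. The following is a minimal sketch under stated assumptions, not part of the KPL: the class `BackPressure`, the method `awaitCapacity`, and the timeout parameter are hypothetical, and the outstanding-record count is passed in as a `LongSupplier` so the helper has no dependency on the KPL itself.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical back-pressure helper; not part of the KPL. */
final class BackPressure {
    private BackPressure() {}

    /**
     * Polls the outstanding-record count until it is at or below maxInFlight.
     * Returns true once there is capacity, or false if the timeout elapses
     * or the waiting thread is interrupted.
     */
    static boolean awaitCapacity(LongSupplier outstanding, long maxInFlight,
                                 long sleepMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (outstanding.getAsLong() > maxInFlight) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // still over the limit; the caller decides what to do
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

With the names from the sample, a call might look like `BackPressure.awaitCapacity(() -> kpl.getOutstandingRecordsCount(), MAX_RECORDS_IN_FLIGHT, SLEEP_BACKOFF_IN_MS, timeoutMs)`, where `timeoutMs` is a value you choose. What to do on a `false` result (drop the record, spill it to disk, or fail the request) depends on your application.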

Recommended Upgrade for All Users of 0.15.0 - 0.15.6 Amazon Kinesis Producer

⚠️ Users of versions 0.15.0 - 0.15.6 of the Amazon Kinesis Producer are strongly encouraged to upgrade to version 0.15.7. A bug identified in versions 0.15.0 - 0.15.6 causes a memory leak.

ℹ️ Amazon Kinesis Producer versions prior to 0.15.0 are not impacted.

Recommended Settings for Streams larger than 800 shards

The KPL is an application for ingesting data to your Kinesis Data Streams. As your streams grow you may find the need to tune the KPL to enable it to accommodate the growing needs of your applications. Without optimized configurations your KPL processes will see inefficient CPU usage and delays in writing records into KDS. For streams larger than 800 shards, we recommend the following settings:

We recommend performing sufficient testing before applying these changes to production, as every customer has different usage patterns.

Required KPL Update – v0.15.0

KPL 0.15.0 now incorporates StreamARN in Kinesis requests, such as PutRecords and ListShards, to take advantage of the enhanced availability that Kinesis Data Streams (KDS) gains from service cellularization. Version 0.15.0 adds STS as a new dependency; by using STS, customers can benefit from StreamARN without modifying any code.

Required KPL Update – v0.14.0

KPL 0.14.0 now uses the ListShards API, making it easier for your Kinesis Producer applications to scale. Kinesis Data Streams (KDS) enables you to scale your stream capacity without any changes to producers and consumers. After a scaling event, producer applications need to discover the new shard map. Version 0.14.0 replaces the DescribeStream API with the ListShards API for shard discovery. ListShards supports 100 TPS per stream, whereas DescribeStream supports 10 TPS per account. For an account with 10 streams, KPL v0.14.0 therefore provides a 100x higher call rate for shard discovery (10 streams x 100 TPS = 1,000 TPS, compared with 10 TPS for the whole account), eliminating the need for a DescribeStream API limit increase for scaling. You can find more information on the ListShards API in the Kinesis Data Streams documentation.

Required Upgrade

Starting on February 9, 2018, Amazon Kinesis Data Streams will begin transitioning to certificates issued by Amazon Trust Services (ATS). To continue using the Kinesis Producer Library (KPL), you must upgrade the KPL to version 0.12.6 or later.

If you have further questions please open a GitHub Issue, or create a case with the AWS Support Center.

This is a restatement of the notice published in the Amazon Kinesis Data Streams Developer Guide.

Release Notes

0.15.12

0.15.11

0.15.10

0.15.9

0.15.8

0.15.7

0.15.6

0.15.5

0.15.4

0.15.3

0.15.2

0.15.1

0.15.0

0.14.13

0.14.12

0.14.11

0.14.10

0.14.9

0.14.8

0.14.7

0.14.6

0.14.5

0.14.4

0.14.3

0.14.2

0.14.1

0.14.0

0.13.1

0.13.0

0.12.11

Java

Older release notes moved to CHANGELOG.md

Supported Platforms and Languages

The KPL is written in C++ and runs as a child process to the main user process. Precompiled native binaries are bundled with the Java release and are managed by the Java wrapper.

The Java package should run without the need to install any additional native libraries on the following operating systems:

Note the release is 64-bit only.

Sample Code

A sample Java project is available in java/amazon-kinesis-sample.

Compiling the Native Code

Rather than compiling from source, Java developers are encouraged to use the KPL release in Maven, which includes pre-compiled native binaries for Linux and macOS.

To build the native components and bundle them into the jar, run ./bootstrap.sh. The script downloads and builds the dependencies, builds the native binaries, bundles them into the Java resources folder, and then builds the Java packages. This must be done on the platform on which you plan to run the jars.

Using the Java Wrapper with the Compiled Native Binaries

There are two options. You can either pack the binaries into the jar, as we did for the official release, or deploy the native binaries separately and point the Java code at them.

Pointing the Java wrapper at a Custom Binary

The KinesisProducerConfiguration class provides an option setNativeExecutable(String val). You can use this to provide a path to the kinesis_producer[.exe] executable you have built. On Windows, use backslashes to delimit the path; if you give the path as a Java string literal, each backslash must be escaped as \\.
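To illustrate how such a path is written as a string literal, here is a small sketch. The class `NativeExecutablePaths` and both install locations are hypothetical. The snippet uses only the standard library, so it compiles without the KPL on the classpath.

```java
import java.util.Locale;

/** Illustrative only; the class and both paths are hypothetical. */
final class NativeExecutablePaths {
    private NativeExecutablePaths() {}

    // In a Java string literal each backslash is escaped, so "\\" is a single '\' at run time.
    static final String WINDOWS = "C:\\kpl\\bin\\kinesis_producer.exe";
    static final String UNIX = "/opt/kpl/bin/kinesis_producer";

    /** Picks the hypothetical path that matches the operating system of the running JVM. */
    static String forCurrentOs() {
        return forOsName(System.getProperty("os.name", ""));
    }

    /** Same choice for an explicit OS name, which makes the logic easy to test. */
    static String forOsName(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows") ? WINDOWS : UNIX;
    }
}
```

The resulting string can then be passed to the option described above, for example `config.setNativeExecutable(NativeExecutablePaths.forCurrentOs())`, where `config` is your KinesisProducerConfiguration instance.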