awslabs / amazon-kinesis-client-ruby

A Ruby interface for the Amazon Kinesis Client Library. Allows developers to easily create robust application to process Amazon Kinesis streams in Ruby.
Apache License 2.0
146 stars 55 forks source link

Amazon Kinesis Client Library for Ruby

This package provides an interface to the Amazon Kinesis Client Library's (KCL) MultiLangDaemon for the Ruby language. Developers can use the Amazon KCL to build distributed applications that process streaming data reliably at scale. The Amazon KCL takes care of many of the complex tasks associated with distributed computing, such as load-balancing across multiple instances, responding to instance failures, checkpointing processed records, and reacting to changes in stream volume. This package wraps and manages the interaction with the MultiLangDaemon which is part of the Amazon KCL for Java so that developers can focus on implementing their record processor executable. A record processor in Ruby typically looks something like:

#! /usr/bin/env ruby

require 'aws/kclrb'

class SampleRecordProcessor < Aws::KCLrb::V2::RecordProcessorBase
  def init_processor(initialize_input)
    # initialize
  end

  def process_records(process_records_input)
    # process batch of records
  end

  def lease_lost(lease_lost_input)
    # lease was lost, cleanup
  end

  def shard_ended(shard_ended_input)
    # shard has ended, cleanup
  end

  def shutdown_requested(shutdown_requested_input)
    # shutdown has been requested
  end
end

if __FILE__ == $0
  # Start the main processing loop
  record_processor = SampleRecordProcessor.new
  driver = Aws::KCLrb::KCLProcess.new(record_processor)
  driver.run
end

Before You Get Started

Before running the samples, you'll want to make sure that your environment is configured to allow the samples to use your AWS Security Credentials.

By default the samples use the DefaultAWSCredentialsProviderChain so you'll want to make your credentials available to one of the credentials providers in that provider chain. There are several ways to do this such as providing a ~/.aws/credentials file, or if you're running on Amazon EC2, you can associate an IAM role with your instance with appropriate access.

For questions regarding Amazon Kinesis Service and the client libraries please check the official documentation as well as the Amazon Kinesis Forums.

Running the Sample

Using the Amazon KCL for Ruby package requires the MultiLangDaemon which is provided by the Amazon KCL for Java. Rake tasks are provided to start the sample application(s) and download all the required dependencies.

The sample application consists of two components:

The following defaults are used in the sample application:

Running the Data Producer

To run the data producer, run the following commands:

    cd samples
    rake run_producer

Notes

Running the Data Processor

To run the data processor, run the following commands:

    cd samples
    rake run properties_file=sample.properties

Notes

Cleaning Up

This sample application creates a real Amazon Kinesis stream and sends real data to it, and create a real DynamoDB table to track the Amazon KCL application state, thus potentially incurring AWS costs. Once done, you can log in to AWS management console and delete these resources. Specifically, the sample application will create in your default AWS region

Running on Amazon EC2

Running on Amazon EC2 is simple. Assuming you are already logged into an Amazon EC2 instance running Amazon Linux, the following steps will prepare your environment for running the sample application. Note the version of Java that ships with Amazon Linux can be found at /usr/bin/java and should be 1.7 or greater.

    # install some prerequisites if missing
    sudo yum install gcc patch git ruby rake rubygems ruby-devel
    # install the AWS Ruby SDK (pre-requisuite for producer)
    sudo gem install aws-sdk aws-kclrb
    # clone the git repository to work with the samples
    git clone https://github.com/awslabs/amazon-kinesis-client-ruby.git kclrb
    # run the sample
    cd kclrb/samples
    rake run_producer
    # ... and in another terminal
    rake run properties_file=sample.properties

Under the Hood - What You Should Know about Amazon KCL's MultiLangDaemon

Amazon KCL for Ruby uses Amazon KCL for Java internally. We have implemented a Java-based daemon, called the MultiLangDaemon that does all the heavy lifting. Our approach has the daemon spawn the user-defined record processor script/program as a sub-process. The MultiLangDaemon communicates with this sub-process over standard input/output using a simple protocol, and therefore the record processor script/program can be written in any language.

At runtime, there will always be a one-to-one correspondence between a record processor, a child process, and an Amazon Kinesis Shard. The MultiLangDaemon will make sure of that, without any need for the developer to intervene.

In this release, we have abstracted these implementation details away and exposed an interface that enables you to focus on writing record processing logic in Ruby. This approach enables Amazon KCL to be language agnostic, while providing identical features and similar parallel processing model across all languages.

See Also

Release Notes

Release 2.1.1 (February 21, 2023)

Release 2.1.0 (January 12, 2023)

Release 2.0.0 (February 26, 2019)

Release 1.0.1 (January 19, 2017)

Release 1.0.0 (December 30, 2014)

License

This library is licensed under the Apache 2.0 License.