amazon-archives / logstash-input-dynamodb

This input plugin for Logstash scans a specified DynamoDB table and then reads changes to a DynamoDB table from the associated DynamoDB Stream.This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline. This gem is not a stand-alone program.
Apache License 2.0
105 stars 27 forks source link

Logstash Plugin for Amazon DynamoDB

The Logstash plugin for Amazon DynamoDB gives you a nearly real-time view of the data in your DynamoDB table. The Logstash plugin for DynamoDB uses DynamoDB Streams to parse and output data as it is added to a DynamoDB table. After you install and activate the Logstash plugin for DynamoDB, it scans the data in the specified table, and then it starts consuming your updates using Streams and then outputs them to Elasticsearch, or a Logstash output of your choice.

Logstash is a data pipeline service that processes data, parses data, and then outputs it to a selected location in a selected format. Elasticsearch is a distributed, full-text search server. For more information about Logstash and Elasticsearch, go to https://www.elastic.co/products/elasticsearch.

The following sections walk you through the process to:

  1. Create a DynamoDB table and enable a new stream on the table.
  2. Download, build, and install the Logstash plugin for DynamoDB.
  3. Configure Logstash to output to Elasticsearch and the command line.
  4. Run the Logstash plugin for DynamoDB.
  5. Test Logstash by adding DynamoDB items to the table.

When this process is finished, you can search your data in the Elasticsearch cluster.

Prerequisites

The following items are required to use the Logstash plugin for Amazon DynamoDB:

  1. Amazon Web Services (AWS) account with DynamoDB
  2. A running Elasticsearch cluster—To download Elasticsearch, go to https://www.elastic.co/products/elasticsearch.
  3. Logstash—To download Logstash, go to https://github.com/awslabs/logstash-input-dynamodb.
  4. JRuby—To download JRuby, go to http://jruby.org/download.
  5. Git—To download Git, go to http://git-scm.com/downloads
  6. Apache Maven—To get Apache Maven, go to http://maven.apache.org/.

Before You Begin: Create a Source Table

In this step, you will create a DynamoDB table with DynamoDB Streams enabled. This will be the source table and writes to this table will be processed by the Logstash plugin for DynamoDB.

To create the source table

  1. Open the DynamoDB console at https://console.aws.amazon.com/dynamodb/.
  2. Choose Create Table.
  3. On the Create Table page, enter the following settings:
    1. Table Name — SourceTable
    2. Primary Key Type — Hash
    3. Hash attribute data type — Number
    4. Hash Attribute Name — Id
    5. Choose Continue.
  4. On the Add Indexes page, choose Continue. You will not need any indexes for this exercise.
  5. On the Provisioned Throughput page, choose Continue.
  6. On the Additional Options page, do the following:
    1. Select Enable Streams, and then set the View Type to New and Old Images.
    2. Clear Use Basic Alarms. You will not need alarms for this exercise.
    3. When you are ready, choose Continue.
  7. On the Summary page, choose Create.

The source table will be created within a few minutes.

Setting Up the Logstash Plugin for Amazon DynamoDB

To use the Logstash plugin for DynamoDB, you need to build, install, run the plugin, and then you can test it.

IMPORTANT: in order to successfully build and install Logstash, you must have previously installed MAVEN to satisfy jar dependencies, and JRUBY to build and run the logstash gem.

To build the Logstash plugin for DynamoDB

At the command prompt, change to the directory where you want to install the Logstash plugin for DynamoDB and demo project.

In the directory where you want the Git project, clone the Git project:

git clone https://github.com/awslabs/logstash-input-dynamodb.git

Install the Bundler gem by typing the following:

jruby -S gem install bundler

NOTE: The jruby -S syntax ensures that our gem is installed with jruby and not ruby

The Bundler gem checks dependencies for Ruby gems and installs them for you.

To install the dependencies for the Logstash plugin for DynamoDB, type the following command:

jruby -S bundle install

To build the gem, type the following command:

jruby -S gem build logstash-input-dynamodb.gemspec

To install the gem, in the logstash-dynamodb-input folder type:

jruby -S gem install --local logstash-input-dynamodb-1.0.0-java.gem

To install the Logstash plugin for DynamoDB

Now that you have built the plugin gem, you can install it.

Change directories to your local Logstash directory.

In the Logstash directory, open the Gemfile file in a text editor and add the following line.

gem "logstash-input-dynamodb"

To install the plugin, in your logstash folder type the command:

bin/plugin install --no-verify logstash-input-dynamodb

To list all the installed plugins type the following command:

bin/plugin list

If the logstash-output-elasticsearch or logstash-output-stdout plugins are not listed you need to install them. For instructions on installing plugins, go to the Working with Plugins page in the Logstash documentation.

Running the Logstash Plugin for Amazon DynamoDB

NOTE: First, make sure you have Enabled Streams (see above) for your DynamoDB table(s) before running logstash. Logstash for DynamoDB requires that each table you are logging from have a streams enabled to work.

In the local Logstash directory create a logstash-dynamodb.conf file with the following contents:

input { 
    dynamodb{
      endpoint => "dynamodb.us-east-1.amazonaws.com" 
      streams_endpoint => "streams.dynamodb.us-east-1.amazonaws.com" 
      view_type => "new_and_old_images" 
      aws_access_key_id => "<access_key_id>" 
      aws_secret_access_key => "<secret_key>" 
      table_name => "SourceTable"
  }
} 
output { 
    elasticsearch {
      host => localhost 
    } 
    stdout { } 
}

Important

This is an example configuration. You must replace <access_key_id> and <secret_key> with your access key and secret key. If you have credentials saved in a credentials file, you can omit these configuration values.

To run logstash type:

bin/logstash -f logstash-dynamodb.conf

Logstash should successfully start and begin indexing the records from your DynamoDB table.

You can also change the other configuration options to match your particular use case.

You can also configure the plugin to index multiple tables by adding additional dynamodb { } sections to the input section.

The following table shows the configuration values.

Setting Description

Settings Id Description
table_name The name of the table to index. This table must exist.
endpoint The DynamoDB endpoint to use. If you are running DynamoDB on your computer, use http://localhost:8000 as the endpoint.
streams_endpoint The name of a checkpoint table. This does not need to exist prior to plugin activation.
view_type The view type of the DynamoDB stream. ("new_and_old_images", "new_image", "old_image", "keys_only" Note: these must match the settings for your table's stream configured in the DynamoDB console.)
aws_access_key_id Your AWS access key ID. This is optional if you have credentials saved in a credentials file. Note: If you are running DynamoDB on your computer, this ID must match the access key ID that you used to create the table. If it does not match, the Logstash plugin will fail because DynamoDB partitions data by access key ID and region.
aws_secret_access_key Your AWS access key ID. Your AWS access key ID. This is optional if you have credentials saved in a credentials file.
perform_scan A boolean flag to indicate whether or not Logstash should scan the entire table before streaming new records. Note: Set this option to false if your are restarting the Logstash plugin.
checkpointer A string that uniquely identifies the KCL checkpointer name and CloudWatch metrics name. This is used when one worker leaves a shard so that another worker knows where to start again.
publish_metrics Boolean option to publish metrics to CloudWatch using the checkpointer name.
perform_stream Boolean option to not automatically stream new data into Logstash from DynamoDB streams.
read_ops Number of read operations per second to perform when scanning the specified table.
number_of_scan_threads Number of threads to use when scanning the specified table.
number_of_write_threads Number of threads to write to the Logstash queue when scanning the table.
log_format Log transfer format. "plain" - Returns the object as a DynamoDB object. "json_drop_binary" - Translates the item format to JSON and drops any binary attributes. "json_binary_as_text" - Translates the item format to JSON and represents any binary attributes as 64-bit encoded binary strings. For more information, see the JSON Data Format topic in the DynamoDB documentation.

Testing the Logstash Plugin for Amazon DynamoDB

The Logstash plugin for DynamoDB starts scanning the DynamoDB table and indexing the table data when you run it. As you insert new records into the DynamoDB table, the Logstash plugin consumes the new records from DynamoDB streams to continue indexing.

To test this, you can add items to the DynamoDB table in the AWS console, and view the output (stdout) in the command prompt window. The items are also inserted into Elasticsearch and indexed for searching.

To test the Logstash plugin for DynamoDB

Open the DynamoDB console at https://console.aws.amazon.com/dynamodb/.

In the list of tables, open (double-click) SourceTable.

Choose New Item, add the following data, and then choose PutItem:

Id—1 Message—First item

Repeat the previous step to add the following data items:

Id—2 and Message—Second item Id—3 and Message—Third item

Return to the command-prompt window and verify the Logstash output (it should have dumped the logstash output for each item you added to the console).

(Optional) Go back to the SourceTable in us-east-1 and do the following:

Update item 2. Set the Message to Hello world! Delete item 3.

Go to the command-prompt window and verify the data output.

You can now search the DynamoDB items in Elasticsearch.

For information about accessing and searching data in Elasticsearch, see the Elasticsearch documentation.