tap-cloudwatch
CloudWatch tap for extracting log data from the AWS CloudWatch Logs Insights API.
Built with the Meltano Singer SDK.
Capabilities: `catalog`, `state`, `discover`, `about`, `stream-maps`, `schema-flattening`.
Setting | Required | Default | Description |
---|---|---|---|
aws_access_key_id | False | None | The access key for your AWS account. |
aws_secret_access_key | False | None | The secret key for your AWS account. |
aws_session_token | False | None | The session key for your AWS account. This is only needed when you are using temporary credentials. |
aws_profile | False | None | The AWS credentials profile name to use. The profile must be configured and accessible. |
aws_endpoint_url | False | None | The complete URL to use for the constructed client. |
aws_region_name | False | None | The AWS region name (e.g. us-east-1). |
start_date | True | None | The earliest record date to sync. |
end_date | False | None | The last record date to sync. The tap uses a 5 minute buffer to allow CloudWatch logs to arrive in full; if you request data up to the current time, the end_date is automatically adjusted to now minus 5 minutes. |
log_group_name | True | None | The log group on which to perform the query. |
query | True | None | The query string to use. For more information, see CloudWatch Logs Insights Query Syntax. |
batch_increment_s | False | 3600 | The size of the time window to query by, in seconds (default 3,600, i.e. 1 hour). If the result set for a batch exceeds the API's maximum of 10,000 records, the tap queries the same window again starting at the timestamp of the most recent record received, so some data is scanned more than once but less than twice, depending on how far the results exceeded the 10k maximum. For example, a batch window with 15k records is scanned once to receive the first 10k results, then ~5k are scanned again to get the rest, so that batch's data was scanned ~1.5 times. To avoid this, set the batch window small enough that a batch stays under the 10k limit. |
stream_maps | False | None | Config object for stream maps capability. For more information check out Stream Maps. |
stream_map_config | False | None | User-defined config values to be used within map expressions. |
flattening_enabled | False | None | 'True' to enable schema flattening and automatically expand nested properties. |
flattening_max_depth | False | None | The max depth to flatten schemas. |
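These settings can be supplied to the tap in a JSON config file. Below is a minimal sketch of such a file; the log group, query, region, start date, and filename are placeholder values, and credentials are assumed to come from the environment or an AWS profile rather than being hard-coded.

```bash
# Write a minimal example config (placeholder values) to config.json
cat > config.json <<'EOF'
{
  "start_date": "2023-01-01T00:00:00Z",
  "log_group_name": "/aws/lambda/my-function",
  "query": "fields @timestamp, @message",
  "aws_region_name": "us-east-1",
  "batch_increment_s": 3600
}
EOF
```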
A full list of supported settings and capabilities is available by running:

```bash
tap-cloudwatch --about
```
If no `end_date` is configured, the tap will attempt to retrieve data up until the current time minus 5 minutes.

This Singer tap will automatically import any environment variables within the working directory's `.env` file if the `--config=ENV` flag is provided, such that config values will be considered if a matching environment variable is set either in the terminal context or in the `.env` file.
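As a sketch of the environment-variable approach (assuming the Singer SDK's standard `TAP_CLOUDWATCH_<SETTING>` naming; all values are placeholders):

```bash
# Placeholder values; equivalent settings could also live in a .env file
export TAP_CLOUDWATCH_START_DATE='2023-01-01T00:00:00Z'
export TAP_CLOUDWATCH_LOG_GROUP_NAME='/aws/lambda/my-function'
export TAP_CLOUDWATCH_QUERY='fields @timestamp, @message'

tap-cloudwatch --config=ENV --discover > ./catalog.json
```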
You can easily run `tap-cloudwatch` by itself or in a pipeline using Meltano.
```bash
tap-cloudwatch --version
tap-cloudwatch --help
tap-cloudwatch --config CONFIG --discover > ./catalog.json
```
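To run a standalone sync, the tap's Singer output can be piped into any Singer target. A sketch, assuming `target-jsonl` is installed separately and a `config.json` like the one above exists:

```bash
# Sync CloudWatch logs and write them as JSONL via a separately installed target
tap-cloudwatch --config config.json | target-jsonl
```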
Follow these instructions to contribute to this project.
```bash
pipx install poetry
poetry install
```
Create tests within the `tap_cloudwatch/tests` subfolder and then run:

```bash
poetry run tox -e pytest
```
Coverage reports are generated at `tap_cloudwatch/tests/codecoverage/`.
You can also test the `tap-cloudwatch` CLI interface directly using `poetry run`:

```bash
poetry run tap-cloudwatch --help
```
Note: This tap will work in any Singer environment and does not require Meltano. Examples here are for convenience and to streamline end-to-end orchestration scenarios.
Next, install Meltano (if you haven't already) and any needed plugins:
```bash
# Install meltano
pipx install meltano
# Initialize meltano within this directory
cd tap-cloudwatch
meltano install
```
Now you can test and orchestrate using Meltano:

```bash
# Test invocation:
meltano invoke tap-cloudwatch --version
# OR run a test `elt` pipeline:
meltano elt tap-cloudwatch target-jsonl
```
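Settings can also be managed through Meltano before invoking the tap; a sketch with placeholder values:

```bash
# Placeholder values; set the required settings in the Meltano project
meltano config tap-cloudwatch set start_date '2023-01-01T00:00:00Z'
meltano config tap-cloudwatch set log_group_name '/aws/lambda/my-function'
meltano config tap-cloudwatch set query 'fields @timestamp, @message'
```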
See the dev guide for more instructions on how to use the SDK to develop your own taps and targets.
An alternative approach would be to use `create_export_task` to efficiently bulk export log data to S3 and then read that data.
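For reference, if that approach were taken instead of querying the Logs Insights API directly, the export could be started with the AWS CLI; the log group, bucket name, prefix, and epoch-millisecond timestamps below are placeholders:

```bash
# Start a bulk export of a log group to S3 (--from/--to are epoch milliseconds)
aws logs create-export-task \
  --log-group-name '/aws/lambda/my-function' \
  --from 1672531200000 \
  --to 1672617600000 \
  --destination 'my-log-export-bucket' \
  --destination-prefix 'cloudwatch-exports'

# Poll for completion before reading the exported objects from S3
aws logs describe-export-tasks
```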