MediaMath / lambda-cron

LambdaCron - serverless cron tool
Apache License 2.0

LambdaCron

LambdaCron is a serverless cron tool. It provides a way to run scheduled tasks on the AWS cloud defined in YAML and managed by a command line tool (LambdaCron CLI). Tasks are scheduled using the same syntax for expressions as Linux crontab.

Traditionally, to run scheduled tasks you need to set up cron jobs on the server where you want them to run. This doesn't make sense when building a serverless architecture, where servers are transparent to users. To solve this, AWS provides CloudWatch Events, which allow you to run scheduled events (called rules) to invoke AWS services in a cron-like way. While it is a useful tool, it is very different from the traditional way to manage and run cron jobs, and it has some serious limitations.

LambdaCron fills the gap by providing a user-friendly way to manage serverless cron jobs just like cron. With LambdaCron you define each of your tasks in an independent YAML file. Once your tasks are defined, you can manage them using the command line tool without needing access to the AWS console.

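LambdaCron offers the following types of tasks, each described in the Tasks section:

Queue: sends a message to an AWS SQS queue.
Lambda: invokes an AWS Lambda function.
Batch: submits an AWS Batch job.
HTTP: sends an HTTP request (GET or POST).
Athena: executes an SQL query.

Currently LambdaCron integrates with HTTP requests and the AWS services listed above. It is ready to be extended to other services and, in general, to reach any service available through an API.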

LambdaCron CLI

LambdaCron provides a CLI tool that allows management of your cron tasks without access to the AWS Console. It also allows creation of multiple environments with different settings.

Settings

Custom settings for environments are set in a YAML-formatted configuration file located in the user home directory, and must be named ~/.lambda-cron.yml.

There are 3 levels of preference for settings, from highest to lowest: Environment, Global and Default. Each option can take its value from a different level, and a value set at a higher level overrides the same option at lower levels.

Each environment is defined with a root key in the YAML, while global settings are identified with the key global.

Options available:

bucket

Name of the bucket where Lambda function code will be hosted and tasks stored.

bucket: 'my-custom-bucket-name'

You can use the macro, {environment}, in the string for the bucket, and it will be replaced by the environment name.

Default: lambda-cron-{environment}

The bucket will have two folders: one holding the Lambda function code and one holding the task definitions.

every (frequency)

Frequency at which the Lambda function will execute to run tasks. It is an integer number of minutes or hours, specified with one of the following parameters: minutes or hours.

every:
    minutes: 5

Default: every hour.

See the Frequency section for more information.

alarm

An alarm can be set up using CloudWatch metrics. It uses the following parameters: enabled and email.

alarm:
    enabled: True
    email: my-mailing-list@email.com

Default: disabled.

enabled

It enables/disables the cron (CloudWatch event).

enabled: True

Default: enabled.

Example

global:
  bucket: 'my-project-cron-{environment}'

prod:
  alarm:
    enabled: True
    email: prod-alerts@domain.com
  every:
    minutes: 5

staging:
  every:
    minutes: 5

dev:
  enabled: False

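The settings for each environment will be:

prod: bucket my-project-cron-prod; alarm enabled, sent to prod-alerts@domain.com; runs every 5 minutes; cron enabled (default).
staging: bucket my-project-cron-staging; alarm disabled (default); runs every 5 minutes; cron enabled (default).
dev: bucket my-project-cron-dev; alarm disabled (default); runs every hour (default); cron disabled.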
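The order of preference can be sketched in a few lines of Python. This is only an illustration, not LambdaCron's own code: the function name resolve_settings is hypothetical, the hours key in the defaults is assumed from the description of the every option, and each option is assumed to be taken whole from the highest level that defines it.

```python
# Defaults as documented above; the 'hours' key is an assumption.
DEFAULTS = {
    'bucket': 'lambda-cron-{environment}',
    'every': {'hours': 1},
    'alarm': {'enabled': False},
    'enabled': True,
}


def resolve_settings(config, environment):
    """Pick each option from the highest-preference level that defines it."""
    # Order of preference: environment, then global, then defaults.
    levels = (config.get(environment) or {}, config.get('global') or {}, DEFAULTS)
    resolved = {}
    for option in DEFAULTS:
        for level in levels:
            if option in level:
                resolved[option] = level[option]
                break
    # Expand the {environment} macro in the bucket name.
    resolved['bucket'] = resolved['bucket'].replace('{environment}', environment)
    return resolved


# The example configuration from this section, as a Python dict.
config = {
    'global': {'bucket': 'my-project-cron-{environment}'},
    'prod': {
        'alarm': {'enabled': True, 'email': 'prod-alerts@domain.com'},
        'every': {'minutes': 5},
    },
    'staging': {'every': {'minutes': 5}},
    'dev': {'enabled': False},
}

for name in ('prod', 'staging', 'dev'):
    print(name, resolve_settings(config, name))
```

Taking each option whole, rather than merging nested keys, keeps the sketch short; whether the tool merges nested values such as alarm is not stated here.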

Commands

The LambdaCron CLI uses the AWS CLI, and translates every command into an AWS CLI command. The AWS account used should be configured for AWS CLI access. The LambdaCron CLI allows different AWS CLI profiles to be specified.

The following is the list of commands available.

create

Create new LambdaCron environment in the AWS account.

Parameters:

update

Apply new settings to the environment.

Parameters:

start

Enable LambdaCron to run.

Parameters:

stop

Disable LambdaCron. It won't run until it is enabled again with the start command.

Parameters:

invoke

Invoke the cron Lambda function manually.

Parameters:

delete

Delete LambdaCron environment from the AWS account.

Parameters:

Note: The bucket must be empty before it can be deleted.

upload-tasks

Upload tasks to the S3 bucket to be run with LambdaCron. It syncs the local directory with S3, which includes deleting from S3 any tasks that have been deleted from the local directory.

Parameters:

validate

Validate tasks by checking that they match the schema. It can validate a single task from a file or a set of tasks in a directory.

Parameters:

Tasks

Tasks are defined in YAML files (each task in an independent file) and stored in S3. A task must follow the JSON schema provided in this repo: schema.

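All tasks must contain the following keys: name (the task name), expression (the crontab expression that schedules the task) and task (the task definition, which includes its type).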

Each type of task has its own set of required keys as described in the following section.
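As a rough illustration of what a structural check on a task looks like, here is a minimal Python sketch. It is not the validate command and not the JSON schema from the repo: the key lists are assumptions taken from the example tasks in the following sections, and the name find_problems is hypothetical. The schema remains the authority on what is required.

```python
# Keys every task definition is assumed to carry (taken from the examples).
COMMON_KEYS = ('name', 'expression', 'task')

# Per-type keys, assumed from the example task of each type.
TYPE_KEYS = {
    'queue': ('QueueName', 'MessageBody'),
    'lambda': ('FunctionName', 'InvokeArgs'),
    'batch': ('jobName', 'jobDefinition', 'jobQueue'),
    'http': ('method', 'request'),
    'athena': ('QueryString', 'ResultConfiguration'),
}


def find_problems(definition):
    """Return a list of human-readable problems; an empty list means it looks fine."""
    problems = ['missing key: ' + key for key in COMMON_KEYS if key not in definition]
    task = definition.get('task')
    if not isinstance(task, dict):
        return problems
    task_type = task.get('type')
    if task_type not in TYPE_KEYS:
        problems.append('unknown task type: %r' % (task_type,))
        return problems
    problems += ['missing task key: ' + key
                 for key in TYPE_KEYS[task_type] if key not in task]
    return problems


# A batch task that forgot its job queue.
incomplete = {
    'name': 'Enrich new stats every hour',
    'expression': '0 * * * *',
    'task': {
        'type': 'batch',
        'jobName': 'enrich-stats',
        'jobDefinition': 'enrich-stats-definition:1',
    },
}
print(find_problems(incomplete))
```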

Queue task

It sends a message to an AWS SQS queue. The task definition must contain the following keys:

name: 'Send performance report every morning'
expression: '0 9 * * *'
task:
  type: 'queue'
  QueueName: 'my-scheduler-queue'
  MessageBody:
    name: 'Performance report'
    type: 'report'
    sql: 'SELECT ....'
    recipients:
      emails: ....

The message is sent using boto3 SQS.Queue.send_message. All parameters of the function will be supported soon.
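A minimal sketch of how a queue task could map onto boto3 follows. It is an illustration under assumptions, not LambdaCron's implementation: the function name send_queue_task is hypothetical, and serializing a structured MessageBody as JSON is an assumption (SQS itself only requires the body to be a string). The SQS resource is passed in so the sketch does not need AWS credentials to be read or tested.

```python
import json


def send_queue_task(task, sqs):
    """Send the task's message to the queue it names.

    `sqs` is expected to behave like boto3.resource('sqs').
    """
    queue = sqs.get_queue_by_name(QueueName=task['QueueName'])
    body = task['MessageBody']
    # SQS message bodies are strings; JSON encoding here is an assumption.
    if not isinstance(body, str):
        body = json.dumps(body)
    return queue.send_message(MessageBody=body)


# Usage against AWS (requires boto3 and configured credentials):
#   import boto3
#   send_queue_task(task, boto3.resource('sqs'))
```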

Lambda task

It invokes an AWS Lambda function. The task definition must contain the following keys:

name: 'Run ETL process every hour'
expression: '0 * * * *'
task:
  type: 'lambda'
  FunctionName: 'run-etl-process-prod'
  InvokeArgs:
    source: 's3://my-data-source/performance'
    output: 's3://my-data-output/performance'

The function is invoked using boto3 Lambda.Client.invoke_async.

Batch task

It submits AWS Batch jobs. The task definition must contain the following keys:

name: 'Enrich new stats every hour'
expression: '0 * * * *'
task:
  type: 'batch'
  jobName: 'enrich-stats'
  jobDefinition: 'enrich-stats-definition:1'
  jobQueue: 'jobs_high_priority'

It is a wrapper for boto3 Batch.Client.submit_job. All parameters for the method can be set in the task definition.
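Because every parameter of submit_job can be set in the task definition, the wrapper can be pictured as a simple pass-through. The sketch below assumes that every key under task except type is forwarded as a keyword argument; the function name submit_batch_task is hypothetical and the Batch client is passed in so the example can be read without AWS access.

```python
def submit_batch_task(task, batch_client):
    """Forward the task definition to Batch.Client.submit_job.

    `batch_client` is expected to behave like boto3.client('batch').
    """
    # 'type' only selects the kind of task; it is not a submit_job parameter.
    params = {key: value for key, value in task.items() if key != 'type'}
    return batch_client.submit_job(**params)


# Usage against AWS (requires boto3 and configured credentials):
#   import boto3
#   submit_batch_task(task, boto3.client('batch'))
```

With this shape, optional submit_job parameters such as parameters or containerOverrides would be added to the task definition next to jobName, without any change to the wrapper.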

HTTP task

It sends an HTTP request (GET or POST). This task can reach any service that provides an API. The task definition must contain the following keys:

name: 'health check every hour'
expression: '0 * * * *'
task:
  type: 'http'
  method: 'get'
  request:
    url: 'http://healthcheck.my-domain.com'
    params:
      service: 'lambda'

It is a wrapper over Requests. All HTTP methods will be supported soon.

Athena task

It executes an SQL query on AWS Athena. The task definition must contain the following keys:

name: 'get high scores every fifteen minutes'
expression: '*/15 * * * *'
task:
  type: 'athena'
  QueryString: 'SELECT Username, HighScore FROM Database.UserTable WHERE HighScore > 1000'
  ResultConfiguration:
    OutputLocation: 's3://scores.my-app/'

It is a wrapper for boto3 Athena.Client.start_query_execution. All parameters for the method can be set in the task definition.

Frequency

Execution time

All tasks scheduled to run between the current event and the next event will be run immediately.

Example: if LambdaCron runs every hour ('0 * * * *'), the tasks '0 1 * * *' and '58 1 * * *' will run at the same time.

Task frequency

LambdaCron will execute a task at most once for each invocation. This can result in a task being run fewer times than its cron expression implies.

Example: If LambdaCron runs every hour ('0 * * * *'), a task '*/15 * * * *' will only run once an hour. If LambdaCron runs every minute ('* * * * *'), a task '*/15 * * * *' will only run four times an hour. You can set up LambdaCron to run more often than once an hour if you need a task to run more frequently.
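Both rules (tasks due before the next event run immediately, and each task runs at most once per invocation) can be simulated with a small Python sketch. This is not LambdaCron's scheduler. The cron matcher is deliberately simplified: it supports only '*', '*/n' and comma-separated integers, combines all five fields with AND, and its step handling is only accurate for the minute and hour fields. All function names are hypothetical.

```python
from datetime import datetime, timedelta


def field_matches(field, value):
    """Match one cron field: '*', '*/n', or a comma-separated list of integers."""
    if field == '*':
        return True
    if field.startswith('*/'):
        return value % int(field[2:]) == 0
    return value in {int(part) for part in field.split(',')}


def cron_matches(expression, moment):
    """True if the five-field expression matches the given minute."""
    minute, hour, day, month, weekday = expression.split()
    return (field_matches(minute, moment.minute)
            and field_matches(hour, moment.hour)
            and field_matches(day, moment.day)
            and field_matches(month, moment.month)
            # cron counts Sunday as 0; isoweekday() counts it as 7.
            and field_matches(weekday, moment.isoweekday() % 7))


def should_run(expression, invocation, period):
    """True if the task is due at any minute in [invocation, invocation + period)."""
    minutes = int(period.total_seconds() // 60)
    return any(cron_matches(expression, invocation + timedelta(minutes=offset))
               for offset in range(minutes))


def runs_in_window(expression, start, period, window):
    """Count how many invocations within the window run the task (at most once each)."""
    count = 0
    moment = start
    while moment < start + window:
        if should_run(expression, moment, period):
            count += 1
        moment += period
    return count


hourly = timedelta(hours=1)
one_am = datetime(2017, 1, 1, 1, 0)
# Both tasks are picked up by the same 01:00 invocation.
print(should_run('0 1 * * *', one_am, hourly), should_run('58 1 * * *', one_am, hourly))
# Runs of '*/15 * * * *' during one hour, with hourly and per-minute invocations.
print(runs_in_window('*/15 * * * *', one_am, hourly, hourly))
print(runs_in_window('*/15 * * * *', one_am, timedelta(minutes=1), hourly))
```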

Frequency and Precision

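Events are based on AWS CloudWatch Events, so their scheduling limits apply to LambdaCron. They are described in the Scheduled Events documentation; in particular, the finest resolution is one minute, and a rule fires within that minute rather than on the precise 0th second.

Be aware of this when scheduling tasks.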

Requirements

LambdaCron is based 100% on AWS cloud.

Getting Started

Important! The tool is not available in pip yet. If you want to try it, see the Development section.

Install

$ pip install lambda-cron

Usage

Create your first environment (called 'test') with default settings:

$ lambda-cron create --environment=test --create-bucket

If you want to use custom settings, create the settings file (~/.lambda-cron.yml) in the home directory of the user who is running the tool.

For help:

$ lambda-cron --help

or for each command:

$ lambda-cron create --help

Development

To start working with LambdaCron you should clone the project, create a virtualenv (optional) and install dependencies:

$ git clone https://github.com/MediaMath/lambda-cron.git
$ cd lambda-cron
$ virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements-dev.txt
$ ./lambda_cron/lambda-cron --help

Contributing

Contributions are welcome. You can find open issues with some features and improvements that would be good to have in LambdaCron.

Before contributing, we encourage you to take a look at the contribution tips provided by GitHub.