LambdaCron is a serverless cron tool. It runs scheduled tasks, defined in YAML, on the AWS cloud and manages them with a command line tool (LambdaCron CLI). Tasks are scheduled with the same expression syntax as Linux crontab.
Traditionally, to run scheduled tasks you need to set up cron jobs on the server where you want them to run. This doesn't make sense in a serverless architecture, where servers are transparent to users. To address this, AWS provides CloudWatch Events, which lets you define scheduled events (called rules) that invoke AWS services in a cron-like way. While it is a useful tool, it is very different from the traditional way to manage and run cron jobs, and it has some serious limitations.
LambdaCron fills this gap by providing a user-friendly way to manage serverless cron jobs, just like cron. With LambdaCron you define each of your tasks in an independent YAML file. Once your tasks are defined, you can manage them with the command line tool without needing to access the AWS console.
LambdaCron offers 5 different types of tasks: queue (AWS SQS), lambda (AWS Lambda), batch (AWS Batch), http (HTTP requests) and athena (AWS Athena).
Currently LambdaCron integrates with HTTP requests and 4 AWS services. It is designed to be extended to other services and, in general, to reach any service available through an API.
LambdaCron provides a CLI tool that lets you manage your cron tasks without accessing the AWS Console. It also lets you create multiple environments with different settings.
Custom settings for environments are set in a YAML configuration file, which must be located in the user's home directory and named .lambda-cron.yml (~/.lambda-cron.yml).
There are 3 levels of precedence for settings: Environment, Global and Default.
Environment has the highest precedence, followed by Global and finally Default. Each option can take its value from any of these levels, and a value set at a higher level overrides the value from lower levels.
Each environment is defined under its own root key in the YAML file, while global settings go under the root key global.
Options available:
Name of the S3 bucket where the Lambda function code is hosted and the tasks are stored.
bucket: 'my-custom-bucket-name'
You can use the macro {environment} in the bucket name; it is replaced by the environment name.
Default: lambda-cron-{environment}
The bucket has two folders: one for the Lambda function code and one for the tasks.
Frequency at which the Lambda function runs to execute tasks, given as an integer number of minutes OR hours. It is specified under every with one of the following parameters: minutes or hours.
every:
  minutes: 5
Default: every hour.
More information on frequency is given below.
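As a rough illustration of how the every option could map onto a CloudWatch Events schedule, the sketch below turns it into a rate() schedule expression. That LambdaCron uses rate expressions is an assumption, and the helper name to_rate_expression is hypothetical.

```python
def to_rate_expression(every=None):
    """Turn an `every` setting into a CloudWatch Events rate expression.

    Hypothetical helper: {"minutes": 5} -> "rate(5 minutes)". With no
    setting, the documented default of one hour is used.
    """
    if not every:
        every = {"hours": 1}
    if len(every) != 1:
        raise ValueError("set exactly one of 'minutes' or 'hours'")
    unit, value = next(iter(every.items()))
    if unit not in ("minutes", "hours"):
        raise ValueError("unit must be 'minutes' or 'hours'")
    if isinstance(value, bool) or not isinstance(value, int) or value < 1:
        raise ValueError("frequency must be a positive integer")
    # CloudWatch rate expressions use the singular unit for a value of 1.
    if value == 1:
        unit = unit[:-1]
    return "rate({} {})".format(value, unit)
```

Rejecting anything other than a single positive integer keeps the output within what rate() accepts.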
An alarm can be set up using CloudWatch metrics. It uses the following parameters: enabled and email.
alarm:
  enabled: True
  email: my-mailing-list@email.com
Default: disabled.
It enables or disables the cron (the CloudWatch Events rule).
enabled: True
Default: enabled.
Example:
global:
  bucket: 'my-project-cron-{environment}'
prod:
  alarm:
    enabled: True
    email: prod-alerts@domain.com
  every:
    minutes: 5
staging:
  every:
    minutes: 5
dev:
  enabled: False
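With this configuration, the settings for each environment will be:
prod: bucket my-project-cron-prod, alarm enabled with notifications sent to prod-alerts@domain.com, runs every 5 minutes, enabled.
staging: bucket my-project-cron-staging, alarm disabled, runs every 5 minutes, enabled.
dev: bucket my-project-cron-dev, alarm disabled, runs every hour, disabled.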
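The precedence rules (Environment over Global over Default) and the {environment} macro can be sketched in Python. This is a minimal illustration, not LambdaCron's actual code: it assumes each top-level option is taken whole from the highest level that defines it (no deep merge of nested blocks such as alarm), and the DEFAULTS layout is a hypothetical encoding of the documented defaults.

```python
import copy

# Documented defaults; this dict layout is an illustrative assumption.
DEFAULTS = {
    "bucket": "lambda-cron-{environment}",
    "every": {"hours": 1},
    "alarm": {"enabled": False},
    "enabled": True,
}


def resolve_settings(config, environment):
    """Return the effective settings for one environment.

    Levels are applied from lowest to highest precedence (Default, Global,
    Environment). Assumption: each top-level option is taken whole from the
    highest level that defines it; nested blocks are not deep-merged.
    """
    settings = copy.deepcopy(DEFAULTS)
    for level in (config.get("global") or {}, config.get(environment) or {}):
        settings.update(copy.deepcopy(level))
    # Substitute the {environment} macro in the bucket name.
    settings["bucket"] = settings["bucket"].replace("{environment}", environment)
    return settings


# The example ~/.lambda-cron.yml, already parsed into a dict.
config = {
    "global": {"bucket": "my-project-cron-{environment}"},
    "prod": {
        "alarm": {"enabled": True, "email": "prod-alerts@domain.com"},
        "every": {"minutes": 5},
    },
    "staging": {"every": {"minutes": 5}},
    "dev": {"enabled": False},
}

for env in ("prod", "staging", "dev"):
    print(env, resolve_settings(config, env))
```

In practice the config dict would come from parsing ~/.lambda-cron.yml with a YAML library such as PyYAML (yaml.safe_load).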
The LambdaCron CLI uses the AWS CLI and translates every command into AWS CLI commands. The AWS account you use must be configured for AWS CLI access. The LambdaCron CLI lets you specify different AWS CLI profiles.
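The translation step described above can be sketched as building an AWS CLI argument list, with an optional named profile. This is only an illustration of the idea: the AWS CLI subcommands used (events enable-rule, events disable-rule, lambda invoke) exist, but whether LambdaCron issues these exact calls is an assumption, and the command names other than start, as well as the rule and function names, are hypothetical.

```python
def to_aws_cli(command, function_name, rule_name, profile=None):
    """Build the AWS CLI argument list for a LambdaCron command.

    Only three commands are sketched. The command names other than
    `start`, and the function and rule names, are hypothetical.
    """
    commands = {
        # Enable/disable the CloudWatch Events rule that triggers the cron.
        "start": ["aws", "events", "enable-rule", "--name", rule_name],
        "stop": ["aws", "events", "disable-rule", "--name", rule_name],
        # Invoke the Lambda function once; the response is written to a file.
        "invoke": ["aws", "lambda", "invoke",
                   "--function-name", function_name, "invoke-output.json"],
    }
    if command not in commands:
        raise ValueError("unsupported command: {}".format(command))
    args = list(commands[command])
    if profile:
        # --profile selects a named profile from the AWS CLI configuration.
        args += ["--profile", profile]
    return args


print(" ".join(to_aws_cli("stop", "lambda-cron-prod", "lambda-cron-prod", profile="prod")))
```

The returned list could then be executed with subprocess.run(args, check=True), so a failing AWS CLI call surfaces as an exception.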
The following commands are available.
Create a new LambdaCron environment in the AWS account.
Parameters:
Update the environment with new settings.
Parameters:
Enable LambdaCron so that it runs.
Parameters:
Disable LambdaCron; it won't run until it is enabled again with the start command.
Parameters:
Invoke the cron Lambda function manually.
Parameters:
Delete a LambdaCron environment from the AWS account.
Parameters:
Note: The bucket must be empty before it can be deleted.
Upload tasks to the S3 bucket to be run by LambdaCron. It syncs the local directory with S3, including deleting from S3 any tasks that have been deleted from the local directory.
Parameters:
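The sync behavior described for uploading tasks matches what aws s3 sync does with its --delete flag. The sketch below builds such a command; that LambdaCron uses exactly this call is an assumption, and the tasks/ prefix inside the bucket is a hypothetical folder name.

```python
def sync_tasks_command(tasks_dir, bucket, profile=None):
    """Build an `aws s3 sync` command that mirrors a local tasks directory.

    --delete removes objects under the S3 prefix that no longer exist
    locally. The "tasks/" prefix is a hypothetical folder name.
    """
    args = ["aws", "s3", "sync", tasks_dir,
            "s3://{}/tasks/".format(bucket), "--delete"]
    if profile:
        # Use a named AWS CLI profile if one was given.
        args += ["--profile", profile]
    return args
```

Without --delete, aws s3 sync only adds or updates objects, so tasks removed locally would keep running.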
Validate tasks by checking that they match the schema. It can validate a single task file or all the tasks in a directory.
Parameters:
Tasks are defined in YAML files (each task in an independent file) and stored in S3. A task must follow the JSON schema provided in this repo: schema.
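All tasks must contain the following keys and values:
name: a description of the task.
expression: a crontab expression that defines when the task runs.
task: the task definition; its type key selects the type of task (queue, lambda, batch, http or athena).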
Each type of task has its own set of required keys as described in the following section.
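The real validation is done against the JSON schema in the repo. As a simplified stand-in (not the schema itself), the sketch below checks only the required top-level keys and the task type on a task that has already been parsed from YAML into a dict; validate_task and TASK_TYPES are hypothetical names.

```python
TASK_TYPES = {"queue", "lambda", "batch", "http", "athena"}


def validate_task(task):
    """Return a list of problems in a parsed task definition (empty if valid).

    Simplified stand-in for the JSON schema: checks only the top-level
    keys, the number of crontab fields and the task type.
    """
    errors = []
    for key in ("name", "expression", "task"):
        if key not in task:
            errors.append("missing key: {}".format(key))
    expression = task.get("expression")
    if isinstance(expression, str) and len(expression.split()) != 5:
        errors.append("expression must have 5 crontab fields")
    body = task.get("task")
    if isinstance(body, dict):
        if body.get("type") not in TASK_TYPES:
            errors.append("unknown task type: {!r}".format(body.get("type")))
    elif body is not None:
        errors.append("task must be a mapping")
    return errors
```

Collecting all problems instead of stopping at the first one mirrors how a schema validator reports errors.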
It sends a message to an AWS SQS queue. The task definition must contain the following keys: type ('queue'), QueueName and MessageBody.
name: 'Send performance report every morning'
expression: '0 9 * * *'
task:
  type: 'queue'
  QueueName: 'my-scheduler-queue'
  MessageBody:
    name: 'Performance report'
    type: 'report'
    sql: 'SELECT ....'
    recipients:
      emails: ....
The message is sent using boto3 SQS.Queue.send_message. All parameters of the method will be supported soon.
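A minimal sketch of a queue task handler, assuming the sqs argument is a boto3 SQS resource (boto3.resource("sqs")). Serializing a mapping MessageBody to JSON is an assumption: SQS message bodies are strings, and the docs don't say how LambdaCron encodes them. The function name run_queue_task is hypothetical.

```python
import json


def run_queue_task(sqs, task):
    """Send a queue task's MessageBody to its SQS queue.

    `sqs` is expected to be boto3.resource("sqs"). Encoding a mapping
    body as JSON is an assumption; strings are sent unchanged.
    """
    queue = sqs.get_queue_by_name(QueueName=task["QueueName"])
    body = task["MessageBody"]
    if not isinstance(body, str):
        body = json.dumps(body)
    return queue.send_message(MessageBody=body)
```

Passing the resource in, rather than creating it inside the function, keeps the handler easy to test with a stand-in object.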
It invokes an AWS Lambda function. The task definition must contain the following keys: type ('lambda'), FunctionName and InvokeArgs.
name: 'Run ETL process every hour'
expression: '0 * * * *'
task:
  type: 'lambda'
  FunctionName: 'run-etl-process-prod'
  InvokeArgs:
    source: 's3://my-data-source/performance'
    output: 's3://my-data-output/performance'
The function is invoked using boto3 Lambda.Client.invoke_async.
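A minimal sketch of a lambda task handler, assuming the client is boto3.client("lambda"). Sending InvokeArgs as UTF-8 encoded JSON is an assumption about serialization, and run_lambda_task is a hypothetical name.

```python
import json


def run_lambda_task(lambda_client, task):
    """Invoke a lambda task asynchronously.

    `lambda_client` is expected to be boto3.client("lambda"). InvokeArgs
    is sent as UTF-8 encoded JSON (an assumption about serialization).
    """
    payload = json.dumps(task.get("InvokeArgs", {})).encode("utf-8")
    return lambda_client.invoke_async(
        FunctionName=task["FunctionName"],
        InvokeArgs=payload,
    )
```

Note that AWS marks the InvokeAsync API as deprecated; calling invoke with InvocationType="Event" and the same bytes as Payload is the current way to run a function asynchronously.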
It submits AWS Batch jobs. The task definition must contain the following keys: type ('batch'), jobName, jobDefinition and jobQueue.
name: 'Enrich new stats every hour'
expression: '0 * * * *'
task:
  type: 'batch'
  jobName: 'enrich-stats'
  jobDefinition: 'enrich-stats-definition:1'
  jobQueue: 'jobs_high_priority'
It is a wrapper for boto3 Batch.Client.submit_job. All parameters for the method can be set in the task definition.
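Because every parameter of submit_job can be set in the task definition, a handler can be sketched as a pass-through of all keys except type. This assumes the client is boto3.client("batch"); run_batch_task is a hypothetical name.

```python
def run_batch_task(batch_client, task):
    """Submit an AWS Batch job from a batch task definition.

    `batch_client` is expected to be boto3.client("batch"). Every key
    except `type` is passed through unchanged to submit_job.
    """
    params = {key: value for key, value in task.items() if key != "type"}
    return batch_client.submit_job(**params)
```

With the example above, this calls submit_job(jobName='enrich-stats', jobDefinition='enrich-stats-definition:1', jobQueue='jobs_high_priority').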
It sends an HTTP request (GET or POST). This task makes it possible to reach any service that provides an API. The task definition must contain the following keys: type ('http'), method and request.
name: 'health check every hour'
expression: '0 * * * *'
task:
  type: 'http'
  method: 'get'
  request:
    url: 'http://healthcheck.my-domain.com'
    params:
      service: 'lambda'
It is a wrapper over Requests. All HTTP methods will be supported soon.
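A minimal sketch of an http task handler, assuming the request mapping is passed as keyword arguments (url, params, ...) to the matching Requests method. The session argument would be a requests.Session or the requests module itself; run_http_task is a hypothetical name.

```python
def run_http_task(session, task):
    """Send the HTTP request described by an http task.

    `session` is expected to be a requests.Session (or the requests
    module); the `request` mapping becomes keyword arguments.
    """
    method = task["method"].lower()
    if method not in ("get", "post"):
        raise ValueError("only GET and POST are supported")
    return getattr(session, method)(**task["request"])
```

For the health check example this amounts to requests.get(url='http://healthcheck.my-domain.com', params={'service': 'lambda'}).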
It runs an SQL query with AWS Athena. The task definition must contain the following keys: type ('athena'), QueryString and ResultConfiguration.
name: 'get high scores every fifteen minutes'
expression: '*/15 * * * *'
task:
  type: 'athena'
  QueryString: 'SELECT Username, HighScore FROM Database.UserTable WHERE HighScore > 1000'
  ResultConfiguration:
    OutputLocation: 's3://scores.my-app/'
It is a wrapper for boto3 Athena.Client.start_query_execution. All parameters for the method can be set in the task definition.
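Like the batch task, an athena task can be sketched as a pass-through to start_query_execution. The extra check reflects an Athena requirement: query results must be written to an S3 location (s3://...). The client is assumed to be boto3.client("athena"); run_athena_task is a hypothetical name.

```python
def run_athena_task(athena_client, task):
    """Start an Athena query from an athena task definition.

    `athena_client` is expected to be boto3.client("athena"). Every key
    except `type` is passed through to start_query_execution.
    """
    params = {key: value for key, value in task.items() if key != "type"}
    output = params.get("ResultConfiguration", {}).get("OutputLocation", "")
    # Athena writes query results to S3, so the location must be an s3:// URI.
    if output and not output.startswith("s3://"):
        raise ValueError("OutputLocation must be an s3:// URI")
    return athena_client.start_query_execution(**params)
```

Failing early on a non-S3 OutputLocation gives a clearer error than the one Athena would return.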
Each time LambdaCron runs, all tasks scheduled to run between the current event and the next event are run immediately.
Example: if LambdaCron runs every hour ('0 * * * *'), tasks '0 1 * * *' and '58 1 * * *' will run at the same time (at 1:00).
LambdaCron executes a task at most once per invocation. This can result in a task running fewer times than its cron expression implies.
Example: if LambdaCron runs every hour ('0 * * * *'), a task '*/15 * * * *' will only run once an hour. If LambdaCron runs every minute ('* * * * *'), a task '*/15 * * * *' will run four times an hour, as its expression implies. You can set up LambdaCron to run more often than once an hour if you need a task to run more frequently.
Events are based on AWS CloudWatch Events. You can learn about them in the AWS Scheduled Events documentation; be aware of the behavior it describes, since it also applies to LambdaCron.
LambdaCron runs entirely on the AWS cloud.
Important! The tool is not available through pip yet. If you want to try it, see the Development section.
$ pip install lambda-cron
Create your first environment (called 'test') with default settings:
$ lambda-cron create --environment=test --create-bucket
To use custom settings, create the settings file in the home directory of the user running the tool:
~/.lambda-cron.yml
For help:
$ lambda-cron --help
or for each command:
$ lambda-cron create --help
To start working on LambdaCron, clone the project, create a virtualenv (optional) and install the dependencies:
$ git clone https://github.com/MediaMath/lambda-cron.git
$ cd lambda-cron
$ virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements-dev.txt
$ ./lambda_cron/lambda-cron --help
Contributions are welcome. The open issues list some features and improvements that would be good to have in LambdaCron.
Before contributing, we encourage you to take a look at the contribution tips provided by GitHub.