
= Celos: A Scriptable Scheduler for Periodical Hadoop Workflows

:toc: macro
:toclevels: 3
:toc-title:

'''

toc::[]

'''

== Overview

image::etc/images/celos-ui.png[Celos UI]

The Celos UI

=== Oozie Overview

Celos works in conjunction with the link:https://oozie.apache.org/[_Apache Oozie_] execution engine, so we'll quickly describe Oozie before diving into Celos proper.

Oozie provides services such as distribution of jobs across a cluster, log aggregation, and command-line and Web UI tools for inspecting running jobs.

To run a job with Oozie, you package it into a workflow directory and put it into HDFS, as the following diagram shows:

image::etc/images/oozie.png[Oozie workflow directory]

Oozie requires a small XML file that describes what the job should do (in this case, call a Java main class with some arguments), and then runs the job on a processing node in the cluster. Artefacts in the lib directory are automatically placed on the classpath. Arguments (such as inputPath and outputPath in the above example) can be passed to the job.

Oozie XML files support link:https://oozie.apache.org/docs/4.2.0/WorkflowFunctionalSpec.html[many different kinds of actions] besides calling Java, such as manipulating HDFS, loading data into databases, or sending emails. This is mostly out of scope for this document, but in general, Celos workflows can use all features of Oozie.

You can view all currently running jobs in the Oozie Web UI, Hue:

image::etc/images/hue-index.png[Hue index]

You can also view details of a particular job in Hue:

image::etc/images/hue-details.png[Hue details]

=== Celos Concepts

==== Workflows

If we have the Oozie workflow directory to run in HDFS at /wordcount, hourly input data in /input/YYYY-MM-DD/HH00, and want to write output data to /output/YYYY-MM-DD/HH00, we can set up a simple Celos workflow with the ID wordcount like this:

[source,javascript] .... celos.defineWorkflow({ "id": "wordcount", "schedule": celos.hourlySchedule(), "schedulingStrategy": celos.serialSchedulingStrategy(), "trigger": celos.hdfsCheckTrigger("/input/${year}-${month}-${day}/${hour}00/_READY"), "externalService": celos.oozieExternalService({ "oozie.wf.application.path": "/wordcount/workflow.xml", "inputPath": "/input/${year}-${month}-${day}/${hour}00/", "outputPath": "/output/${year}-${month}-${day}/${hour}00/", }) }); ....

A Celos workflow always has an Oozie workflow (/wordcount/workflow.xml in this case) as its "meat".

If we were to receive data from two sources, say two datacenters in /input/nyc and /input/lax, we can define a helper function and use that to quickly define two workflows with the IDs wordcount-nyc and wordcount-lax:

[source,javascript]
....
function defineWordCountWorkflow(dc) {
    celos.defineWorkflow({
        "id": "wordcount-" + dc,
        "schedule": celos.hourlySchedule(),
        "schedulingStrategy": celos.serialSchedulingStrategy(),
        "trigger": celos.hdfsCheckTrigger("/input/" + dc + "/${year}-${month}-${day}/${hour}00/_READY"),
        "externalService": celos.oozieExternalService({
            "oozie.wf.application.path": "/wordcount/workflow.xml",
            "inputPath": "/input/" + dc + "/${year}-${month}-${day}/${hour}00/",
            "outputPath": "/output/" + dc + "/${year}-${month}-${day}/${hour}00/"
        })
    });
}

defineWordCountWorkflow("nyc");
defineWordCountWorkflow("lax");
....

Here's an overview of schedules, triggers, and scheduling strategies, each described below:

image::etc/images/slots.png[Celos concepts]

==== Schedules

Each workflow has a schedule that determines the points in time (called slots) at which the workflow should run.

Celos supports cron-like schedules with <<celos.cronSchedule,celos.cronSchedule>>:

[source,javascript]
....
// A workflow using this schedule will run every hour.
celos.cronSchedule("0 0 * * * ?");
// A workflow using this schedule will run every day at midnight.
celos.cronSchedule("0 0 0 * * ?");
// A workflow using this schedule will run every day at 5am.
celos.cronSchedule("0 0 5 * * ?");
....

Another type of schedule is <<celos.dependentSchedule,celos.dependentSchedule>>, which makes a workflow use the same schedule as another workflow. This is useful for setting up a downstream workflow that tracks an upstream workflow, without having to duplicate the schedule definition.
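As a sketch, a downstream workflow might reuse the schedule of an upstream workflow and wait for its success like this (the workflow IDs and the Oozie application path are hypothetical):

[source,javascript]
....
// Hypothetical downstream workflow: "wordcount-report" runs on the same
// schedule as "wordcount" and only becomes ready once the corresponding
// "wordcount" slot has succeeded.
celos.defineWorkflow({
    "id": "wordcount-report",
    "schedule": celos.dependentSchedule("wordcount"),
    "schedulingStrategy": celos.serialSchedulingStrategy(),
    "trigger": celos.successTrigger("wordcount"),
    "externalService": celos.oozieExternalService({
        "oozie.wf.application.path": "/wordcount-report/workflow.xml"
    })
});
....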

==== Triggers

For each slot of a workflow, a trigger is used to determine whether it's ready to run, or needs to wait.

===== Simple Triggers

Let's look at some commonly used simple triggers.

<<celos.hdfsCheckTrigger,celos.hdfsCheckTrigger>> waits for a file or directory in HDFS:

[source,javascript]
....
// A slot at time T will wait for the file /logs/YYYY-MM-DD/HH00/_READY in HDFS.
celos.hdfsCheckTrigger("/logs/${year}-${month}-${day}/${hour}00/_READY");
....

<<celos.successTrigger,celos.successTrigger>> waits for the success of another workflow, allowing the definition of dependencies among workflows:

[source,javascript]
....
// A slot at time T will wait until the slot at time T of
// the workflow with the ID "workflow-foo" is successful.
celos.successTrigger("workflow-foo");
....

<<celos.delayTrigger,celos.delayTrigger>> waits until the current wallclock time is a given number of seconds after the slot's time:

[source,javascript]
....
// A slot at time T will wait until the current time is one hour after T.
celos.delayTrigger(60 * 60);
....

<<celos.offsetTrigger,celos.offsetTrigger>> lets us offset another trigger a given number of seconds into the future or past:

[source,javascript]
....
// A slot at time T will wait until the next hour's file is available in HDFS.
celos.offsetTrigger(60 * 60, celos.hdfsCheckTrigger("/logs/${year}-${month}-${day}/${hour}00/_READY"));
....

===== Combined Triggers

We can also combine triggers with <<celos.andTrigger,celos.andTrigger>>, <<celos.orTrigger,celos.orTrigger>>, and <<celos.notTrigger,celos.notTrigger>>:

[source,javascript]
....
// A slot at time T will wait until both /input-a/YYYY-MM-DD/HH00/_READY
// and /input-b/YYYY-MM-DD/HH00/_READY are in HDFS.
celos.andTrigger(celos.hdfsCheckTrigger("/input-a/${year}-${month}-${day}/${hour}00/_READY"),
                 celos.hdfsCheckTrigger("/input-b/${year}-${month}-${day}/${hour}00/_READY"));
....

[source,javascript]
....
// A slot at time T will wait until the current hour's file, the next hour's file,
// and the file for the hour after that are in HDFS.
var hdfsCheck = celos.hdfsCheckTrigger("/logs/${year}-${month}-${day}/${hour}00/_READY");
celos.andTrigger(hdfsCheck,
                 celos.offsetTrigger(60 * 60 * 1, hdfsCheck),
                 celos.offsetTrigger(60 * 60 * 2, hdfsCheck));
....

[source,javascript]
....
// A slot at time T will be ready if, after one hour, the slot at time T
// of the other workflow "workflow-bar" is not successful.
// This can be used to send an alert, for example.
celos.andTrigger(celos.delayTrigger(60 * 60),
                 celos.notTrigger(celos.successTrigger("workflow-bar")));
....

This last trigger should be used in conjunction with a <<celos.dependentSchedule,celos.dependentSchedule("workflow-bar")>>.
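Putting the pieces together, an alert workflow might be defined like the following sketch (the workflow ID and Oozie application path are hypothetical; the actual alert action, e.g. sending an email, would live in the Oozie workflow):

[source,javascript]
....
// Hypothetical alert workflow that fires when "workflow-bar" has not
// succeeded one hour after its scheduled time.
celos.defineWorkflow({
    "id": "workflow-bar-alert",
    "schedule": celos.dependentSchedule("workflow-bar"),
    "schedulingStrategy": celos.serialSchedulingStrategy(),
    "trigger": celos.andTrigger(celos.delayTrigger(60 * 60),
                                celos.notTrigger(celos.successTrigger("workflow-bar"))),
    "externalService": celos.oozieExternalService({
        "oozie.wf.application.path": "/alerts/workflow.xml"
    })
});
....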

==== Scheduling Strategies

A workflow's scheduling strategy determines when and in which order the ready slots of the workflow should be run.

There's only one scheduling strategy at the moment, <<celos.serialSchedulingStrategy,celos.serialSchedulingStrategy>>, which executes ready slots oldest first, with a configurable concurrency level.

[source,javascript]
....
// A workflow using this scheduling strategy will run three slots in parallel.
celos.serialSchedulingStrategy(3);
....

=== How it Works

The main data sources Celos uses are:

==== Workflows Directory

The workflows directory contains JavaScript files that define workflows.

It may look like this:

....
workflows/
    wordcount.js
    some-other-workflow.js
    yet-another-workflow.js
....

==== State Database

The state database directory contains the state of each slot as a small JSON file.

....
db/
    state/
        wordcount-lax/
            2015-09-15/
                00:00:00.000Z
                01:00:00.000Z
                02:00:00.000Z
                ...
        wordcount-nyc/
            2015-09-15/
                00:00:00.000Z
                01:00:00.000Z
                02:00:00.000Z
                ...
....

An individual slot file in the state database, e.g. db/state/wordcount-lax/2015-09-15/01:00:00.000Z, looks like this:

.... { "status": "SUCCESS", "externalID": "0008681-150911205802478-oozie-oozi-W", "retryCount": 0 } ....

The status field records the state the slot is in.

The externalID field contains the Oozie ID of the corresponding Oozie workflow execution if the slot is running, successful, or failed (otherwise it's null).

The retryCount records how many times the slot has already been retried after failure.

==== Scheduler Step

On each scheduler step (typically triggered once per minute from cron), Celos evaluates all JavaScript files in the workflows directory, yielding a set of uniquely identified workflows.

Then, for each workflow, Celos fetches all slot files within a sliding window of 7 days before the current date from the state database.

Each slot is a state machine with the following states:

image::etc/images/states.png[Slot states]

Celos takes the following action, depending on the state of the slot:

[options="header"] |=== |State|Action |WAITING|Call the workflow's trigger to determine whether the slot is ready. If the trigger signals readiness, put the slot into the READY state. If the slot has been waiting for too long, put the slot into the WAIT_TIMEOUT state. Otherwise, keep the slot in the WAITING state. |READY|Pass the slot as a candidate for scheduling to the workflow's scheduling strategy. If the strategy chooses to execute the slot, submit it to Oozie, and put it into the RUNNING state. Otherwise, keep the slot in the READY state. |RUNNING|Ask Oozie for the status of the execution. If the slot is still executing, keep it in the RUNNING state. If the slot has succeeded, put it into the SUCCESS state. If the slot has failed, but there are retries left, put the slot into the WAITING state again. If the slot has failed, and there are no more retries left, put the slot into the FAILURE state. |SUCCESS|Do nothing. |FAILURE|Do nothing. |WAIT_TIMEOUT|Do nothing. |KILLED|Do nothing. |===

The state database contains additional information about slots that have been manually rerun with the /rerun HTTP API.

In the following example, the slots 2015-08-01T01:00Z and 2015-08-01T02:00Z of the workflow wordcount-nyc have been rerun. They are outside the sliding window, so the scheduling algorithm described above would not normally look at them.

However, rerunning a slot touches an additional file in the rerun subdirectory of the state database, and slots for which such a file exists are fed into the scheduling algorithm in addition to the slots from the 7 day sliding window.

....
db/
    state/
        ... as above ...
    rerun/
        wordcount-nyc/
            2015-08-01/
                01:00:00.000Z
                02:00:00.000Z
....

Rerunning thus serves two purposes: besides its main use of re-executing a slot, it can also be used to backfill data, by marking slots outside the sliding window that the scheduler should care about.

==== Further Directories

Celos has a defaults directory that contains JavaScript files that can be imported into a workflow JavaScript file with <<celos.importDefaults,celos.importDefaults>>. Such defaults files are used for sharing global variables and utility functions.
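For example, a defaults file might contain a shared helper function (the file name util.js and the helper are made up for illustration):

[source,javascript]
....
// defaults/util.js (hypothetical): a helper shared by several workflows.
function hourlyReadyPath(prefix) {
    return prefix + "/${year}-${month}-${day}/${hour}00/_READY";
}
....

A workflow file can then call celos.importDefaults("util"); and use hourlyReadyPath in its trigger definition.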

Celos writes daily-rotating logs to a logs directory.

All directories (workflows, defaults, logs, and database) are configurable via the server's command-line arguments (see the Celos Server Reference below).

=== Continuous Delivery of Workflows

Changing a workflow definition in Celos is as simple as updating the workflow JavaScript file and/or the Oozie workflow definition in HDFS. On the next scheduler step, Celos will pick up the changes.

Bundled with Celos comes a tool called Celos CI (see the Celos CI Reference below, as well as link:samples/quickstart[]) that automates this process and can be used in conjunction with GitHub and a CI server such as link:https://jenkins-ci.org/[Jenkins] for continuous delivery of Celos workflows.

For each group of related workflows, we have a GitHub repository and a Jenkins job that deploys the workflows on push to master using Celos CI. Celos CI copies the JavaScript files to the Celos host with SFTP, and uploads the Oozie workflow directory to HDFS.

image::etc/images/arch.png[Architecture]

=== Experience

As of September 2015, Celos has been in use at link:http://www.collective.com[Collective] for about two years, and is currently running all of our Hadoop processing (hundreds of individual workflows across dozens of projects).

Celos is productively used by people from different backgrounds, such as data science, operations, software engineering, and database administration, and has proven to be a welcome improvement on our previous Oozie coordinator-based scheduling.

We're proud that in two years of use, not a single bug in Celos has caused any downtime, which is attributable to the small codebase (about 2500 non-blank, non-comment lines of code for core Celos, as measured by link:http://cloc.sourceforge.net/[cloc] 1.56) and the rigorous test suite (hundreds of unit tests and an extensive integration test).

== Building & Running Celos

=== Prerequisites

You can probably get away with slightly older Hadoop and Oozie versions.

=== Building Celos

[source,shell] .... scripts/build.sh ....

This will build the Celos JARs, including celos-server.jar, celos-ci-fat.jar, and celos-ui.jar (described in the reference sections below).

=== Getting Started

Head over to link:samples/quickstart[samples/quickstart].

== Get in Touch

We'd love to help you try out and use Celos!

For now, please use the link:https://github.com/collectivemedia/celos/issues[Issue Tracker] if you have questions or comments.

=== People

Developers, developers, developers:

Head honcho: link:http://github.com/andry1[Chris Ingrassia]

== JavaScript API Reference

=== Workflows Reference

==== celos.defineWorkflow

===== Description

This is the main API call that registers a new workflow.

===== Syntax

[source,javascript] .... celos.defineWorkflow(options) ....

===== Parameters

The options argument is an object with the following fields:

[options="header"] |=== |Name|Type|Required|Description |id|String|Yes|The identifier string for the workflow, must be unique. |trigger|<<triggers-reference,Trigger>>|Yes|The trigger that determines data availability for the workflow. |schedule|<<schedules-reference,Schedule>>|Yes|The schedule that determines the points in time at which the workflow should run. |schedulingStrategy|<<scheduling-strategies-reference,SchedulingStrategy>>|Yes|The scheduling strategy that determines when and in which order ready slots should be run. |externalService|<<external-services-reference,ExternalService>>|Yes|The external service actually responsible for executing the job. |startTime|String (ISO 8601, UTC)|No|The date when the workflow should start executing (default: "1970-01-01T00:00Z"). |maxRetryCount|Number|No|The number of times a slot of this workflow should be automatically retried if it fails (default: 0). |waitTimeoutSeconds|Number|No|The number of seconds a workflow should stay waiting until it times out (default: Integer.MAX_VALUE (68 years)). |===

===== Examples

[source,javascript] .... celos.defineWorkflow({ "id": "my-workflow", "schedule": celos.hourlySchedule(), "schedulingStrategy": celos.serialSchedulingStrategy(), "trigger": celos.alwaysTrigger(), "externalService": celos.oozieExternalService({ "oozie.wf.application.path": "/my-workflow/workflow.xml", "param1": "Hello", "param2": "World" }) }); ....

==== celos.importDefaults

===== Description

Evaluates a file from the defaults directory in the current scope, so all variables and functions from the file become available in the current file.

===== Syntax

[source,javascript] .... celos.importDefaults(name) ....

===== Parameters

[options="header"] |=== |Name|Type|Required|Description |name|String|Yes|The name of the defaults file to import, without the ".js" suffix. |===

===== Examples

[source,javascript]
....
// Loads the file foo.js from the defaults directory.
celos.importDefaults("foo");
....

=== Triggers Reference

A trigger determines (for each point in time at which a workflow runs) whether the preconditions for running the workflow (such as data availability or success of upstream workflows) are met.

==== celos.hdfsCheckTrigger

===== Description

Makes a workflow wait for a file or directory in HDFS. Often used to wait for _READY or _SUCCESS files.

===== Syntax

[source,javascript] .... celos.hdfsCheckTrigger(path, fs?) ....

===== Parameters

[options="header"] |=== |Name|Type|Required|Description |path|String|Yes|The HDFS path to wait for. May include the variables ${year}, ${month}, ${day}, ${hour}, ${minute}, and ${second}, which will be replaced by the zero-padded values from the slot's scheduled time. |fs|String|No|The hdfs:// URI of the HDFS filesystem to use. If not specified, the value of the <> variable will be used. |===

===== Examples

[source,javascript] .... celos.hdfsCheckTrigger("/logs/${year}-${month}-${day}/${hour}-00/_READY"); ....
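If the name node should be given explicitly instead of via CELOS_DEFAULT_HDFS, it can be passed as the second argument (the URI below is a made-up example):

[source,javascript]
....
// Explicit HDFS URI as the second argument (example value).
celos.hdfsCheckTrigger("/logs/${year}-${month}-${day}/${hour}-00/_READY",
                       "hdfs://nameservice1");
....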

==== celos.successTrigger

===== Description

Makes a workflow wait for the success of another workflow at the same time. This is used to define dependencies among workflows.

===== Syntax

[source,javascript] .... celos.successTrigger(workflowID) ....

===== Parameters

[options="header"] |=== |Name|Type|Required|Description |workflowID|String|Yes|The ID of the other workflow to wait for. |===

===== Examples

[source,javascript]
....
// A workflow using this trigger will run at time T only after the
// workflow "bar" has succeeded at time T.
celos.successTrigger("bar");
....

==== celos.andTrigger

===== Description

Logical AND of nested triggers.

===== Syntax

[source,javascript] .... celos.andTrigger(trigger1, ..., triggerN) ....

===== Parameters

[options="header"] |=== |Name|Type|Required|Description |trigger1, ..., triggerN|<<triggers-reference,Trigger>>|No|The nested triggers. If no nested triggers are specified, the trigger is always ready. |===

===== Examples

[source,javascript]
....
// Wait for the HDFS paths /foo and /bar.
celos.andTrigger(celos.hdfsCheckTrigger("/foo"),
                 celos.hdfsCheckTrigger("/bar"));
....
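Since the nested triggers are optional, calling it with no arguments yields a trigger that is always ready, much like celos.alwaysTrigger():

[source,javascript]
....
// With no nested triggers, this trigger is always ready.
celos.andTrigger();
....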

==== celos.orTrigger

===== Description

Logical OR of nested triggers.

===== Syntax

[source,javascript] .... celos.orTrigger(trigger1, ..., triggerN) ....

===== Parameters

[options="header"] |=== |Name|Type|Required|Description |trigger1, ..., triggerN|<<triggers-reference,Trigger>>|No|The nested triggers. If no nested triggers are specified, the trigger is never ready. |===

===== Examples

[source,javascript]
....
// Wait for the HDFS paths /foo or /bar.
celos.orTrigger(celos.hdfsCheckTrigger("/foo"),
                celos.hdfsCheckTrigger("/bar"));
....

==== celos.notTrigger

===== Description

Logical NOT of a nested trigger.

===== Syntax

[source,javascript] .... celos.notTrigger(trigger) ....

===== Parameters

[options="header"] |=== |Name|Type|Required|Description |trigger|<<triggers-reference,Trigger>>|Yes|The nested trigger to negate. |===

===== Examples

[source,javascript]
....
// Wait until the HDFS path /foo doesn't exist.
celos.notTrigger(celos.hdfsCheckTrigger("/foo"));
....

==== celos.offsetTrigger

===== Description

Offset a nested trigger into the future or past.

===== Syntax

[source,javascript] .... celos.offsetTrigger(seconds, trigger) ....

===== Parameters

[options="header"] |=== |Name|Type|Required|Description |seconds|Number|Yes|The number of seconds to offset into the future (if positive) or past (if negative). |trigger|<<triggers-reference,Trigger>>|Yes|The nested trigger to offset. |===

===== Examples

[source,javascript]
....
// Wait for this hour's and next hour's HDFS file.
var trigger = celos.hdfsCheckTrigger("/${year}-${month}-${day}/${hour}-00/_READY");
celos.andTrigger(trigger, celos.offsetTrigger(60 * 60, trigger));
....
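A negative offset looks into the past instead; for example, reusing the trigger variable from the sketch above:

[source,javascript]
....
// Wait for this hour's and the previous hour's HDFS file.
celos.andTrigger(trigger, celos.offsetTrigger(-60 * 60, trigger));
....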

==== celos.delayTrigger

===== Description

Waits until a specified amount of time has passed between the slot's scheduled time and the current wallclock time.

===== Syntax

[source,javascript] .... celos.delayTrigger(seconds) ....

===== Parameters

[options="header"] |=== |Name|Type|Required|Description |seconds|Number|Yes|The number of seconds to wait. |===

===== Examples

[source,javascript]
....
// Will become ready one hour after its scheduled time.
celos.delayTrigger(60 * 60);

// Can also be used for e.g. alerting: will trigger if, after 1 hour,
// workflow "foo" is not successful.
celos.andTrigger(celos.delayTrigger(60 * 60),
                 celos.notTrigger(celos.successTrigger("foo")));
....

==== celos.alwaysTrigger

===== Description

A trigger that's always ready, to be used when a workflow has no preconditions and should simply run at any scheduled time.

===== Syntax

[source,javascript] .... celos.alwaysTrigger() ....

===== Examples

[source,javascript] .... celos.alwaysTrigger(); ....

=== Schedules Reference

A schedule determines the points in time (slots) at which a workflow should run.

==== celos.cronSchedule

===== Description

A cron-like schedule.

The full cron syntax is described here: http://www.quartz-scheduler.org/documentation/quartz-1.x/tutorials/crontrigger

===== Syntax

[source,javascript] .... celos.cronSchedule(cronExpr) ....

===== Parameters

[options="header"] |=== |Name|Type|Required|Description |cronExpr|String|Yes|The link:http://www.quartz-scheduler.org/documentation/quartz-1.x/tutorials/crontrigger[cron expression]. |===

===== Examples

[source,javascript]
....
// Runs a workflow at 10:15am every day.
celos.cronSchedule("0 15 10 * * ?");
....

==== celos.hourlySchedule

===== Description

Runs a workflow every hour.

A shortcut for celos.cronSchedule("0 0 * * * ?").

===== Syntax

[source,javascript] .... celos.hourlySchedule() ....

===== Examples

[source,javascript] .... celos.hourlySchedule(); ....

==== celos.minutelySchedule

===== Description

Runs a workflow every minute.

A shortcut for celos.cronSchedule("0 * * * * ?").

===== Syntax

[source,javascript] .... celos.minutelySchedule() ....

===== Examples

[source,javascript] .... celos.minutelySchedule(); ....

==== celos.dependentSchedule

===== Description

Runs a workflow with the same schedule as another workflow.

===== Syntax

[source,javascript] .... celos.dependentSchedule(workflowID) ....

===== Parameters

[options="header"] |=== |Name|Type|Required|Description |workflowID|String|Yes|The workflow ID of the other workflow. |===

===== Examples

[source,javascript]
....
// A workflow using this schedule will run with the same schedule as
// the workflow with the ID "foo".
celos.dependentSchedule("foo");
....

=== Scheduling Strategies Reference

A scheduling strategy determines the order in which the ready slots of a workflow are executed.

==== celos.serialSchedulingStrategy

===== Description

Executes slots oldest first, with a configurable concurrency level.

===== Syntax

[source,javascript] .... celos.serialSchedulingStrategy(concurrency?) ....

===== Parameters

[options="header"] |=== |Name|Type|Required|Description |concurrency|Number|No|The number of slots to execute at the same time (defaults to 1). |===

===== Examples

[source,javascript]
....
// A workflow using this scheduling strategy will have
// at most three slots executing concurrently.
celos.serialSchedulingStrategy(3);
....

=== External Services Reference

An external service actually executes a workflow.

==== celos.oozieExternalService

===== Description

Executes slots with Oozie.

===== Syntax

[source,javascript] .... celos.oozieExternalService(properties, oozieURL?) ....

===== Parameters

[options="header"] |=== |Name|Type|Required|Description |properties|Object|Yes|Properties to pass to Oozie. |oozieURL|String|No|The HTTP URL of the Oozie API. If not specified, the value of the <> variable will be used. |===

Inside property values, the variables ${year}, ${month}, ${day}, ${hour}, ${minute}, and ${second} will be replaced by the zero-padded values from the slot's scheduled time.

year, month, day, hour, minute, and second will also be set as Oozie properties, so they can be used in the Oozie workflow XML file.

Additionally, Celos will set the Oozie property celosWorkflowName to a string like "my-workflow@2015-09-12T20:00Z", useful for display.

oozie.wf.application.path is the only property required by Oozie. It points to a link:https://oozie.apache.org/docs/4.2.0/WorkflowFunctionalSpec.html[Oozie workflow XML file] within an Oozie workflow directory. There can be multiple XML files within a single Oozie workflow directory.

If CELOS_DEFAULT_OOZIE_PROPERTIES is defined and an Object, its members are added (before other properties, so they can be overridden) to the Oozie properties.

===== Examples

[source,javascript]
....
celos.oozieExternalService({
    "oozie.wf.application.path": "/workflow-dir/workflow.xml",
    "prop1": "Hello. It is the year ${year}!",
    "prop2": "Just another property."
});
....

=== Variables

If defined, these global variables influence some API functions. It's a good idea to set them in a common defaults file imported by all workflows.

==== CELOS_DEFAULT_HDFS

The String value of this variable will be used as the default HDFS name node URI by <<celos.hdfsCheckTrigger,celos.hdfsCheckTrigger>>.

==== CELOS_DEFAULT_OOZIE

The String value of this variable will be used as the default Oozie API URL by <<celos.oozieExternalService,celos.oozieExternalService>>.

==== CELOS_DEFAULT_OOZIE_PROPERTIES

The members of this Object will be added (before other properties, so they can be overridden) to the Oozie properties of a workflow by <<celos.oozieExternalService,celos.oozieExternalService>>.
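As a sketch, a shared defaults file (imported by every workflow via celos.importDefaults) could set all three variables; the names and values below are illustrative only:

[source,javascript]
....
// Hypothetical defaults file shared by all workflows (e.g. defaults/env.js).
// The host names and property values are made up for illustration.
var CELOS_DEFAULT_HDFS = "hdfs://nameservice1";
var CELOS_DEFAULT_OOZIE = "http://oozie001.example.com:11000/oozie";
var CELOS_DEFAULT_OOZIE_PROPERTIES = {
    "user.name": "celos"
};
....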

== Celos Server Reference

The celos-server.jar launches Celos.

The celos-server.jar must be run in the following way, due to the need to put the Hadoop configuration on the classpath:

[source,shell] .... java -cp celos-server.jar:/etc/hadoop/conf com.collective.celos.server.Main ....

=== Server Command-Line Arguments

[options="header"] |=== |Name|Type|Required|Description |--port|Integer|Yes|HTTP port for server. |--workflows|Path|No|Workflows directory (defaults to /etc/celos/workflows). |--defaults|Path|No|Defaults directory (defaults to /etc/celos/defaults). |--logs|Path|No|Logs directory (defaults to /var/log/celos). |--db|Path|No|State database directory (defaults to /var/lib/celos/db). |--autoSchedule|Integer|No|Interval (in seconds) between scheduler steps. If not supplied, Celos will not automatically step the scheduler, and wait for POSTs to the /scheduler servlet instead. |===

=== Server HTTP API

==== /scheduler

Doing a POST to this servlet initiates a scheduler step.

In production we do this once a minute from cron.

===== Example

[source,shell] .... curl -X POST localhost:1234/scheduler ....

==== /workflow-list

Doing a GET to this servlet returns the list of workflows loaded into Celos.

===== Example

[source,shell] .... curl "localhost:1234/workflow-list" ....

prints:

.... { "ids" : [ "workflow-1", "workflow-2", "workflow-3" ] } ....

==== /workflow-slots

Doing a GET to this servlet returns the slots of a workflow within a time range.

It also returns other information about the workflow, such as its paused state (see the /pause servlet).

===== Parameters

[options="header"] |=== |Name|Type|Required|Description |id|String|Yes|ID of the workflow. |end|String (ISO 8601, UTC)|No|Time (exclusive) of most recent slot to return. Defaults to current time. |start|String (ISO 8601, UTC)|No|Time (inclusive) of earliest slot to return. Defaults to 1 week before end. |===

===== Example

[source,shell] .... curl "localhost:1234/workflow-slots?id=workflow-1" ....

prints:

.... { "paused": false, "slots" : [ { "time" : "2015-09-13T13:50:00.000Z", "status" : "READY", "externalID" : null, "retryCount" : 0 }, { "time" : "2015-09-13T13:45:00.000Z", "status" : "SUCCESS", "externalID" : "0004806-150911205802478-oozie-oozi-W", "retryCount" : 0 }, { "time" : "2015-09-13T13:40:00.000Z", "status" : "SUCCESS", "externalID" : "0004804-150911205802478-oozie-oozi-W", "retryCount" : 0 }, ... ] } ....

==== /trigger-status

Doing a GET to this servlet returns human-readable information about why a slot is waiting.

===== Parameters

[options="header"] |=== |Name|Type|Required|Description |id|String|Yes|ID of the workflow. |time|String (ISO 8601, UTC)|Yes|Scheduled time of slot to check. |===

===== Example

[source,shell] .... curl "localhost:1234/trigger-status?id=workflow-1&time=2015-09-13T13:00Z" ....

prints:

.... { "type" : "AndTrigger", "ready" : false, "description" : "Not all nested triggers are ready", "subStatuses" : [ { "type" : "DelayTrigger", "ready" : false, "description" : "Delayed until 2015-09-14T16:00:00.000Z", "subStatuses" : [ ] }, { "type" : "HDFSCheckTrigger", "ready" : true, "description" : "HDFS path hdfs://nameservice1/logs/dc3/2015-09-14/1500 is ready", "subStatuses" : [ ] } ] } ....

==== /rerun

Doing a POST to this servlet instructs Celos to mark a slot for rerun.

The slot's state will be reset to waiting and its retry count will be reset to 0.

===== Parameters

[options="header"] |=== |Name|Type|Required|Description |id|String|Yes|ID of the workflow. |time|String (ISO 8601, UTC)|Yes|Scheduled time of slot to rerun. |===

===== Example

[source,shell] .... curl -X POST "localhost:1234/rerun?id=workflow-1&time=2015-09-13T13:40Z" ....

==== /kill

Doing a POST to this servlet marks a slot as killed and also kills its underlying Oozie job, if any.

===== Parameters

[options="header"] |=== |Name|Type|Required|Description |id|String|Yes|ID of the workflow. |time|String (ISO 8601, UTC)|Yes|Scheduled time of slot to kill. |===

===== Example

[source,shell] .... curl -X POST "localhost:1234/kill?id=workflow-1&time=2015-09-13T13:40Z" ....

==== /pause

Doing a POST to this servlet pauses or unpauses a workflow. While a workflow is paused, the scheduler will simply ignore it. This means it doesn't check any triggers for the workflow, doesn't submit new jobs to the workflow's external service, and doesn't perform any other action related to the workflow.

You can check whether a workflow is paused by looking at the paused field of the result of the /workflow-slots servlet.

===== Parameters

[options="header"] |=== |Name|Type|Required|Description |id|String|Yes|ID of the workflow. |paused|Boolean|Yes|Whether to pause (true) or unpause (false) the workflow. |===

===== Example

[source,shell]
....
# Pause a workflow
curl -X POST "localhost:1234/pause?id=workflow-1&paused=true"

# Unpause a workflow
curl -X POST "localhost:1234/pause?id=workflow-1&paused=false"
....

== Celos CI Reference

The celos-ci-fat.jar can be used to automatically deploy workflow and defaults JavaScript files, as well as Oozie workflow directories.

[source,shell] .... java -jar celos-ci-fat.jar ....

=== CI Command-Line Arguments

[options="header"] |=== |Name|Type|Required|Description |--mode|String|Yes|deploy or undeploy |--workflowName|String|Yes|Name of workflow (or rather, project). |--deployDir|Path|Yes|The deployment directory (not needed for undeploy). |--target|URL|Yes|The target file (file: or sftp: URL). |--hdfsRoot|Path|No|HDFS data will be placed under this root (defaults to /user/celos/app). |===

=== Deployment Directory

A deployment directory must follow a canonical directory layout:

....
workflow.js
defaults.js
hdfs/
    workflow.xml
    ...
    lib/
        ...
....

If WORKFLOW_NAME is the value of the --workflowName argument, HDFS_ROOT is the value of the --hdfsRoot argument, and WORKFLOWS_DIR and DEFAULTS_DIR are the Celos workflows and defaults directories specified in the target file, then the files will be deployed to the following locations:

....
workflow.js -> $WORKFLOWS_DIR/$WORKFLOW_NAME.js
defaults.js -> $DEFAULTS_DIR/$WORKFLOW_NAME.js
hdfs/       -> $HDFS_ROOT/$WORKFLOW_NAME
....

=== Target File

A target file is a JSON file that describes a Celos and HDFS setup.

[options="header"] |=== |Name|Type|Required|Description |hadoop.hdfs-site.xml|URL|Yes|URL of Hadoop hdfs-site.xml File |hadoop.core-site.xml|URL|Yes|URL of Hadoop core-site.xml File |defaults.dir.uri|URL|Yes|URL of Celos defaults directory. |workflows.dir.uri|URL|Yes|URL of Celos workflows directory. |===

All fields must be file: or sftp: URLs.

Example:

[source,json] .... { "hadoop.hdfs-site.xml": "sftp://celos002.ewr004.collective-media.net/etc/hadoop/conf/hdfs-site.xml", "hadoop.core-site.xml": "sftp://celos002.ewr004.collective-media.net/etc/hadoop/conf/core-site.xml", "defaults.dir.uri": "sftp://celos002.ewr004.collective-media.net/etc/celos/defaults", "workflows.dir.uri": "sftp://celos002.ewr004.collective-media.net/etc/celos/workflows", } ....

The best practice for using Celos CI is to put a target file for each Celos installation (e.g. production and staging) in a central, SFTP-accessible location, and to store each target file's SFTP URL in an environment variable (e.g. PRODUCTION_TARGET and STAGING_TARGET). Deploy scripts using Celos CI should then pass this variable as the --target argument, making them independent of the Celos installation to which the workflow is deployed. See link:samples/quickstart[] for an example.

=== CI Environment Variables

==== CELOS_CI_USERNAME

If defined, overrides the username in sftp: URLs in the target file.

== Celos UI Reference

The celos-ui.jar runs the Celos user interface.

[source,shell] .... java -jar celos-ui.jar ....

=== UI Command-Line Arguments

[options="header"] |=== |Name|Type|Required|Description |--port|Integer|Yes|HTTP port for UI. |--celos|URL|Yes|Celos URL. |--hue|URL|No|Hue (Oozie UI) URL. If specified, slots become linked to Hue. |--config|Path|No|JSON <>. |===

=== UI Config File

If the --config argument is not supplied to the UI, it simply renders one long list of all loaded workflows.

By supplying a JSON file with the following format to --config, the workflows can be hierarchically grouped (one level):

[source,javascript] .... { "groups": [ { "name": "Group 1", "workflows": [ "workflow-1", "workflow-2" ] }, { "name": "Group 2", "workflows": [ "workflow-3" ] } ] } ....

=== UI HTTP API

==== /ui

Doing a GET to this servlet displays the Celos UI.

===== Parameters

[options="header"] |=== |Name|Type|Required|Description |time|String (ISO 8601, UTC)|No|Time of most recent slot to display (defaults to current time). |zoom|String (Number)|No|Zoom level in minutes (defaults to 60). |===

== Miscellaneous

=== Related Tools

Two similar, programmable schedulers:

=== The Celos Name

The link:http://www.quicksilver899.com/Tolkien/LOTR/LOTR_AC.html[_Lord of the Rings Dictionary_] defines it as:

.... Celos S; also Kelos; freshet; kel- flow away [Sil; *kelu-]; one would want to choose los snow [Sil] for the final element, but the text of Unfinished Tales, Index, entry Celos states the final form derives from Q -sse, -ssa, a form of emphasis [some say locative], making the definition 'much flowing' or 'freshet', often resulting from melting snow; perhaps 'snow' is then implied from the ending; a river in Gondor ....

Alternatively, the link:http://programmingisterrible.com/post/65781074112/devils-dictionary-of-programming[_Devil’s Dictionary of Programming_] defines it as:

....
Configurable: It’s your job to make it usable.
Elegant: The only use case is making me feel smart.
Lightweight: I don’t understand the use-cases the alternatives solve.
Opinionated: I don’t believe that your use case exists.
Simple: It solves my use case.
....

=== Acknowledgements

Thanks to our in-house users and to the developers of the many fine open source libraries we're able to use, including but not limited to Oozie, Hadoop, Jetty, Rhino, Joda, Jackson, Quartz, and Gradle.