OpenFn / kit

The bits & pieces that make OpenFn work. (diagrammer, cli, compiler, runtime, runtime manager, logger, etc.)
10 stars 9 forks source link

CLI: support execute from workflow.yaml #621

Open josephjclark opened 6 months ago

josephjclark commented 6 months ago

Overview

The OpenFn CLI currently supports running jobs through a job/expression file (ie, openfn job.js) or a workflow file (ie, openfn workflow.json)

Users of Lightning (the hosted platform at openfn.org) can sync their projects (including multiple workflows) with github. This in effect allows them to write and test their workflows locally, commit them to a github repo, and deploy that repo to Lightning. See the docs at: https://docs.openfn.org/documentation/link-to-Github

The project is saved locally in a file called project.yaml. Unfortunately, users cannot yet execute a workflow inside a project.yaml using the CLI.

Suggested Fix

It should be fairly straightforward for the CLI to import a project.yaml file, convert it into a standard Workflow object, and then execute it as usual.

Since a project.yaml file can contain multiple workflows, the CLI should support a --workflow (shorthanded to -w) argument which specifies the workflow by id. If the project only contains one workflow, we can just go ahead and use that without the -w argument.

Implementation Notes

Within the CLI and the kit repo, workflows are often known as Execution Plans. Technically an Execution Plan is a wrapper around a workflow which can include options (and maybe later metadata). Tthe terms workflow/excution plan/xplan and plan are used pretty much interchangably - this was to help divorce the platform's Workflow concept from the CLI's equivalent, which used to be a little different.

This fix should be implemented in packages/cli, integrated into the existing execute command.

There is a file in src/utils/load-plan.ts which is designed to load a workflow from a .js or .json file. You should extend the main loadPlan() to support a projectPath option (see src/options.ts::export const inputPath)

You should use the yaml npm package to handle yaml file parsing (pnpm add yaml)

See the lexicon package for type definitions for ExecutionPlan, which is the workflow structure you need to generate from the yaml.

You can see some project.yaml examples in this issue: https://github.com/OpenFn/kit/issues/619

Any properties in project.yaml which are not reflected in workflow.json can safely be ignored.

Testing

Unit tests should be created to demonstrate that project.yaml. Adding good coverage to test/util/load-plan.test.ts should suffice.

A test should also be added to test/commands.test.ts which loads a yaml flie. These tests use a mock file system - see the other tests for a good reference pattern.

I would also love to see an integration test added to /integration-tests/cli/execute-workflow.ts. Create a project.yaml fixture, load it into your test, and assert on the result.

Other Notes

Other issues may well arise when trying to execute a project.yaml file. I think it's a fairly straightforward mapping but some unforeseen incompatibility could well crop up. In this event, please raise issues here on github (and maybe tag @josephjclark for help).

We welcome feedback on the docs inside this repo! Did the readme help you get started? Are you able to find the code you needed? Is there anything that would have made your work easier?

josephjclark commented 5 months ago

Hey @utnim2, thanks for speaking up! Are you coming from the Code for Govtech DMP programme?

josephjclark commented 5 months ago

Hey @utnim2, sorry for the late reply! I'm at a conference and things have been a little hectic.

This issue isn't marked for Code for Govtech. It's just a very small issue really.

My colleague @christad92 is running that stuff. He's also at this conference today but he'll be in touch with you soon to help find the right match!

josephjclark commented 5 months ago

Hi @utnim2. Ok, that sounds like a good idea! But to be absolutely clear - this wouldn't contribute to or influence your DMP project.

This issue is not very well specced. I'll add some more details to you in a few hours and tag you here.

It shouldn't be a big project. If the input file is a yaml file, convert it to a JSON format. I'm not actually sure how complex the conversion will be, but I'll link you to the type definitions/specifications you'll need.

josephjclark commented 5 months ago

Hi @utnim2, I've added more details to the issue and assigned you. May the force be with you :vulcan_salute:

josephjclark commented 5 months ago

I've unassigned @utnim2 from this issue, so it is now open to contributors.

Akshanshkaushal commented 4 months ago

Hi @josephjclark To better understand the codebase, I want to solve this issue. Could you please assign me this?

josephjclark commented 4 months ago

Hi @Akshanshkaushal - thanks for the interest!

It would be a good idea for you to reply here with a rough solution, just enough to suggest you understand the problem and how best to solve it. In particular I'm interested in what manual and automation tests you'll use and how you'll prove your final solution works.

Akshanshkaushal commented 4 months ago

All right, @josephjclark. It will be carried out in the manners listed below:

1>User Input: The user specifies the path to the project.yaml file and, optionally, the ID of the specific workflow they want to execute.

2>Load YAML: The CLI loads the YAML file specified by the user.

3>Parse YAML: The YAML content is parsed into an object representation.(JSON), where users can update the workflows.

4>Select Workflow: If a workflow ID is provided, the CLI selects the specific workflow from the parsed YAML content; otherwise, it defaults to the first workflow in the file.

5>Convert to Execution Plan: The selected workflow is converted into an Execution Plan (workflow name, steps, jobs, etc)

6>Execute Workflow: The CLI executes the workflow according to the generated Execution Plan.

josephjclark commented 4 months ago

@Akshanshkaushal that's more-or-less the right flow but I was hoping for a bit more technical detail.

Where in the existing codebase will you make the conversion? How will you convert the yaml to json? Do you understand the existing YAML or JSON structures we use? How will you test this? This last is the most important question: we cannot merge untested code.

Also to be clear: if the project.yaml has multiple workflows, we HAVE to take an argument from the user, or we take an error. We do not default to running the first workflow, that's unintuitive

Akshanshkaushal commented 4 months ago

Yes, @josephjclark, kindly refer to this technical approach. 1>We are going to update src/utils/load-plan.ts's loadPlan() function to parse YAML into JSON format, enabling users to update workflows. 2>Second, modify the execute command in packages/cli so that it can parse the -w or --workflow argument and send it to the loadPlan() function. If there are multiple workflows, the workflow must be specified; otherwise, an error will occur. If there is only one workflow, the workflow will be selected automatically. 3>implemting units test for loadPlan function to parse yaml files and execute command to parse --workflow or -w argument.

josephjclark commented 4 months ago

Alright @Akshanshkaushal , sounds good! Feel free to take a swing and open a PR :)