OpenFn / kit

The bits & pieces that make OpenFn work. (diagrammer, cli, compiler, runtime, runtime manager, logger, etc.)
8 stars 12 forks source link

CLI: add better validation on workflows #681

Closed josephjclark closed 4 weeks ago

josephjclark commented 2 months ago

When loading a workflow JSON, the CLI should do more validation work, warning users when a hand-made workflow is incorrect.

The runtime actually has a validate-plan.ts file - it should be acceptable to just add extra validation rules here. They should bubble up to the CLI (for example, a workflow.json which specifies a start step not defined in the workflow will throw a validation error, ie { options: { start: "x" }, workflow: { steps: [ { id: "a" }] } }

Validation rules should include:

Plus anything else that may be useful

All validation rules should be thoroughly unit tested.

A big risk is that adding extra validation will break unit tests, which often delight in using invalid workflows. I think there's already a skipValidation option which we can pass to the runtime and we may need to add this. If this is breaking lots and lots and lots of tests we may want to re-think (perhaps validation should be opt-in, and we need to change the CLI to always pass a validate: true flag)

SatyamMattoo commented 1 month ago

Hey @josephjclark , While working on the issue, I came across a test in runtime.test.ts that includes data in the workflow steps. I'm a bit confused about this, as the types of Step | Job | Trigger do not include data. Typically, the data should be included inside the state.

Screenshot 2024-05-19 160641

To address this, I've considered adding more validation functions to thevalidate-plan.tsfile. My approach includes:

  1. Validation for the workflow

    const assertWorkflowStructure = (plan: ExecutionPlan) => {
    const { workflow, options } = plan;
    
    if (!workflow || typeof workflow !== 'object') {
    throw new ValidationError('Missing or invalid workflow key in execution plan.');
    }
    
    if (!Array.isArray(workflow.steps)) {
    throw new ValidationError('The workflow.steps key must be an array.');
    }
    
    if (workflow.steps.length === 0) {
    console.warn('Warning: The workflow.steps array is empty.');
    }
    
    workflow.steps.forEach((step, index) => {
    assertStepStructure(step, index);
    });
    
    if (options) {
    assertOptionsStructure(options);
    }
    }; 
  2. Validation for the steps and the options:

const assertStepStructure = (step: Step | Job | Trigger, index: number) => {
  const allowedKeys = ['id', 'name', 'next', 'previous', 'adaptor', 'expression', 'state', 'configuration', 'linker'];

  Object.keys(step).forEach((key) => {
    if (!allowedKeys.includes(key)) {
      throw new ValidationError(`Invalid key "${key}" in step ${step.id || index}.`);
    }
  });
};

const assertOptionsStructure = (options: WorkflowOptions) => {
  const allowedKeys = ['timeout', 'stepTimeout', 'start', 'end', 'sanitize'];

  Object.keys(options).forEach((key) => {
    if (!allowedKeys.includes(key)) {
      console.warn(`Warning: Unrecognized option "${key}" in options object.`);
    }
  });
};

Also, I had a doubt about this statement "A step with an adaptor should also have an expression (steps without expressions are fine)". This basically means if the step has an adaptor, it should have an expression as well, but if an adaptor is missing there is no need for an expression. Have I interpreted it correctly?

This approach does not break any tests for the runtime except for the one mentioned. Could you please review this and let me know if this aligns with what you had in mind? Additionally, do you have any suggestions or improvements for this approach?

josephjclark commented 1 month ago

Hi @SatyamMattoo ,

That data property looks like an old thing that got missed when refactoring code. Please feel free to update the test!

See these validations implemented in code is making me a little anxious. I'd like to think about this a bit more after I've had my coffee. I'm worried we're being too strict - maybe this should go into the CLI, not the core runtime. I'm not sure - let me think it over.

if the step has an adaptor, it should have an expression as well, but if an adaptor is missing there is no need for an expression.

This is correct!

SatyamMattoo commented 1 month ago

Thank you @josephjclark, I'll update the test accordingly.

Regarding the validations in the code, I completely understand your concern. If you have any further thoughts or suggestions, I will be more than happy to help.

Pin4sf commented 1 month ago

@josephjclark I am also looking into the issue and I think the approach provided by @SatyamMattoo of Custom Validation Functions is correct way to go as here we are validating the structure of the workflow, steps, and options in the execution plan.

I'm worried we're being too strict

Your concern is also valid so I think we can iterate over other approach which can be :-

josephjclark commented 1 month ago

@SatyamMattoo can you open a PR with your work so far please?

@Pin4sf We heavily use JSON schema across our stack - if we were to introduce a schema here, that would be the way to go. But we already have a formal type declaration in TypeScript. I'd rather use that than duplicate it,

Dharssini commented 1 month ago

@josephjclark I am currently working on this issue under C4GT, will raise a PR shortly

josephjclark commented 1 month ago

@Dharssini This is not a C4GT DMP issue. If you want to apply for one of those you have to apply to the portal directly.

This issue is being worked on by @SatyamMattoo and does not require more input. Thanks!