OpenFn / kit

The bits & pieces that make OpenFn work. (diagrammer, cli, compiler, runtime, runtime manager, logger, etc.)
8 stars 12 forks source link

CLI: Allow steps/jobs in different workflows to have the same name #667

Closed christad92 closed 1 month ago

christad92 commented 2 months ago

The current implementation of openfn pull returns an error when a job name is duplicated in two different workflows. The duplicate job name constraint should only apply on workflows i.e. No two jobs within a workflow should have the same name.

See error message below: Image

Repro steps

You'll need to have the

Background

Using Lightning, users can create multiple Workflows for a given project. These workflows are used for data integration and workflow automation.

You can use our CLI tooling to download a local copy of a project. This creates a project.yaml file on your system with all the workflows contained within.

Implementation Notes

The pull command is handled by packages/cli/src/pull/handler.ts, but most of the heavy lifting is in packages/deploy.

I suspect the problem comes from code which validates the downloaded project.yaml file, working on the misconception that steps must be uniquely named across the project. In fact, they must only be uniquely named within the workflow. So this validation needs to be improved.

You should create unit tests in the deploy package which prove out your fix. See the existing tests for patterns you can use.

SatyamMattoo commented 2 months ago

The problem arises within

packages/deploy

Upon conducting an analysis of the validator function, I identified that the current implementation, wherein a single array of strings is employed to store workflows and job/steps, leads to a duplicate key error. This occurs even when the job names are identical across different workflows.

let keys: string[] = [];

  function pushUniqueKey(context: YAML.Pair, key: string) {
    if (keys.includes(key)) {
      errors.push({
        context,
        message: `duplicate key: ${key}`,
      });
    } else {
      keys.push(key);
    }
  }

I have devised a solution involving the utilization of separate sets for workflows and jobs. By employing this approach, we can effectively detect and handle duplicate keys within each set. I have already begun implementing these changes.

let workflowKeys: Set<string> = new Set();
  let jobKeys: Set<string> = new Set();

  function pushUniqueKey(context: YAML.Pair, key: string, keyType: 'workflow' | 'job') {
    const keyList = keyType === 'workflow' ? workflowKeys : jobKeys;
    if (keyList.has(key)) {
      errors.push({
        context,
        message: `duplicate ${keyType} key: ${key}`,
      });
    } else {
      keyList.add(key);
    }
  }

The proposed modifications, which involve using distinct sets for workflow and job keys, along with corresponding adjustments in the validate functions for both workflows and jobs, appear to be a promising course of action to resolve the encountered error. Thank you for your consideration.

Best regards