arcalot / arcaflow-engine

Arcaflow is a highly-portable workflow engine enabling modular and validated pipelines through containerized plugins.
https://arcalot.io/arcaflow/
Apache License 2.0

Namespaced scope fails for stress-ng plugin #179

Closed dustinblack closed 4 months ago

dustinblack commented 5 months ago

Describe the bug

When using a namespaced scope with the stress-ng plugin (currently the development branch), the workflow fails with a panic. The panic appears to reference, at random, one of the sub-classes in the stressors one-of union of the input schema.

panic: Referenced object 'CpuStressorParams' not found in scope with namespace ""; available:
StressNGParams

goroutine 1 [running]:
go.flow.arcalot.io/pluginsdk/schema.(*RefSchema).ApplyScope(0x40005660c0, {0x182b030, 0x40002f4090}, {0x0, 0x0})
    /home/runner/go/pkg/mod/go.flow.arcalot.io/pluginsdk@v0.11.1/schema/ref.go:121 +0x214
go.flow.arcalot.io/pluginsdk/schema.OneOfSchema[...].ApplyScope(0x18379c0, {0x182b030, 0x40002f4090}, {0x0, 0x0})
    /home/runner/go/pkg/mod/go.flow.arcalot.io/pluginsdk@v0.11.1/schema/oneof.go:48 +0x98
go.flow.arcalot.io/pluginsdk/schema.AbstractListSchema[...].ApplyScope(...)
    /home/runner/go/pkg/mod/go.flow.arcalot.io/pluginsdk@v0.11.1/schema/list.go:85
go.flow.arcalot.io/pluginsdk/schema.(*PropertySchema).ApplyScope(...)
    /home/runner/go/pkg/mod/go.flow.arcalot.io/pluginsdk@v0.11.1/schema/property.go:126
go.flow.arcalot.io/pluginsdk/schema.(*ObjectSchema).ApplyScope(...)
    /home/runner/go/pkg/mod/go.flow.arcalot.io/pluginsdk@v0.11.1/schema/object.go:62
go.flow.arcalot.io/pluginsdk/schema.NewScopeSchema(0x40007eeb40, {0x0, 0x0, 0x2a?})
    /home/runner/go/pkg/mod/go.flow.arcalot.io/pluginsdk@v0.11.1/schema/scope.go:36 +0x234
go.flow.arcalot.io/engine/workflow.addScopesWithReferences(0x4000655200?, {0x182b030?, 0x40007a5aa0}, {0x4000382bd0, 0x2a})
    /home/runner/go/pkg/mod/go.flow.arcalot.io/engine@v0.16.0-beta1/workflow/executor.go:394 +0x18c
go.flow.arcalot.io/engine/workflow.addInputNamespacedScopes(0x40003f42d0?, {{{0x1538390, 0x7}, {0x1548700, 0x15}, {0x153a103, 0x9}, {0x15390f7, 0x8}, 0x40003417d0, ...}, ...}, ...)
    /home/runner/go/pkg/mod/go.flow.arcalot.io/engine@v0.16.0-beta1/workflow/executor.go:376 +0x1a0
go.flow.arcalot.io/engine/workflow.applyLifecycleScopes(0x4000567b40?, {0x182b030, 0x400000c930})
    /home/runner/go/pkg/mod/go.flow.arcalot.io/engine@v0.16.0-beta1/workflow/executor.go:326 +0x1f4
go.flow.arcalot.io/engine/workflow.(*executor).Prepare(0x4000567b40, 0x40000ff8b0, 0x40005afe30)
    /home/runner/go/pkg/mod/go.flow.arcalot.io/engine@v0.16.0-beta1/workflow/executor.go:124 +0x1e4
go.flow.arcalot.io/engine.workflowEngine.Parse({{0x1821ea0, 0x40004d70b0}, {0x1820690, 0x400058ff10}, 0x4000110660}, {0x1823c50, 0x40007a4558}, {0x1538cc7?, 0x400054daf8?})
    /home/runner/go/pkg/mod/go.flow.arcalot.io/engine@v0.16.0-beta1/engine.go:121 +0x28c
main.runWorkflow({0x180c098, 0x40004d78c0}, {0x1823c50, 0x40007a4558}, {0x1538cc7, 0x8}, {0x1821ea0?, 0x40004d7020}, {0x4000048e00, 0x6f8, ...})
    /home/runner/go/pkg/mod/go.flow.arcalot.io/engine@v0.16.0-beta1/cmd/arcaflow/main.go:204 +0x18c
main.main()
    /home/runner/go/pkg/mod/go.flow.arcalot.io/engine@v0.16.0-beta1/cmd/arcaflow/main.go:190 +0xb48

To reproduce

The workflow here can be used to reproduce the problem: https://gitlab.com/redhat/edge/tests/perfscale/arcaflow-workflow-sysbench/-/blob/nightly-tests-stressng/stressng-workflow.yaml?ref_type=heads

Test with just the sub-workflow, un-commenting the namespace parameter for $.inputs.StressngWorkflowInput.stressng_params:

arcaflow -config config.yaml -input sample-input-stressng-cpu.yaml -context . -workflow stressng-workflow.yaml

Additional context

The stress-ng plugin includes a one-of union of other classes with discriminators.

https://github.com/arcalot/arcaflow-plugin-stressng/blob/ebe8b964b3c8e3d37228585b09b27af0584f9778/arcaflow_plugin_stressng/stressng_schema.py#L551-L603

I'm not positive this is the source of the problem, but it seems suspect.

jaredoconnell commented 4 months ago

I switched it to use the namespace, and it just worked. I removed all of the now-unnecessary objects from the workflow. Did you need any of those objects redefined for default overrides? The workflow still has the ones that have comments saying that they were retained for default values.

version: v0.2.0
input:
  root: StressngWorkflowInput
  objects:
    PcpInputParams:
      # Not using namespaced scope because the workflow schema is customized
      id: PcpInputParams
      properties:
        flatten:
          display:
            description: Processes the metrics first into a two-dimensional format
              via the pcp2csv converter, and then converts the CSV to JSON, effectively
              flattening the data structure. This is useful when indexing metrics
              to a service like Elasticsearch.
            name: flatten JSON structure
          required: false
          type:
            type_id: bool
          default: true
        pmlogger_interval:
          display:
            description: The logger collection interval for PCP pmlogger
            name: PCP pmlogger collection interval
          type:
            type_id: float
          required: false
          default: 0.5
        pmlogger_metrics:
          display:
            description: The pmrep-compatible metrics values to report as a space-separated string.
            name: pmlogger metrics to report
          type:
            type_id: string
          required: false
          default: kernel.uname, hinv.ncpu, mem.physmem, disk.dev.scheduler, kernel.cpu.util.user, kernel.cpu.util.nice, kernel.cpu.util.sys, kernel.cpu.util.wait, kernel.cpu.util.steal, kernel.cpu.util.idle, kernel.percpu.cpu.vuser, kernel.percpu.cpu.nice, kernel.percpu.cpu.sys, kernel.percpu.cpu.wait, kernel.percpu.cpu.steal, kernel.percpu.cpu.idle, disk.all.total, disk.all.read, disk.all.write, disk.all.blkread, disk.all.blkwrite, mem.freemem, mem.util.available, mem.util.used, mem.util.bufmem, mem.util.cached, mem.util.active, mem.util.inactive, mem.util.dirty, swap.in, swap.pagesin, swap.out, swap.pagesout, network.interface.in.packets, network.interface.out.packets, network.interface.in.bytes, network.interface.out.bytes
    StressngWorkflowInput:
      id: StressngWorkflowInput
      properties:
        pcp_params:
          # Not using namespaced scope because the workflow schema has defaults defined
          display:
            description: The parameters for the PCP workload
            name: PCP parameters
          type:
            type_id: ref
            id: PcpInputParams
        stressng_params:
          display:
            description: The parameters for the stressng workload
            name: stressng parameters
          type:
            type_id: ref
            id: StressNGParams
            namespace: $.steps.stressng.starting.inputs.input
          required: true

steps:
  uuidgen:
    plugin:
      deployment_type: image
      src: quay.io/arcalot/arcaflow-plugin-utilities:0.5.1
    step: uuid
    input: {}
  pcp:
    plugin:
      deployment_type: image
      src: quay.io/arcalot/arcaflow-plugin-pcp:0.9.0
    step: start-pcp
    deploy:
      deployer_name: podman
      deployment:
        host:
          NetworkMode: host
          Binds:
            - /etc/system-release:/etc/system-release
    input: !expr $.input.pcp_params
    stop_if: !expr $.steps.post_wait.outputs.success
  pre_wait:
    plugin:
      deployment_type: image
      src: quay.io/arcalot/arcaflow-plugin-test-impl-go:0.4.1
    step: wait
    input:
      wait_time_ms: 10000
    wait_for: !expr $.steps.pcp.starting.started
  stressng:
    plugin:
      deployment_type: image
      src: quay.io/arcalot/arcaflow-plugin-stressng:simplify-schema_ebe8b96
    step: workload
    input: !expr $.input.stressng_params
    wait_for: !expr $.steps.pre_wait.outputs.success
  post_wait:
    plugin:
      deployment_type: image
      src: quay.io/arcalot/arcaflow-plugin-test-impl-go:0.4.1
    step: wait
    input:
      wait_time_ms: 10000
    wait_for: !expr $.steps.stressng.outputs.success

outputs:
  success:
    sample_uuid: !expr $.steps.uuidgen.outputs.success.uuid
    test_results: !expr $.steps.stressng.outputs.success
    pcp_time_series: !expr $.steps.pcp.outputs.success.pcp_output

dustinblack commented 4 months ago

I don't need any parameters redefined for stressng, no. So this raises the question of what's different between your environment and mine. Which engine version did you use? And did you run on macOS?

dustinblack commented 4 months ago

It looks like I over-simplified the reproducer. When I tested again by running stressng-workflow.yaml directly, it worked correctly for me as well. However, when I run it through the parent workflow.yaml, I hit the reported error whenever stressng-workflow.yaml is configured to use the namespace from the plugin.

When testing with the parent workflow, you will probably want to disable the horreum-related step and output, and you may want to adjust the sample input file to run a single stress-ng test with a short timeout value.