Closed by dustinblack 6 months ago
I switched it to use the namespace, and it just worked. I removed all of the now-unnecessary objects from the workflow. Did you need any of those objects redefined for default overrides? The workflow still has the ones that have comments saying that they were retained for default values.
version: v0.2.0
input:
  root: StressngWorkflowInput
  objects:
    PcpInputParams:
      # Not using namespaced scope because the workflow schema is customized
      id: PcpInputParams
      properties:
        flatten:
          display:
            description: Processes the metrics first into a two-dimensional format
              via the pcp2csv converter, and then converts the CSV to JSON, effectively
              flattening the data structure. This is useful when indexing metrics
              to a service like Elasticsearch.
            name: flatten JSON structure
          required: false
          type:
            type_id: bool
          default: true
        pmlogger_interval:
          display:
            description: The logger collection interval for PCP pmlogger
            name: PCP pmlogger collection interval
          type:
            type_id: float
          required: false
          default: 0.5
        pmlogger_metrics:
          display:
            description: The pmrep-compatible metrics values to report as a space-separated string.
            name: pmlogger metrics to report
          type:
            type_id: string
          required: false
          default: kernel.uname, hinv.ncpu, mem.physmem, disk.dev.scheduler, kernel.cpu.util.user, kernel.cpu.util.nice, kernel.cpu.util.sys, kernel.cpu.util.wait, kernel.cpu.util.steal, kernel.cpu.util.idle, kernel.percpu.cpu.vuser, kernel.percpu.cpu.nice, kernel.percpu.cpu.sys, kernel.percpu.cpu.wait, kernel.percpu.cpu.steal, kernel.percpu.cpu.idle, disk.all.total, disk.all.read, disk.all.write, disk.all.blkread, disk.all.blkwrite, mem.freemem, mem.util.available, mem.util.used, mem.util.bufmem, mem.util.cached, mem.util.active, mem.util.inactive, mem.util.dirty, swap.in, swap.pagesin, swap.out, swap.pagesout, network.interface.in.packets, network.interface.out.packets, network.interface.in.bytes, network.interface.out.bytes
    StressngWorkflowInput:
      id: StressngWorkflowInput
      properties:
        pcp_params:
          # Not using namespaced scope because the workflow schema has defaults defined
          display:
            description: The parameters for the PCP workload
            name: PCP parameters
          type:
            type_id: ref
            id: PcpInputParams
        stressng_params:
          display:
            description: The parameters for the stressng workload
            name: stressng parameters
          type:
            type_id: ref
            id: StressNGParams
            namespace: $.steps.stressng.starting.inputs.input
          required: true
steps:
  uuidgen:
    plugin:
      deployment_type: image
      src: quay.io/arcalot/arcaflow-plugin-utilities:0.5.1
    step: uuid
    input: {}
  pcp:
    plugin:
      deployment_type: image
      src: quay.io/arcalot/arcaflow-plugin-pcp:0.9.0
    step: start-pcp
    deploy:
      deployer_name: podman
      deployment:
        host:
          NetworkMode: host
          Binds:
            - /etc/system-release:/etc/system-release
    input: !expr $.input.pcp_params
    stop_if: !expr $.steps.post_wait.outputs.success
  pre_wait:
    plugin:
      deployment_type: image
      src: quay.io/arcalot/arcaflow-plugin-test-impl-go:0.4.1
    step: wait
    input:
      wait_time_ms: 10000
    wait_for: !expr $.steps.pcp.starting.started
  stressng:
    plugin:
      deployment_type: image
      src: quay.io/arcalot/arcaflow-plugin-stressng:simplify-schema_ebe8b96
    step: workload
    input: !expr $.input.stressng_params
    wait_for: !expr $.steps.pre_wait.outputs.success
  post_wait:
    plugin:
      deployment_type: image
      src: quay.io/arcalot/arcaflow-plugin-test-impl-go:0.4.1
    step: wait
    input:
      wait_time_ms: 10000
    wait_for: !expr $.steps.stressng.outputs.success
outputs:
  success:
    sample_uuid: !expr $.steps.uuidgen.outputs.success.uuid
    test_results: !expr $.steps.stressng.outputs.success
    pcp_time_series: !expr $.steps.pcp.outputs.success.pcp_output
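For anyone following along, here is a rough Python illustration of the difference between the two `ref` styles in the workflow above: one resolved against the workflow's own objects, the other against a namespace. It assumes a much-simplified model in which each namespace maps to a table of object schemas. This is not the Arcaflow engine's implementation; `resolve_ref`, `ROOT_NAMESPACE`, and the data layout are hypothetical names made up for the sketch.

```python
# Hypothetical, simplified model of scoped schema references.
# This is NOT the Arcaflow engine's implementation.
ROOT_NAMESPACE = ""  # assumption: stands in for the workflow's own objects


def resolve_ref(ref, namespaces):
    """Return the object schema a ref points to.

    A ref without a namespace is looked up among the workflow's own
    objects; a ref with a namespace is looked up in that namespace only.
    """
    namespace = ref.get("namespace", ROOT_NAMESPACE)
    try:
        objects = namespaces[namespace]
    except KeyError:
        raise LookupError(f"unknown namespace {namespace!r}") from None
    try:
        return objects[ref["id"]]
    except KeyError:
        raise LookupError(
            f"object {ref['id']!r} not found in namespace {namespace!r}"
        ) from None


# Object tables keyed by namespace, mirroring the workflow above.
namespaces = {
    ROOT_NAMESPACE: {"PcpInputParams": {"id": "PcpInputParams"}},
    "$.steps.stressng.starting.inputs.input": {
        "StressNGParams": {"id": "StressNGParams"},
    },
}

# Locally defined object, as used by pcp_params.
pcp_ref = {"type_id": "ref", "id": "PcpInputParams"}

# Object taken from the plugin's schema, as used by stressng_params.
stressng_ref = {
    "type_id": "ref",
    "id": "StressNGParams",
    "namespace": "$.steps.stressng.starting.inputs.input",
}
```

In this model, dropping the `namespace` key from `stressng_ref` makes the lookup fail unless `StressNGParams` is also redefined among the workflow's own objects, which is why the redundant object definitions could be removed once the namespace was used.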
No, I don't need any parameters redefined for stressng. So this raises the question of what's different between your environment and mine. Which engine version did this work with? And did you run on macOS?
It looks like I over-simplified the reproducer. When I tested again with just the stressng-workflow.yaml directly, it worked correctly for me as well. However, when running it through the parent workflow.yaml, I encounter the reported error when the stressng-workflow.yaml is configured to use the namespace from the plugin.
Testing with the parent workflow, you will probably want to disable the horreum-related step and output, and you may want to adjust the sample input file for a single stress-ng test and short timeout value.
Describe the bug
When trying to use a namespaced scope with the stress-ng plugin (currently the development branch), the workflow fails. The panic seems to randomly reference one of the sub-classes of the stressors one-of union in the input schema.
To reproduce
The workflow here can be used to reproduce the problem: https://gitlab.com/redhat/edge/tests/perfscale/arcaflow-workflow-sysbench/-/blob/nightly-tests-stressng/stressng-workflow.yaml?ref_type=heads
Test with just the sub-workflow, un-commenting the namespace parameter for $.inputs.StressngWorkflowInput.stressng_params.
Additional context
The stress-ng plugin includes a one-of union of other classes with discriminators.
https://github.com/arcalot/arcaflow-plugin-stressng/blob/ebe8b964b3c8e3d37228585b09b27af0584f9778/arcaflow_plugin_stressng/stressng_schema.py#L551-L603
I'm not positive this is the source of the problem, but it seems suspect.
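For readers unfamiliar with the pattern, a one-of union with discriminators can be sketched in plain Python roughly as follows. This uses only the standard library and is not the plugin's schema or the Arcaflow SDK's API; the class names, the field names (`workers`, `vm_bytes`), and the discriminator field name `stressor` are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CpuStressor:
    # Hypothetical field, for illustration only.
    workers: int


@dataclass
class VmStressor:
    # Hypothetical fields, for illustration only.
    workers: int
    vm_bytes: str


# Maps each discriminator value to the member class it selects.
STRESSOR_TYPES = {"cpu": CpuStressor, "vm": VmStressor}


def parse_stressor(data, discriminator="stressor"):
    """Build the union member selected by the discriminator field."""
    fields = dict(data)  # copy so the caller's dict is left untouched
    try:
        kind = fields.pop(discriminator)
    except KeyError:
        raise ValueError(
            f"missing discriminator field {discriminator!r}"
        ) from None
    try:
        cls = STRESSOR_TYPES[kind]
    except KeyError:
        raise ValueError(f"unknown {discriminator} value {kind!r}") from None
    return cls(**fields)
```

For example, `parse_stressor({"stressor": "vm", "workers": 1, "vm_bytes": "1G"})` yields a `VmStressor`. Every member class of such a union has to be resolvable wherever the union itself is referenced, which is why the union looks like a plausible place for a scoping problem to surface.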