argoproj-labs / hera

Hera makes Python code easy to orchestrate on Argo Workflows through native Python integrations. It lets you construct and submit your Workflows entirely in Python. ⭐️ Remember to star!
https://hera.rtfd.io
Apache License 2.0

Locally-runnable DAG breaks down for complex examples #1172

Open elliotgunton opened 3 weeks ago

elliotgunton commented 3 weeks ago

Pre-bug-report checklist

1. This bug can be reproduced using pure Argo YAML

If yes, it is more likely to be an Argo bug unrelated to Hera. Please double check before submitting an issue to Hera.

2. This bug occurs in Hera when...

Bug report

Describe the bug

A clear and concise description of what the bug is:

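When a DAG is run locally, parameters and artifacts are not loaded through Pydantic deserialization, so local runs fail for any input whose type is not plain old data (POD). In the example below, a str artifact output is passed to a dict input field and validation rejects it.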

Error log if applicable:

pydantic_core._pydantic_core.ValidationError: 2 validation errors for FeatureScalingInput
X_train
  Input should be a valid dictionary [type=dict_type, input_value='{...}', input_type=str]

To Reproduce

Full Hera code to reproduce the bug:

Previous task output:

class DatasetsOutput(Output):
    X_train: Annotated[str, Artifact(name="X_train", archive=NoneArchiveStrategy())]
    X_test: Annotated[str, Artifact(name="X_test", archive=NoneArchiveStrategy())]
    y_train: Annotated[str, Artifact(name="y_train", archive=NoneArchiveStrategy())]
    y_test: Annotated[str, Artifact(name="y_test", archive=NoneArchiveStrategy())]

Task input:

class FeatureScalingInput(Input):
    X_train: Annotated[dict, Artifact(name="X_train", loader=ArtifactLoader.json)]
    X_test: Annotated[dict, Artifact(name="X_test", loader=ArtifactLoader.json)]

Dag:

@w.dag()
def run_training(_: Input):
    datasets = load_and_split_dataset(LoadDatasetInput())
    scaling = feature_scaling(
        FeatureScalingInput(X_train=datasets.X_train, X_test=datasets.X_test)
    )
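The failure can be illustrated without Hera. The sketch below is an assumption about the mechanism, not Hera's actual code: check_field is a hypothetical, simplified stand-in for Pydantic validating a dict field, and the sample payload is made up. The upstream str output holds JSON text; when that text is handed directly to a dict field, validation rejects it, which is consistent with the dict_type error in the log.

```python
import json

# Hypothetical stand-in for the upstream task's `str` artifact output.
upstream_x_train = json.dumps({"feature_a": [1.0, 2.0], "feature_b": [3.0, 4.0]})


def check_field(name: str, value: object, expected: type) -> None:
    """Hypothetical, simplified stand-in for Pydantic validating one field."""
    if not isinstance(value, expected):
        raise TypeError(
            f"{name}: expected {expected.__name__}, got {type(value).__name__}"
        )


def validate_locally(value: object) -> str:
    """Mimic a local run: the upstream value is passed through unchanged."""
    try:
        check_field("X_train", value, dict)
    except TypeError as exc:
        return f"failed: {exc}"
    return "ok"


# The JSON text reaches the dict field without being parsed first.
local_result = validate_locally(upstream_x_train)
print(local_result)
```

Parsing the text with json.loads before validation makes the same check pass, which is presumably what happens on-cluster when the artifact is read back through a JSON loader.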

Expected behavior

A clear and concise description of what you expected to happen:

The DAG should run locally. The same example works on-cluster.
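One way to picture the expected behavior is a local run that passes each value through the same kind of loader the cluster would apply before validation. This is only a sketch under assumptions: load_like_cluster is a hypothetical helper, not a Hera API, and it assumes that a JSON loader (such as the ArtifactLoader.json setting in the report) parses the artifact's text into the declared type.

```python
import json
from typing import Any


def load_like_cluster(value: Any, expected: type) -> Any:
    """Hypothetical helper: parse JSON text unless the field is declared as str."""
    if isinstance(value, str) and expected is not str:
        return json.loads(value)
    return value


# Made-up payload standing in for the upstream `str` artifact.
upstream = json.dumps({"feature_a": [1.0, 2.0]})

x_train = load_like_cluster(upstream, dict)  # parsed into a dict
raw = load_like_cluster(upstream, str)       # left untouched for str fields
```

With a step like this in the local path, the dict field would receive a dictionary, and str fields would still receive the raw text.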

Environment