flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0
5.17k stars 550 forks source link

[BUG] Error in workflow compilation logic for special variable names #5511

Closed ddl-rliu closed 14 hours ago

ddl-rliu commented 3 days ago

Describe the bug

Compile-time validation is inconsistent between workflow run and workflow re-run. Minimal repro:

Special attention on the lines: inputs=kwtypes(**{"my.value": float, "c": float}) and calculate_ellipse_area_shell(**{"my.value": a, "c": b})

import logging

from flytekit import ContainerTask, kwtypes, task, workflow

logger = logging.getLogger(__file__)

calculate_ellipse_area_shell = ContainerTask(
    name="ellipse-area-metadata-shell",
    input_data_dir="/var/inputs",
    output_data_dir="/var/outputs",
    inputs=kwtypes(**{"my.value": float, "c": float}),
    outputs=kwtypes(area=float, metadata=str),
    image="ghcr.io/flyteorg/rawcontainers-shell:v2",
    command=[
        "./calculate-ellipse-area.sh"
    ],
)

@task
def report_all_calculated_areas(
    area_shell: float,
    metadata_shell: str,
):
    logger.info(f"shell: area={area_shell}, metadata={metadata_shell}")

@workflow
def wf(a: float, b: float):
    area_shell, metadata_shell = calculate_ellipse_area_shell(**{"my.value": a, "c": b})

    # Report on all results in a single task to simplify comparison
    report_all_calculated_areas(
        area_shell=area_shell,
        metadata_shell=metadata_shell,
    )

Actual behavior:

  1. (Step 1) Run this workflow, it compiles correctly.
  2. (Step 2) Try to rerun the task, the compilation step fails with an error similar to: Failed to run the Node: Failed to create the execution: failed to compile workflow for [resource_type:WORKFLOW project:"x" domain:"development" name:".flytegen.ellipse-area-metadata-shell" version:"x"] with err failed to compile workflow with err Collected Errors: 1 Error 0: Code: SyntaxError, Node Id: ellipse-area-metadata-shell, Description: Failed to parse element [my.value], err [unexpected number of matches [2] [[[my my] [value value]]]].

Expected behavior

This is a tricky bug because there are two options.

Option 1: Special characters in task inputs names are NOT okay

Rationale: flyte inputs names are "like" python variable names, which should not use special characters like . (see my.value in the repro code

Expected behavior: The workflow should actually fail at compile time at step 1. (e.g. perform extra validation)

Option 2: Special characters in task inputs names ARE okay

Rationale: flyte inputs names are a string value, and have no restriction on them unlike in python. For example, a flyte inputs name like 0myvar starts with 0, which is invalid python. However, this kind of input is valid Flyte task definition, and users today may be doing so. Introducing extra validation will break these users.

Expected behavior: Rerunning the task should not lead to any errors (step 2).

Additional context to reproduce

No response

Screenshots

No response

Are you sure this issue hasn't been raised already?

Have you read the Code of Conduct?

runllm[bot] commented 3 days ago

According to the Flyte documentation, Flyte supports defining task inputs and outputs using the kwtypes utility. However, the use of special characters in input names like . is not explicitly mentioned, which might be leading to unexpected behavior during workflow compilation.

Describe the bug

When defining a ContainerTask with input names containing special characters (e.g., my.value), the workflow compiles correctly initially but fails during rerun with a syntax error.

Expected behavior

There are two potential options for handling this:

  1. Special characters in task input names are NOT okay:

    • The workflow should fail at compile time if special characters are used in input names, ensuring that only valid names are allowed.
  2. Special characters in task input names are okay:

    • The workflow should handle special characters correctly and not fail during rerun, ensuring consistent behavior.

Reproduction Steps

  1. Define a ContainerTask with special characters in input names.
  2. Create a workflow that uses this task.
  3. Run the workflow (compiles correctly).
  4. Try to rerun the task (fails with a syntax error).

Example Code

import logging
from flytekit import ContainerTask, kwtypes, task, workflow

logger = logging.getLogger(__file__)

calculate_ellipse_area_shell = ContainerTask(
    name="ellipse-area-metadata-shell",
    input_data_dir="/var/inputs",
    output_data_dir="/var/outputs",
    inputs=kwtypes(**{"my.value": float, "c": float}),
    outputs=kwtypes(area=float, metadata=str),
    image="ghcr.io/flyteorg/rawcontainers-shell:v2",
    command=[
        "./calculate-ellipse-area.sh"
    ],
)

@task
def report_all_calculated_areas(
    area_shell: float,
    metadata_shell: str,
):
    logger.info(f"shell: area={area_shell}, metadata={metadata_shell}")

@workflow
def wf(a: float, b: float):
    area_shell, metadata_shell = calculate_ellipse_area_shell(**{"my.value": a, "c": b})

    report_all_calculated_areas(
        area_shell=area_shell,
        metadata_shell=metadata_shell,
    )

Actual Behavior

Sources:


This is an AI-generated response and your feedback is appreciated! Please leave a 👍 if this is helpful and 👎 if it is not.

Mention @runllm to ask follow-up questions. See runllm.com for more!

pingsutw commented 3 days ago

agreed. Workflow should fail at compile time at step 1.

ddl-rliu commented 14 hours ago

I believe this is resolved as per https://github.com/flyteorg/flytekit/pull/2538 Thanks!

pingsutw commented 13 hours ago

Thank you!