dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

GraphIn/Out defaults ins/out type to dict and fails type check #6499

Open scott-arne opened 2 years ago

scott-arne commented 2 years ago

Summary

It seems that if a type is not explicitly stated for the parameters of a function decorated with @graph, those parameters are treated as dict types and the run fails Dagster's validation (a DagsterInvalidConfigError is raised). I'm not sure whether that's intentional; I did not come across it in the docs.
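One plausible mechanism, offered here as an assumption and not confirmed in this thread, is that a decorator reads each parameter's annotation and assigns a default type when none is present, instead of String. The sketch below shows that pattern using Python's standard `inspect` module. The helper `infer_input_types`, the stand-in `String` class, and the choice of `typing.Any` as the fallback are all hypothetical; this is not Dagster's implementation.

```python
import inspect
from typing import Any, Callable, Dict


# Hypothetical stand-in for a type such as dagster.String (not Dagster's class).
class String:
    pass


def infer_input_types(fn: Callable[..., Any], fallback: object = Any) -> Dict[str, object]:
    """Map each parameter name to its annotation, or to `fallback` if unannotated.

    Illustrative model of annotation-based input typing only.
    """
    types: Dict[str, object] = {}
    for name, param in inspect.signature(fn).parameters.items():
        if param.annotation is inspect.Parameter.empty:
            types[name] = fallback  # no annotation: the fallback type is used
        else:
            types[name] = param.annotation  # explicit annotation wins
    return types


def untyped(name):
    return name


def typed(name: String):
    return name


print(infer_input_types(untyped))  # {'name': typing.Any}
print(infer_input_types(typed))    # maps 'name' to the String stand-in
```

In this model, annotating the parameter is the only thing that changes the inferred type, which mirrors the difference between the two scripts in this report.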

The following two minimal reproductions (hopefully) show this behavior. The first script gives the error; it does not explicitly specify the parameter type within proper_greeting. Seemingly, the name parameter is assumed to be a dict.

@graph
def proper_greeting(name):
    return add_punctuation(say_hi(name))

The second script runs without error; the only difference is that it explicitly specifies the parameter type within proper_greeting.

@graph
def proper_greeting(name: String):
    return add_punctuation(say_hi(name))

Reproduction

Gives Error

from dagster import op, graph, String, In, Out

@op(
    ins={"name": In(String)},
    out=Out(String)
)
def say_hi(name):
    return f'Hello {name}'

@op(
    ins={"message": In(String)}
)
def add_punctuation(context, message):
    context.log.info(f'{message}!')

@graph
def proper_greeting(name):
    return add_punctuation(say_hi(name))

job = proper_greeting.to_job()

if __name__ == "__main__":
    job.execute_in_process(
        run_config={
            "inputs": {
                "name": "Scott"
            }
        }
    )

Gives the following error:

Traceback (most recent call last):
  File "/Users/johnss51/Development/python/dagster-schrodinger/bug.py", line 27, in <module>
    job.execute_in_process(
  File "/Users/johnss51/Development/python/dagster-schrodinger/venv/lib/python3.9/site-packages/dagster/core/definitions/job_definition.py", line 172, in execute_in_process
    return core_execute_in_process(
  File "/Users/johnss51/Development/python/dagster-schrodinger/venv/lib/python3.9/site-packages/dagster/core/execution/execute_in_process.py", line 34, in core_execute_in_process
    execution_plan = create_execution_plan(
  File "/Users/johnss51/Development/python/dagster-schrodinger/venv/lib/python3.9/site-packages/dagster/core/execution/api.py", line 753, in create_execution_plan
    resolved_run_config = ResolvedRunConfig.build(pipeline_def, run_config, mode=mode)
  File "/Users/johnss51/Development/python/dagster-schrodinger/venv/lib/python3.9/site-packages/dagster/core/system_config/objects.py", line 159, in build
    raise DagsterInvalidConfigError(
dagster.core.errors.DagsterInvalidConfigError: Error in config for job
    Error 1: Value for selector type at path root:inputs:name must be a dict
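The message refers to a *selector* type rather than a plain dict. A plausible reading, stated as an assumption rather than a confirmed diagnosis, is that the unannotated input falls back to a type whose input config must be a dict with exactly one key, while String also accepts a bare string. The toy validator below models that difference. The function names, the second error message, and the selector keys `value`, `json`, and `pickle` are hypothetical choices for illustration, not Dagster's API.

```python
from typing import Any, Dict, List

# Hypothetical, simplified model of input-config validation.
# Schema names and selector keys are assumptions, not Dagster's implementation.
SELECTOR_KEYS = {"value", "json", "pickle"}


def validate_selector(value: Any, path: str) -> List[str]:
    """A selector must be a dict with exactly one recognized key."""
    if not isinstance(value, dict):
        return [f"Value for selector type at path {path} must be a dict"]
    if len(value) != 1 or not set(value) <= SELECTOR_KEYS:
        return [f"Selector at path {path} must have exactly one of {sorted(SELECTOR_KEYS)}"]
    return []


def validate_input(value: Any, input_type: str, path: str) -> List[str]:
    """'Any' inputs accept only the selector form; 'String' also accepts a bare str."""
    if input_type == "String" and isinstance(value, str):
        return []  # scalar shorthand, accepted only for String in this model
    return validate_selector(value, path)


def validate_run_config(run_config: Dict[str, Any], input_types: Dict[str, str]) -> List[str]:
    """Collect errors for every entry under run_config['inputs']."""
    errors: List[str] = []
    for name, value in run_config.get("inputs", {}).items():
        errors += validate_input(value, input_types[name], f"root:inputs:{name}")
    return errors


config = {"inputs": {"name": "Scott"}}
print(validate_run_config(config, {"name": "Any"}))     # one "must be a dict" error
print(validate_run_config(config, {"name": "String"}))  # []
```

In this model, the untyped input would instead be satisfied by the selector form `{"inputs": {"name": {"value": "Scott"}}}`; whether dagster 0.13.18 accepts exactly that form for an unannotated graph input is untested here.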

No Error

By explicitly specifying the type, as in proper_greeting(name: String), the dict assumption goes away and the code runs:

from dagster import op, graph, String, In, Out

@op(
    ins={"name": In(String)},
    out=Out(String)
)
def say_hi(name):
    return f'Hello {name}'

@op(
    ins={"message": In(String)}
)
def add_punctuation(context, message):
    context.log.info(f'{message}!')

@graph
def proper_greeting(name: String):
    return add_punctuation(say_hi(name))

job = proper_greeting.to_job()

if __name__ == "__main__":
    job.execute_in_process(
        run_config={
            "inputs": {
                "name": "Scott"
            }
        }
    )

Gives the following successful output:

2022-02-04 15:15:07 -0800 - dagster - DEBUG - proper_greeting - 727e22cd-d02f-4840-b4ec-23da817284b3 - 59685 - RUN_START - Started execution of run for "proper_greeting".
2022-02-04 15:15:07 -0800 - dagster - DEBUG - proper_greeting - 727e22cd-d02f-4840-b4ec-23da817284b3 - 59685 - ENGINE_EVENT - Executing steps in process (pid: 59685)
2022-02-04 15:15:07 -0800 - dagster - DEBUG - proper_greeting - 727e22cd-d02f-4840-b4ec-23da817284b3 - 59685 - ENGINE_EVENT - Starting initialization of resources [io_manager].
2022-02-04 15:15:07 -0800 - dagster - DEBUG - proper_greeting - 727e22cd-d02f-4840-b4ec-23da817284b3 - 59685 - ENGINE_EVENT - Finished initialization of resources [io_manager].
2022-02-04 15:15:07 -0800 - dagster - DEBUG - proper_greeting - 727e22cd-d02f-4840-b4ec-23da817284b3 - 59685 - say_hi - LOGS_CAPTURED - Started capturing logs for step: say_hi.
2022-02-04 15:15:07 -0800 - dagster - DEBUG - proper_greeting - 727e22cd-d02f-4840-b4ec-23da817284b3 - 59685 - say_hi - STEP_START - Started execution of step "say_hi".
2022-02-04 15:15:07 -0800 - dagster - DEBUG - proper_greeting - 727e22cd-d02f-4840-b4ec-23da817284b3 - 59685 - say_hi - STEP_INPUT - Got input "name" of type "String". (Type check passed).
2022-02-04 15:15:07 -0800 - dagster - DEBUG - proper_greeting - 727e22cd-d02f-4840-b4ec-23da817284b3 - 59685 - say_hi - STEP_OUTPUT - Yielded output "result" of type "String". (Type check passed).
2022-02-04 15:15:07 -0800 - dagster - DEBUG - proper_greeting - 727e22cd-d02f-4840-b4ec-23da817284b3 - 59685 - say_hi - HANDLED_OUTPUT - Handled output "result" using IO manager "io_manager"
2022-02-04 15:15:07 -0800 - dagster - DEBUG - proper_greeting - 727e22cd-d02f-4840-b4ec-23da817284b3 - 59685 - say_hi - STEP_SUCCESS - Finished execution of step "say_hi" in 1.04ms.
2022-02-04 15:15:07 -0800 - dagster - DEBUG - proper_greeting - 727e22cd-d02f-4840-b4ec-23da817284b3 - 59685 - add_punctuation - LOGS_CAPTURED - Started capturing logs for step: add_punctuation.
2022-02-04 15:15:07 -0800 - dagster - DEBUG - proper_greeting - 727e22cd-d02f-4840-b4ec-23da817284b3 - 59685 - add_punctuation - STEP_START - Started execution of step "add_punctuation".
2022-02-04 15:15:07 -0800 - dagster - DEBUG - proper_greeting - 727e22cd-d02f-4840-b4ec-23da817284b3 - 59685 - add_punctuation - LOADED_INPUT - Loaded input "message" using input manager "io_manager", from output "result" of step "say_hi"
2022-02-04 15:15:07 -0800 - dagster - DEBUG - proper_greeting - 727e22cd-d02f-4840-b4ec-23da817284b3 - 59685 - add_punctuation - STEP_INPUT - Got input "message" of type "String". (Type check passed).
2022-02-04 15:15:07 -0800 - dagster - INFO - proper_greeting - 727e22cd-d02f-4840-b4ec-23da817284b3 - add_punctuation - Hello Scott!
2022-02-04 15:15:07 -0800 - dagster - DEBUG - proper_greeting - 727e22cd-d02f-4840-b4ec-23da817284b3 - 59685 - add_punctuation - STEP_OUTPUT - Yielded output "result" of type "Any". (Type check passed).
2022-02-04 15:15:07 -0800 - dagster - DEBUG - proper_greeting - 727e22cd-d02f-4840-b4ec-23da817284b3 - 59685 - add_punctuation - HANDLED_OUTPUT - Handled output "result" using IO manager "io_manager"
2022-02-04 15:15:07 -0800 - dagster - DEBUG - proper_greeting - 727e22cd-d02f-4840-b4ec-23da817284b3 - 59685 - add_punctuation - STEP_SUCCESS - Finished execution of step "add_punctuation" in 1.16ms.
2022-02-04 15:15:07 -0800 - dagster - DEBUG - proper_greeting - 727e22cd-d02f-4840-b4ec-23da817284b3 - 59685 - ENGINE_EVENT - Finished steps in process (pid: 59685) in 8.52ms
2022-02-04 15:15:07 -0800 - dagster - DEBUG - proper_greeting - 727e22cd-d02f-4840-b4ec-23da817284b3 - 59685 - RUN_SUCCESS - Finished execution of run for "proper_greeting".

Additional Info about Your Environment

Mac OS X 11.6.1
Python 3.9.7
dagster==0.13.18



yuhan commented 2 years ago

Hi @scott-arne, thanks for the bug report. It is a known issue that I'm working on a fix for.

scott-arne commented 2 years ago

Awesome, thank you for the update!