dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Writing a multi_asset that doesn’t use I/O managers could be improved #15417

Open jamiedemaria opened 1 year ago

jamiedemaria commented 1 year ago

What's the use case?

For example, this could be a multi_asset that uses a third-party tool to split a dataset into training and testing data and store it in a DB.

A trivial example showing the rough edges of the API:

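from dagster import AssetOut, Nothing, multi_asset


@multi_asset(
    outs={
        "file1": AssetOut(dagster_type=Nothing),
        "file2": AssetOut(dagster_type=Nothing),
    }
)
def makes_two_assets():
    # Each file is written as a side effect; no value is handed back for storage.
    with open("data/file1.txt", "w") as f:
        f.write("foo")

    with open("data/file2.txt", "w") as f:
        f.write("bar")

    # One placeholder per declared output is still required.
    return None, None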

Specifying the AssetOut(dagster_type=Nothing) is kinda weird, but not too bad. The user needs to tell us somehow that the multi_asset creates N assets, but doesn’t return any values. The return None, None is pretty bad though. It’s unintuitive, and implies that there’s something happening with the returned Nones. It also seems like a really easy mistake to make.

Ideas of implementation

No response

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

alangenfeld commented 1 year ago

what happens if you

jamiedemaria commented 1 year ago

Omitting return None, None and keeping dagster_type=Nothing:

dagster._core.errors.DagsterInvariantViolationError: op "makes_two_assets" has multiple outputs, but only one output was returned of type <class 'NoneType'>. When using multiple outputs, either yield each output, or return a tuple containing a value for each output. Check out the documentation on outputs for more: https://docs.dagster.io/concepts/ops-jobs-graphs/ops#outputs.

Omitting dagster_type=Nothing and keeping return None, None runs successfully, but it invokes the I/O manager and stores the outputs.

Omitting both:

dagster._core.errors.DagsterInvariantViolationError: op "makes_two_assets" has multiple outputs, but only one output was returned of type <class 'NoneType'>. When using multiple outputs, either yield each output, or return a tuple containing a value for each output. Check out the documentation on outputs for more: https://docs.dagster.io/concepts/ops-jobs-graphs/ops#outputs.
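
To make the failure mode above concrete, here is a toy model of the check the error message describes: with more than one declared output, a bare None (what a function without a return statement produces) is rejected, while a tuple with one value per output is accepted. This is only a sketch and not Dagster's actual implementation; OutputCountError and collect_outputs are hypothetical names invented for illustration.

```python
# Toy model of the output-count check described in the error message.
# NOT Dagster's implementation; every name here is hypothetical.


class OutputCountError(Exception):
    """Stand-in for DagsterInvariantViolationError in this sketch."""


def collect_outputs(declared_names, result):
    """Pair each declared output name with a returned value.

    With more than one declared output, the body must return a tuple
    holding exactly one value per output.
    """
    if len(declared_names) > 1:
        if not isinstance(result, tuple):
            raise OutputCountError(
                "multiple outputs, but only one output was returned "
                f"of type {type(result)}"
            )
        if len(result) != len(declared_names):
            raise OutputCountError(
                f"expected {len(declared_names)} values, got {len(result)}"
            )
        return dict(zip(declared_names, result))
    # Single declared output: the returned value is used as-is.
    return {name: result for name in declared_names}


def body_without_return():
    # A function body with no return statement evaluates to None.
    pass


def body_with_tuple():
    return None, None


names = ["file1", "file2"]

# Returning one placeholder per output passes the check.
outputs = collect_outputs(names, body_with_tuple())

# Omitting the return statement trips the check.
try:
    collect_outputs(names, body_without_return())
    failed = False
except OutputCountError:
    failed = True
```

In this model the only way to satisfy the check without real values is the return None, None placeholder, which is the rough edge this issue is about.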

bmarcj commented 1 year ago

I think most built-in IOManager implementations handle the 'None' case? (Custom implementations certainly do when using dbt assets that have an IOManager specified.)