dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Creating an IO Manager called io_manager overwrites the default one #12617

Open marcilj opened 1 year ago

marcilj commented 1 year ago

Dagster version

dagster, version 1.1.19

What's the issue?

I don't know if this is a bug or a feature, but it took me quite some time to find out what was going on.

If you provide your assets with a resource called io_manager, it will override the default fs_io_manager that is used for all assets that don't have an io_manager_key configured.

I was really confused about why all my assets were getting saved to S3 without me configuring them to do so.

What did you expect to happen?

I would expect the default io_manager not to be overridden just because a resource happens to be named io_manager.

Or that this behavior is clearly documented in the io_manager docs here.

How to reproduce?

Create a resource called io_manager.

# "io_manager" is a reserved resource key: binding anything under it
# replaces the default fs_io_manager for assets without an io_manager_key.
# (common_bucket_s3_pickle_io_manager is a user-defined resource.)
RESOURCES_SANDBOX = {
    "io_manager": common_bucket_s3_pickle_io_manager,
}

Provide this resource to your assets with:

defs = Definitions(
    assets=all_assets,
    resources=resources_by_deployment_name[DEPLOYMENT_NAME],
    schedules=all_schedules,
    jobs=all_jobs,
)
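The override behavior described above can be sketched in plain Python. This is an illustrative model of the resolution rule only, not Dagster's actual internals, and all names in it (resolve_io_manager, DEFAULT_FS_IO_MANAGER) are hypothetical:

```python
# Hypothetical sketch of how an asset's I/O manager is resolved.
# NOT Dagster internals; names here are made up for illustration.

DEFAULT_FS_IO_MANAGER = "fs_io_manager"  # stand-in for the built-in default


def resolve_io_manager(resources: dict, io_manager_key: str = "io_manager") -> str:
    # An asset without an explicit io_manager_key falls back to the
    # reserved key "io_manager"; if the user bound a resource under that
    # key, it silently replaces the default filesystem I/O manager.
    return resources.get(io_manager_key, DEFAULT_FS_IO_MANAGER)


# No resource named "io_manager": the built-in default applies.
assert resolve_io_manager({}) == "fs_io_manager"

# Binding a resource under the reserved key overrides the default
# for every asset that did not set io_manager_key explicitly.
resources = {"io_manager": "common_bucket_s3_pickle_io_manager"}
assert resolve_io_manager(resources) == "common_bucket_s3_pickle_io_manager"
```

This is why the issue is easy to hit: nothing in the Definitions call signals that "io_manager" is a reserved key rather than an ordinary resource name.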

Deployment type

None

Deployment details

No response

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

tacastillo commented 1 year ago

Hey Jacob! Thanks for persisting this into a GitHub issue! You're right that we should update the docs on this because it is intended behavior that can throw users off.

I can triage this and squeeze it in as a quick fix some time soon, but if you're interested, would you like to be a contributor to the Dagster project and add a sentence or two to our I/O manager page explaining this behavior, probably under this section? I'll gladly approve the PR if you do. Fair game if you don't want to, but figured I'd ask to see if you'd like the credit!

laisdeghaide commented 10 months ago

Hi there! I'm having the opposite problem: I want to replace the default io_manager with s3_pickle_io_manager. I did exactly (or so I think) the same thing described in the docs:

"io_manager": S3PickleIOManager(
        s3_resource=S3Resource(),
        s3_bucket="arado-datapond-raw",
        s3_prefix="dagster/storage",
    )
defs = create_repository_using_definitions_args(
    name="dagarado",
    assets=all_assets,
    resources=resources_by_deployment_name[get_deployment_name()],
    jobs=all_jobs,
    executor=in_process_executor,
    schedules=all_schedules,
    sensors=all_sensors,
    asset_checks=all_checks,
)

but it is not replacing the default, and I'm getting this error: "dagster._core.errors.DagsterInvalidDefinitionError: Conflicting versions of resource with key 'io_manager' were provided to different assets. When constructing a job, all resource definitions provided to assets must match by reference equality for a given key."
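The error message is about reference equality, which is stricter than value equality. A minimal sketch in plain Python (the class below is hypothetical and just stands in for two separately constructed resource objects) shows why two field-for-field identical instances can still conflict:

```python
# Sketch of the reference-equality condition the error message describes.
# IOManagerConfig is a hypothetical stand-in, not a Dagster class.
class IOManagerConfig:
    def __init__(self, bucket: str):
        self.bucket = bucket

    def __eq__(self, other):
        return isinstance(other, IOManagerConfig) and self.bucket == other.bucket


a = IOManagerConfig("arado-datapond-raw")
b = IOManagerConfig("arado-datapond-raw")

assert a == b      # value-equal...
assert a is not b  # ...but distinct objects, so a reference-equality check fails

# Constructing the object once and reusing that single instance everywhere
# it is bound keeps every binding reference-equal.
shared = IOManagerConfig("arado-datapond-raw")
assert shared is shared
```

If the same resource is being constructed in more than one place (for example, fresh S3PickleIOManager(...) calls in different modules), reusing one shared instance is the usual way to avoid this class of conflict.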

@tacastillo @sryza

jamiedemaria commented 10 months ago

~hey @laisdeghaide do you have any instances in your code where you're directly setting the resource_defs or io_manager_def parameters on the @asset decorator? those might be the source of the conflicts.~

edit - moved to https://github.com/dagster-io/dagster/issues/19262