
dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.

https://dagster.io

Apache License 2.0


enable per-source-asset configuration on IO managers #9351

Status: Open (sryza opened this issue 2 years ago)

sryza commented 2 years ago

IO managers currently support input_config_schema and output_config_schema. The output_config_schema lets you provide configuration that dictates:

- how a particular output (or asset) is stored
- how it's loaded in all the downstream places it's used
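To make the two roles of output config concrete, here is a dependency-free Python model. It is not the dagster API: `InMemoryIOManager`, its method signatures, and the `path` key are all hypothetical. The point it illustrates is that loading reuses the upstream output's config, so one config entry decides both where a value is written and where every downstream consumer reads it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class InMemoryIOManager:
    """Hypothetical IO manager that stores values under a path taken from output config."""
    storage: Dict[str, Any] = field(default_factory=dict)

    def handle_output(self, output_config: Dict[str, Any], value: Any) -> None:
        # The output's config decides where the value is stored.
        self.storage[output_config["path"]] = value

    def load_input(self, upstream_output_config: Dict[str, Any]) -> Any:
        # Loading reuses the *upstream output's* config, so every
        # downstream consumer reads from the same location.
        return self.storage[upstream_output_config["path"]]


io = InMemoryIOManager()
orders_config = {"path": "warehouse/orders.parquet"}
io.handle_output(orders_config, [1, 2, 3])
print(io.load_input(orders_config))  # [1, 2, 3]
```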

Analogous configuration, dictating "how it's loaded in all the downstream places it's used", might also be useful for source assets.

This would be useful when you want to kick off a run that targets a particular file chosen at runtime.
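A source asset has no output of its own, so there is no output config for the loader to reuse. A minimal sketch of the requested behavior, assuming a hypothetical run-config layout keyed by source asset name (`source_assets`, `config`, and `path` are invented for illustration and are not dagster config keys), might resolve the load location like this:

```python
from typing import Any, Dict, Optional


def resolve_source_path(
    run_config: Dict[str, Any], asset_key: str, default: Optional[str] = None
) -> str:
    """Return the path a source asset should be loaded from in this run.

    Assumes a hypothetical layout:
        {"source_assets": {<asset_key>: {"config": {"path": ...}}}}
    """
    asset_cfg = (
        run_config.get("source_assets", {}).get(asset_key, {}).get("config", {})
    )
    # Per-run config wins; otherwise fall back to the default location.
    path = asset_cfg.get("path", default)
    if path is None:
        raise KeyError(f"no path configured for source asset {asset_key!r}")
    return path


run_config = {"source_assets": {"entities": {"config": {"path": "data/2024-01-01.csv"}}}}
print(resolve_source_path(run_config, "entities"))  # data/2024-01-01.csv
print(resolve_source_path({}, "entities", default="data/latest.csv"))  # data/latest.csv
```

Falling back to a default keeps runs that don't set this config working, while a run that targets a different file only has to override one entry.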

acroz commented 2 years ago

This would be really useful. For what it's worth, the "natural" place for me to put this config would be on the source asset itself, i.e. if I could do the following:

entities = SourceAsset("entities", config_schema={"url": str})

If this config was then available to the IOManager, I could have e.g. a production run read from S3:

assets:  # Or perhaps separated as source_assets
  entities:
    config:
      url: s3://mybucket/path/to/entities.csv

or a "test" run using a file on the local filesystem:

assets:
  entities:
    config:
      url: path/to/entities.csv
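Given per-source-asset config like the YAML above, the IO manager's load logic could branch on the URL scheme. This is a sketch with hypothetical helper names (`split_url`, `load_csv_source`); it parses URLs with the standard library's `urllib.parse.urlparse`, reads local CSV files with `csv.DictReader`, and deliberately stubs out the S3 branch rather than assuming a particular client.

```python
import csv
from typing import Any, Dict, List, Tuple
from urllib.parse import urlparse


def split_url(url: str) -> Tuple[str, ...]:
    """Classify a configured url as ("s3", bucket, key) or ("local", path)."""
    parsed = urlparse(url)
    if parsed.scheme == "s3":
        return ("s3", parsed.netloc, parsed.path.lstrip("/"))
    if parsed.scheme in ("", "file"):
        return ("local", parsed.path)
    raise ValueError(f"unsupported url scheme: {parsed.scheme!r}")


def load_csv_source(asset_config: Dict[str, Any]) -> List[dict]:
    """Hypothetical load logic for a source asset configured with {"url": ...}."""
    kind, *location = split_url(asset_config["url"])
    if kind == "s3":
        # Stubbed on purpose: a real reader would need an S3 client.
        raise NotImplementedError(f"S3 read of s3://{location[0]}/{location[1]}")
    with open(location[0], newline="") as f:
        return list(csv.DictReader(f))
```

With this split, a production run and a test run differ only in the `url` value they configure, not in any asset code.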

k1sauce commented 2 years ago

moleary-gsa commented 7 months ago

@sryza I see this issue is stale, but I would love to have this available.

I can do this with ops via the type system (with a custom type and loader); however, this functionality is missing for assets.
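For comparison, the op-side pattern works roughly like this: a custom type carries a loader, and when an op input is not connected to an upstream output, the loader builds the value from run config. The sketch below models that flow in plain Python; `LoadableType`, `resolve_input`, and `EntitiesPath` are hypothetical stand-ins, not dagster's own type-loader API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

_NOT_PROVIDED = object()  # sentinel: the input has no upstream output


@dataclass(frozen=True)
class LoadableType:
    """Hypothetical custom type that carries its own config-driven loader."""
    name: str
    loader: Callable[[Dict[str, Any]], Any]


def resolve_input(
    dtype: LoadableType,
    upstream_value: Any = _NOT_PROVIDED,
    input_config: Optional[Dict[str, Any]] = None,
) -> Any:
    # A wired-up input uses the upstream value; an unconnected input
    # is built from run config by the type's loader.
    if upstream_value is not _NOT_PROVIDED:
        return upstream_value
    if input_config is None:
        raise ValueError(f"input of type {dtype.name} needs config or an upstream value")
    return dtype.loader(input_config)


EntitiesPath = LoadableType("EntitiesPath", loader=lambda cfg: cfg["url"])
print(resolve_input(EntitiesPath, input_config={"url": "path/to/entities.csv"}))
# path/to/entities.csv
```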

The best workaround I can think of is a root asset (not a source asset) with a config schema that provides all available locations for data loading.

The problem is that this approach feels like an anti-pattern. I would much prefer to have the input-loading logic handled by the IO manager layer instead of implementing it within the root asset's body.
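The root-asset workaround, under one reading of "a config schema which provides all available locations", might look like the following plain-Python model. `EntitiesRootConfig`, `LOCATIONS`, and `entities_root` are hypothetical, and the reader function is injected so the example does not depend on S3 access. The location selection and loading both sit in the asset body, which is the logic the comment would rather delegate to the IO manager.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical catalogue of every location the root asset may load from.
LOCATIONS: Dict[str, str] = {
    "prod": "s3://mybucket/path/to/entities.csv",
    "test": "path/to/entities.csv",
}


@dataclass(frozen=True)
class EntitiesRootConfig:
    """Hypothetical run config for the root asset: which location to use."""
    location: str  # must be a key of LOCATIONS


def entities_root(
    config: EntitiesRootConfig, read: Callable[[str], List[dict]]
) -> List[dict]:
    # The loading logic lives in the asset body rather than in an IO manager.
    if config.location not in LOCATIONS:
        raise ValueError(f"unknown location {config.location!r}")
    return read(LOCATIONS[config.location])


# Usage with a stub reader that just records which URL it was asked for.
rows = entities_root(EntitiesRootConfig("test"), read=lambda url: [{"url": url}])
print(rows)  # [{'url': 'path/to/entities.csv'}]
```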