dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Make output metadata available on the InputContext #20094

Open jamiedemaria opened 7 months ago

jamiedemaria commented 7 months ago

What's the use case?

Users may want to attach output (runtime) metadata to an output and then read that metadata in load_input. This is not currently possible.

from dagster import IOManager, Output, asset

@asset
def my_asset_adds_metadata():
    return Output(1, metadata={"foo": "bar"})  # or via context.add_output_metadata

class MyIOManager(IOManager):
    def handle_output(self, context, obj):
        ...  # storage logic omitted

    def load_input(self, context):
        context.upstream_output.metadata  # only provides definition-level metadata

Ideas of implementation

One of the main complications in implementing this feature is that output metadata is stored in the event log. This means that the InputContext would need to query the event log in order to get the output metadata. For assets, this is less of an issue, since we already have the ability to query the event log for the latest materialization event, which contains the metadata. For op-jobs this is more difficult, since we would need to query for the output handled event from a specific run. We don't currently have the ability to do this efficiently.
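
To make the asymmetry between the two cases concrete, here is a toy, standard-library-only model of the two lookups. It is only a sketch: `Event`, `ToyEventLog`, and all of their fields and methods are hypothetical names invented for illustration and are not Dagster's event log API. The assumption that the asset lookup is indexed while the op lookup falls back to a scan is a simplification of "we can already do one and cannot do the other efficiently".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Event:
    """Hypothetical event-log record (not Dagster's real event class)."""
    kind: str                          # "materialization" or "handled_output"
    run_id: str
    metadata: Dict[str, str]
    asset_key: Optional[str] = None    # set for asset materializations
    step_key: Optional[str] = None     # set for op outputs
    output_name: str = "result"


@dataclass
class ToyEventLog:
    """Hypothetical in-memory event log used only to illustrate the idea."""
    events: List[Event] = field(default_factory=list)
    # Assumption: only assets get an index (asset key -> latest materialization).
    _latest_by_asset: Dict[str, Event] = field(default_factory=dict, init=False)

    def append(self, event: Event) -> None:
        self.events.append(event)
        if event.kind == "materialization" and event.asset_key is not None:
            self._latest_by_asset[event.asset_key] = event

    def latest_materialization_metadata(self, asset_key: str) -> Optional[Dict[str, str]]:
        # Asset case: a single indexed lookup by asset key.
        event = self._latest_by_asset.get(asset_key)
        return dict(event.metadata) if event is not None else None

    def handled_output_metadata(
        self, run_id: str, step_key: str, output_name: str = "result"
    ) -> Optional[Dict[str, str]]:
        # Op-job case: no index in this toy model, so scan newest-first
        # for the matching output event of one specific run.
        for event in reversed(self.events):
            if (
                event.kind == "handled_output"
                and event.run_id == run_id
                and event.step_key == step_key
                and event.output_name == output_name
            ):
                return dict(event.metadata)
        return None


log = ToyEventLog()
log.append(Event("materialization", "run_1", {"foo": "bar"},
                 asset_key="my_asset_adds_metadata"))
log.append(Event("handled_output", "run_2", {"rows": "10"}, step_key="my_op"))

asset_meta = log.latest_materialization_metadata("my_asset_adds_metadata")
op_meta = log.handled_output_metadata("run_2", "my_op")
```

In this model a load_input for an asset only needs the upstream asset key, while a load_input in an op job has to know the run, step, and output name of the upstream output before it can find the metadata at all.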

jamiedemaria commented 7 months ago

Related to https://github.com/dagster-io/dagster/issues/17923, which primarily focused on accessing output metadata in handle_output/OutputContext.

zcemycl commented 2 months ago

Any progress on this?