Open EtienneT opened 10 months ago
I'm filing this under additional information.
A related question is how to avoid repeating the metadata for each partition.
I'm not sure if I'm doing something wrong, but it seems that the single `context.add_output_metadata` call is repeated 14 * 8 times in the event list. Every event output is the same except for its timestamp. See attached image. I guess it outputs the metadata once for each requested partition.
Partitions requested for materialization:
(13, DateTime(2024, 2, 1, 0, 0, 0, tzinfo=Timezone('UTC')), DateTime(2024, 2, 14, 0, 0, 0, tzinfo=Timezone('UTC')), ['tights.no', 'comfyballs.no', 'comfyballs.se', 'comfyballs.fi', 'comfyballs.com', 'awarenutrition.se', 'awarenutrition.fi', 'soma.no'], PartitionKeyRange(start='2024-02-01|tights.no', end='2024-02-13|soma.no'))
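As a sanity check on how many partitions that range covers, the keys can be enumerated in plain Python. This is only a sketch: it does not use Dagster, and it assumes the range end is inclusive and that each key has the `date|domain` form shown in the `PartitionKeyRange` above. The resulting count can be compared with the number of repeated metadata events.

```python
from datetime import date, timedelta
from itertools import product

# Domain dimension, copied from the tuple printed above.
domains = [
    "tights.no", "comfyballs.no", "comfyballs.se", "comfyballs.fi",
    "comfyballs.com", "awarenutrition.se", "awarenutrition.fi", "soma.no",
]

# Date dimension: first and last day taken from the PartitionKeyRange
# (assumption: the end key is inclusive).
start, end = date(2024, 2, 1), date(2024, 2, 13)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# One key per (date, domain) pair, in the "date|domain" form seen above.
keys = [f"{d.isoformat()}|{domain}" for d, domain in product(days, domains)]

print(len(days), len(domains), len(keys))
```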
Event output:
What's the use case?
Let's say you have a partitioned asset with `BackfillPolicy.single_run()`, which means your asset could be materializing multiple partitions in the same run. You return a dataframe that is then split into its individual partitions, but then you realize that there's no way to call `context.add_output_metadata` for a specific partition inside your result. So if you need to add metadata per partition on your result, you can't. I guess you could rely on a separate asset observation, but this just adds unnecessary overhead.
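To make the use case concrete, here is a minimal sketch of the per-partition metadata such an asset would want to attach. It is plain Python with no Dagster dependency. The `rows` data, the `partition_key` and `value` field names, and the `metadata_by_partition` helper are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical rows produced by a single run that covers several partitions.
rows = [
    {"partition_key": "2024-02-01", "value": 10},
    {"partition_key": "2024-02-01", "value": 5},
    {"partition_key": "2024-02-02", "value": 7},
]


def metadata_by_partition(rows):
    """Group rows by partition key and compute metadata for each partition."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["partition_key"]].append(row["value"])
    return {
        key: {"row_count": len(values), "value_sum": sum(values)}
        for key, values in grouped.items()
    }


per_partition = metadata_by_partition(rows)
print(per_partition)
```

The result is one metadata dictionary per partition key. The request in this issue is for a way to attach each of those dictionaries to its own partition, rather than one set of metadata for the whole run.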
Ideas of implementation
No response
Additional information
No response
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.
What we've heard