Open vladhc opened 8 months ago
Hi @vladhc - Dagster does not currently support custom user-defined partition mappings (because we can't load arbitrary user code in host processes like the backfill daemon).
We should raise an error earlier instead of failing in this way. @clairelin135 - thoughts on the best way to do this?
@vladhc - would StaticPartitionMapping work for you if it didn't raise an error when one of the PartitionsDefinitions is dynamic?
Hi @sryza. Thank you for the fast reply.
would StaticPartitionMapping work for you if it didn't raise an error when one of the PartitionsDefinitions is dynamic?
This won't be sufficient for us. We can have 2-3 partitions added/removed during a day. Also so far it's not possible to say in advance what the IDs of partitions would be.
As a workaround I can restart Dagster each day and use only StaticPartitions and StaticPartitionMapping. This would be sufficient for now.
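For the daily-restart workaround, the static mapping would have to be rebuilt at startup from whatever source lists the partitions. Dagster's StaticPartitionMapping takes downstream keys keyed by upstream key, so a country-to-users dictionary needs inverting first. Below is a minimal sketch of that step in plain Python; the helper name `invert_one_to_many` and the sample values are hypothetical and do not come from this issue.

```python
from collections import defaultdict
from typing import Dict, Mapping, Set


def invert_one_to_many(downstream_to_upstream: Mapping[str, Set[str]]) -> Dict[str, Set[str]]:
    """Turn {downstream_key: {upstream_keys}} into {upstream_key: {downstream_keys}}."""
    inverted: Dict[str, Set[str]] = defaultdict(set)
    for downstream_key, upstream_keys in downstream_to_upstream.items():
        for upstream_key in upstream_keys:
            inverted[upstream_key].add(downstream_key)
    return dict(inverted)


# Hypothetical sample data; in practice this would be loaded once at startup.
country_to_users = {"Germany": {"user_a", "user_b"}, "France": {"user_c"}}
user_to_countries = invert_one_to_many(country_to_users)
# user_to_countries has the upstream -> downstream shape that
# StaticPartitionMapping expects; its keys and values also give the partition
# keys for the two static partitions definitions.
```

The obvious cost of this approach is that partitions added during the day are only picked up after the next restart.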
Dagster does not currently support custom user-defined partition mappings (because we can't load arbitrary user code in host processes like the backfill daemon).
I see. This makes sense. Although for my use case, when all the partitions are simple one-to-one or one-to-many mappings, one could imagine passing only partition data structures to the daemon. In my fantasies I would even treat partitions as assets. Something like:
@partition_asset(io_manager='my_db_io_manager')
def images_partition(context, my_resource: MyAnotherResource) -> dagster.PartitionMapping:
    # ... figure out partition mapping
    return image_names  # is a Mapping[str, Set[str]]. Serialized and sent to dagster's daemon.

@asset(partition_def=images_partition)
def model_predictions(...):
    ...
This would give all the benefits of assets: io managers, auto-materialization when upstream assets change, seeing their dependencies on the graph.
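The idea of passing only partition data structures to the daemon can be sketched without any Dagster API. The example below assumes the mapping is a Mapping[str, Set[str]] from downstream key to upstream keys, as in the snippet above. The function names and sample keys are hypothetical, and this is not how Dagster serializes anything today.

```python
import json
from typing import Dict, List, Mapping, Set


def serialize_mapping(mapping: Mapping[str, Set[str]]) -> str:
    # Sets are not JSON-serializable, so store sorted lists for a stable payload.
    return json.dumps({k: sorted(v) for k, v in mapping.items()}, sort_keys=True)


def upstream_keys_for(payload: str, downstream_key: str) -> Set[str]:
    # Runs on the receiving side: only JSON parsing, no user code is loaded.
    data: Dict[str, List[str]] = json.loads(payload)
    return set(data.get(downstream_key, []))


# Hypothetical sample data standing in for the computed mapping.
image_names = {"model_v1": {"img_001", "img_002"}, "model_v2": {"img_003"}}
payload = serialize_mapping(image_names)
```

Because the payload is plain data, a host process could resolve dependencies by lookup alone, which is the property that arbitrary user-defined mappings lack.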
This won't be sufficient for us. We can have 2-3 partitions added/removed during a day. Also so far it's not possible to say in advance what the IDs of partitions would be.
In your code snippet, the COUNTRY_TO_USERS mapping is hardcoded. Is the idea that this would actually be populated by reading from a database or something? Btw here's an issue that tracks relevant functionality: https://github.com/dagster-io/dagster/issues/13139.
In my fantasies I would even treat partitions as assets
Interesting. This seems somewhat relevant to this issue: https://github.com/dagster-io/dagster/issues/9559.
The behaviour is super frustrating, since running one partition at a time works correctly. It only fails when multiple or all partitions are requested to be run.
Would an additional partition mapping be accepted if a PR were provided? I think something like a regexp partition mapping could go a long way to reduce this problem without having to support running arbitrary code in the Dagster process.
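To make the regexp idea concrete, here is a minimal sketch in plain Python. It assumes upstream keys embed the downstream key in a position a pattern can capture. The key format "country/user", the function name, and the sample keys are all hypothetical, and no such mapping exists in Dagster as far as this thread shows.

```python
import re
from typing import Iterable, List


def upstream_keys_for(pattern: str, upstream_keys: Iterable[str], downstream_key: str) -> List[str]:
    """Return upstream keys whose first capture group equals the downstream key."""
    compiled = re.compile(pattern)
    matches = []
    for key in upstream_keys:
        m = compiled.match(key)
        if m and m.group(1) == downstream_key:
            matches.append(key)
    return matches


# Hypothetical keys of the form "<country>/<user>".
user_keys = ["Germany/alice", "Germany/bob", "France/chloe"]
german_users = upstream_keys_for(r"([^/]+)/", user_keys, "Germany")
```

The pattern is just a string, so it could be stored and evaluated by a host process without importing user code, and it keeps working as partitions are added or removed.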
Dagster version
dagster, version 1.5.3
What's the issue?
Given:
Launching a backfill on the child asset will throw an error:
dagster._core.errors.DagsterInvariantViolationError: Asset partition AssetKeyPartitionKey(asset_key=AssetKey(['country_value']), partition_key='Germany') depends on invalid partition keys {AssetKeyPartitionKey(asset_key=AssetKey(['user_value']), partition_key='Germany')}
Launching a single run for each partition (without backfill) will successfully materialize any partition of the child asset.
What did you expect to happen?
Backfill should successfully materialize child asset.
How to reproduce?
Materialize the user_value asset. Then launch a backfill materialization for the country_value asset. The status of the backfill will become "Failed".
The error will be:
Deployment type
Local
Deployment details
No response
Additional information
With StaticPartitionDefinition and StaticPartitionMapping everything works as expected.
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.