Open andipiet opened 2 years ago
I would expect this to work if you set the MEMOIZED_RUN_TAG on the schedule, or the scheduled job. E.g.
from dagster import MEMOIZED_RUN_TAG
my_schedule = ScheduleDefinition(tags={MEMOIZED_RUN_TAG: True}, ...)
Is that what you tried already?
CC @dpeng817.
I think this is most likely a bug. We should be passing state through when creating the execution plan, which is likely what is causing this error.
Yes the executor does try to use memoization but it crashes because the required state is not provided
Use Case
We have developed a Dagster job that supportes memoization (we have implemented memoizable IO managers and provided a version strategy).
Everything works fine as long as we start the job from the launchpad or we use backfills. However, when we define a schedule for the job and we enable it, we see the job fails with the following exception
Now, I tried to troubleshoot the issue by analyzing the Dagster source code and it looks like memoization cannot be supported for scheduled jobs. In particular, it looks like in the call here no known state and no Dagster instance information is provided.
Is this analysis correct or am I missing something?
Memoization support would be crucial for our job, that needs to periodically pull data from an external system by using API. In particular, the data is available only for a limited amount of time on the external system, so if we have to re-run the pipeline we would need to use the data that we have already stored (if we try to download them again they may no longer be there)
Ideas of Implementation
Additional Info
Message from the maintainers:
Excited about this feature? Give it a :thumbsup:. We factor engagement into prioritization.