Describe the bug

As a user, I would expect the caching behaviour to be the same when executing a workflow in a cluster and when executing it locally as a Python script. In practice, there are situations where the behaviour differs; the two cases below illustrate this.

Expected behavior

[ ] Caching of tasks without return values:

Locally, such a task can be cached, while in a cluster execution it cannot be; Flyteconsole says "Caching was disabled for this execution".

As a user, I have a strong preference for being able to cache tasks without a return value, because tasks can have side effects (e.g. storing a resulting metric in a metadata store) that do not need a return value but are still supposed to be cached. We have multiple tasks in our code base that carry a dummy return value only to allow the task to be cached.
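For illustration, a minimal sketch of such a task, together with the dummy-return workaround we currently use, might look like this (hypothetical, simplified example; the function and parameter names are made up):

from flytekit import task


# A cached task whose only purpose is a side effect (e.g. writing a metric
# to a metadata store), so there is nothing meaningful to return.
@task(cache=True, cache_version="1.0")
def record_metric(value: float) -> None:
    print(f"recording metric {value}")  # stand-in for the metadata-store call


# Workaround: a dummy return value added solely so that remote executions
# of the task can be cached.
@task(cache=True, cache_version="1.0")
def record_metric_with_dummy(value: float) -> int:
    print(f"recording metric {value}")
    return 0  # never used downstream


The first variant is what we would like to be able to cache on the cluster as well; the second variant is the workaround we currently rely on.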
[ ] Cache misses upon schema changes:
from dataclasses import dataclass
from dataclasses_json import dataclass_json
from flytekit import task, workflow


@dataclass_json
@dataclass
class Foo:
    a: int
    # b: int  # uncommenting this field is the schema change described below


@task(cache=True, cache_version="1.0")
def t1() -> Foo:
    print("Foo")
    return Foo(a=42)  # becomes Foo(a=42, b=42) after the schema change


@workflow
def wf():
    t1()


if __name__ == "__main__":
    wf()
After executing this workflow once, adding b: int to Foo as an example of a schema change, and executing again, the remote execution shows the expected cache miss, but the local execution produces an unexpected cache hit. The local behaviour needs to be adapted.
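For completeness, the second run of the example uses the dataclass with the commented-out field enabled, i.e. roughly:

from dataclasses import dataclass
from dataclasses_json import dataclass_json
from flytekit import task


@dataclass_json
@dataclass
class Foo:
    a: int
    b: int  # newly added field, i.e. the schema change


@task(cache=True, cache_version="1.0")
def t1() -> Foo:
    print("Foo")
    return Foo(a=42, b=42)
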
Additional context to reproduce
No response
Screenshots
No response
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?