Open N-Demir opened 8 months ago
Hi @N-Demir, thanks for submitting this. I'm sorry about the friction here! I also appreciate the effort you put into writing this issue.
Do you have an MRE you can share, showing what you were passing to task_input_hash, that reproduces the behavior you were seeing? Thanks!
First check
Prefect Version
2.x
Describe the current behavior
I initially thought this was a Prefect bug, but after a lot of investigation I determined that cloudpickle, used by task_input_hash, is non-deterministic from run to run in many different scenarios. See the following issues opened on their end:

For me, the issue came from passing a class to a Prefect task with caching enabled using task_input_hash. Strangely, the hash is only deterministic if the class is imported from another module rather than defined in the same file as the script that kicks off the flow (that is, not defined in __main__).

This single hard-to-debug issue has led to an incredible amount of frustration with Prefect and confusion among myself and my team members, and from surveying the landscape of caching in Python, there doesn't seem to be an easier out-of-the-box solution.
Describe the proposed behavior
I'm not expert enough to know what the best solution is, or whether any of the other libraries I mentioned are definitively better, but from a user's perspective I want to emphasize one simpler thing that Streamlit does well and that has saved me a ton of pain in the past: the UnhashableParamError.

In Streamlit's caching, if you try to cache a custom object, it will just error and you will receive an UnhashableParamError. Had Prefect done this for me, it would have alleviated so much confusion about why caching wasn't working (which led me down rabbit holes and left me more confused about the distinction between Prefect caching and results).

Perhaps an error is aggressive, but even a warning log would make a huge difference when debugging why Prefect's caching isn't working from run to run. And maybe task_input_hash should be more restrictive, forcing users to pass hashable args instead of blindly using a non-deterministic cloudpickle as the fallback when JSON serialization fails.
Example Use
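To illustrate the stricter behavior proposed above, here is a minimal sketch of a cache key function that refuses to fall back to pickling and instead raises a descriptive error. It uses only the standard library. `UnhashableInputError` and `strict_input_hash` are hypothetical names that exist in neither Prefect nor Streamlit, and the function only mirrors the `(context, parameters)` shape of a `cache_key_fn`.

```python
import hashlib
import json
from typing import Any, Dict, Optional


class UnhashableInputError(TypeError):
    """Hypothetical error, loosely modeled on Streamlit's UnhashableParamError."""


def strict_input_hash(context: Any, parameters: Dict[str, Any]) -> Optional[str]:
    """Hypothetical strict variant: hash JSON-serializable inputs, else raise.

    `context` is accepted only to match the cache_key_fn call shape; it is unused.
    """
    for name, value in parameters.items():
        try:
            # Check each parameter on its own so the error can name the culprit.
            json.dumps(value, sort_keys=True)
        except (TypeError, ValueError) as exc:
            raise UnhashableInputError(
                f"Cannot build a deterministic cache key for parameter {name!r} "
                f"of type {type(value).__name__}: {exc}"
            ) from exc
    payload = json.dumps(parameters, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

A softer variant could log a warning at the same point and return None instead of raising, which would skip caching for that run rather than fail it.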
No response
Additional context
No response