This sounds like disabling checkpointing resolved another issue and that persisting results isn't actually the core issue here? We can take the feature request to disable persistence, but it sounds like resolving this Dask issue would be more meaningful for this user.
Thanks @madkinsz - I agree this issue may be better resolved in some other way, but this is only one example request; @kvnkho saw more users with issues that occur due to checkpointing results (e.g. not enough memory to pass large dataframes between tasks that don't really need Results checkpointing)
it would be great if there was a way to disable it - it could potentially solve even issues such as this one: https://github.com/PrefectHQ/prefect/issues/5866
Yes the current checkpointing (if that is the one being inserted into the database) is causing a lot of HTTPX timeouts when people move from local Orion to Cloud 2.0, which makes Cloud 2.0 seem unstable but really it's a timeout due to a large payload I think.
> Yes the current checkpointing (if that is the one being inserted into the database) is causing a lot of HTTPX timeouts when …
This is a separate bug and will be fixed in the next release.
> (e.g. not enough memory to pass large dataframes between tasks that don't really need Results checkpointing)
Stashing the value in a file and using a reference would only help with memory constraints here? Data needs to pass between tasks regardless of any checkpointing settings.
> it would be great if there was a way to disable it - it could potentially solve even issues such as this one: https://github.com/PrefectHQ/prefect/issues/5866
Similarly, this is an issue with unpicklable data. Disabling checkpointing can help in some cases, but data still needs to be pickled for transport across tasks for several task runner types.
@madkinsz thanks for all the explanation, this helps a lot
> data still needs to be pickled for transport across tasks for several task runner types
for Dask and Ray, correct? Concurrent and Sequential should work?
> for Dask and Ray, correct? Concurrent and Sequential should work?
In theory, we shouldn't need to pickle things for those, yeah. Although things like database connections still might not share well across threads.
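To make that concrete, here is a quick standalone check (plain Python, nothing Prefect-specific) showing that an object like an open database connection can't be pickled at all, regardless of any checkpointing settings:

```python
import pickle
import sqlite3

# An open database connection is a classic example of an unpicklable object.
conn = sqlite3.connect(":memory:")

try:
    pickle.dumps(conn)
except (TypeError, pickle.PicklingError) as exc:
    # e.g. "cannot pickle 'sqlite3.Connection' object"
    print(f"Not picklable: {exc}")
```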
@anna-geller I believe that @kvnkho @madkinsz discussed this on Slack, but I would like to add my +1 to this issue. It represents a significant challenge in our ability to adopt prefect, in particular all the functionality that comes with orion.
Hi all, I got redirected from slack and asked to provide a few more details about my use-case :v:
My use-case involves handling of multi-dimensional image data from 2D up to 5D (3D volumes over time with multiple channels). The smaller images usually start with a size around 350MB but larger volumes can easily reach 10-20GB with whole datasets reaching multiple TB.
The current implementation of caching in Prefect 2.0 is rather deadly for such a use-case, because I would have to be extra careful to never pass an image between tasks. Otherwise caching will save (potentially very large) image files to disk, blowing up my file-storage.
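For illustration, a minimal sketch (task and flow names are made up) of this kind of pipeline in the Prefect 2.0 API; with result persistence on, any value returned by a task and passed downstream is a candidate for being written to result storage:

```python
import numpy as np
from prefect import flow, task

@task
def load_volume():
    # stand-in for loading a multi-GB microscopy volume
    return np.zeros((2048, 2048, 200), dtype=np.uint16)

@task
def threshold(volume):
    # a simple processing step whose output is again a large array
    return volume > 0

@flow
def imaging_pipeline():
    volume = load_volume()  # large array returned by one task...
    threshold(volume)       # ...and passed straight into the next one

if __name__ == "__main__":
    imaging_pipeline()
```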
What I really like about Prefect 1.0 is the fact that checkpointing is a conscious decision, i.e. I don't duplicate TBs of data by accident. And by passing a custom `LocalResult` type I am able to define how and where my cached results should be stored. Often times we want to look at the intermediate (cached) results to verify processing steps.
Currently, Prefect 2.0 is not a feasible solution for large image processing pipelines where I only want to cache very few task results in an accessible way. I would also be scared of accidentally filling up our storage system just by running my pipelines.
My dream-world-wish-list for caching in Prefect 2.0 would include something like the `LocalResult` concept.

Thank you, and happy to jump on a Zoom to dive deeper into (explain better) my use-case.
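As a point of reference, a minimal sketch of the Prefect 1.0 pattern described above (paths and task names are hypothetical): checkpointing is opt-in per task, and a `LocalResult` controls where persisted results land.

```python
from prefect import Flow, task
from prefect.engine.results import LocalResult

# This task's (potentially huge) output is never written to disk.
@task(checkpoint=False)
def load_volume():
    return b"..."  # placeholder for a multi-GB image volume

# Only this task's output is persisted, to a location we control.
@task(
    checkpoint=True,
    result=LocalResult(dir="/data/intermediate", location="{task_name}.prefect"),
)
def segment(volume):
    return volume  # placeholder for the real segmentation step

with Flow("imaging-pipeline") as flow:
    masks = segment(load_volume())

# Note: for local runs, Prefect 1.0 also requires checkpointing to be
# enabled globally, e.g. PREFECT__FLOWS__CHECKPOINTING=true.
```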
Opened from the Prefect Public Slack Community
tim.enders: What is the equivalent to this in 2.0?
`@task(checkpoint=False)`
If there is one currently

kevin701: None yet. It's tied to results and configurability of results is not out yet
tim.enders: OK, cool. Gonna have to put 2.0 down then it seems. When I parallelize the operations it seems to want to spam getting a token from each Dask run. I know that checkpoint was the solution in 1.0
anna: <@ULVA73B9P> open "Allow globally disabling task run results as it's a blocker for 2.0 adoption"
Original thread can be found here.
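For reference, a rough sketch (flow and task names are made up) of the Prefect 1.0 pattern being asked about in that thread: a mapped task running on a Dask-based executor with `checkpoint=False`, so its return values are never persisted.

```python
from prefect import Flow, task
from prefect.executors import LocalDaskExecutor

@task(checkpoint=False)  # never persist this task's return value
def fetch_page(page_number):
    return {"page": page_number}  # placeholder for the real API call

with Flow("parallel-fetch") as flow:
    pages = fetch_page.map(list(range(10)))

# Run the mapped task in parallel on a local Dask cluster.
flow.executor = LocalDaskExecutor()

if __name__ == "__main__":
    flow.run()
```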