futureverse / progressr

三 R package: An Inclusive, Unifying API for Progress Updates
https://progressr.futureverse.org
281 stars 12 forks source link

Latency with progress reporting #118

Closed dcnorris closed 3 years ago

dcnorris commented 3 years ago

I'm seeing large latencies which disappear upon setting options(progressr.enabled = FALSE): enabled disabled

Of note, parallelization happens in this code section, where the progress handler is NULL. Thus the latency seems attributable to the progress reporting signaling itself.

(The progress reporting signaling is happening in this code section.)

HenrikBengtsson commented 3 years ago

Thanks for the illustrative graphs. FWIW, it's on the future roadmap to add this type of "built-in speed and memory profiling tools" at the core of future, which certainly would help troubleshoot the recent performance issues related to memory and progressr.

Of note, parallelization happens in this code section, where the progress handler is NULL. Thus the latency seems attributable to the progress reporting itself.

Actually, not (unless I misunderstood your claim). Using with_progress(..., handlers = NULL) will basically be the same as not using with_progress(). It is with_progress() that calls the progress handlers, which in turn do the "progress reporting". Thus, there is no "progress reporting" taking place here.

Instead, your example suggests/gives evidence to the hypothesis that it's related to progress signalling. If you look at the figure in https://progressr.futureverse.org/#under-the-hood, the problem with the recently reported bug/large-overhead-with-large-objects issues, is most likely that there is a very costly overhead kicking in the "signaled conditions" part, i.e. when the progression objects are sent from the parallel worker(s) to the main R session. This is most likely because the progression condition objects thatare sent, in correctly, carry a huge memory load.

After installing the develop version (https://github.com/HenrikBengtsson/progressr/issues/116#issuecomment-878049832), what does your graph look like then?

dcnorris commented 3 years ago

Thanks for clarity on the terminology; I did mean 'signaling' and have edited OP to that effect. With v0.8.0-9001, the graphs remain the same: enabled-dev diabled-dev

In my particular application, where the computations are on the scale of ≈10 seconds, the lightweightness of the fork facility is essential. Is the large size of the progression condition objects likely inherent to the design of future? I had been tearing my hair out trying something along these lines using a fifo() but would so much rather use high-level abstractions such as you are providing in the futureverse.

HenrikBengtsson commented 3 years ago

Thanks for this. I'm quite sure this is due to an oversight by me in the progressr package, and I think (=likely) I should be able to fix this in the package itself. I'm quite sure I solved some parts in the develop branch, but there's apparently more things to fix. I will have to get some deep-focus time in order to fully understand and find a fix for the problem, so it'll take some time. Until then, my best advice for now is to not use progressr until resolved by me.

dcnorris commented 3 years ago

Many thanks, Henrik. I'll go back to tearing my hair out with fifo I guess, but very much look forward to being able to revisit the futureverse at some point!

HenrikBengtsson commented 3 years ago

Hi again, so I just saw that you're using your own custom handlers (https://github.com/dcnorris/precautionary/blob/f1d653485a89b9ff068470cc272a52df6c15b259/inst/shiny-apps/EscRisk/app.R#L198-L211). I'd like to rule out that this could be the culprit, what happens if you use, say, the built-in progressr::handler_txtprogressbar handler? It'll of course not render in the GUI, but in the terminal, but I'd be curious if it would make a difference.

HenrikBengtsson commented 3 years ago

Hi again, I think I've figured out what's going on and I've fixed this in progressr (>= 0.8.0-9005). Please try with:

remotes::install_github("HenrikBengtsson/progressr", ref = "develop")

and let me know if this solves the problem.

dcnorris commented 3 years ago

Thanks for the heads-up, Henrik. I'll retry this sometime soon and get back to you. I'll be very glad to have this working again, without the custom mclapply I added to my package as a workaround!