fonsp / Pluto.jl

🎈 Simple reactive notebooks for Julia
https://plutojl.org/
MIT License
4.91k stars 284 forks source link

Add heap size hint #2774

Closed kirill-kondrashov closed 5 months ago

kirill-kondrashov commented 6 months ago

This MR intends to integrate the heap size hint:

that enables users to set a limit on memory usage, after which the garbage collector (GC) will work more aggressively to clean up unused memory.

For the Julia processes launched in Pluto.

github-actions[bot] commented 6 months ago

Try this Pull Request!

Open Julia and type:

  julia> import Pkg
  julia> Pkg.activate(temp=true)
  julia> Pkg.add(url="https://github.com/kir19890817/Pluto.jl", rev="add-julia-heap-size-hint")
  julia> using Pluto
pankgeorg commented 6 months ago

One thing we've found to work better than heap-size-hint, is actually invoking the GC.gc() function on a background task, when memory pressure on a system is important and is creating problems.

kirill-kondrashov commented 6 months ago

One thing we've found to work better than heap-size-hint, is actually invoking the GC.gc() function on a background task, when memory pressure on a system is important and is creating problems.

I see; it sounds like a good thing to check. Let me have a look.

fonsp commented 6 months ago

Can you make this PR again, but:

I would recommend searching for math_mode and doing the exact same for heap_size_hint

fonsp commented 6 months ago

@pankgeorg That's a nice suggestion but let's track that in a new issue and PR! This is just about a command line flag that was added in Julia 1.9

pankgeorg commented 6 months ago

@pankgeorg That's a nice suggestion but let's track that in a new issue and PR! This is just about a command line flag that was added in Julia 1.9

True, this is completely unrelated to the PR, it's mostly an offtopic response that addresses the intent of the PR, not the content of it. I'll open a new issue, and probably also a PR for memory management.

kirill-kondrashov commented 6 months ago

Can you make this PR again, but:

* Avoid formatting unrelated lines of code

* Only add the option to `CompilerOptions`, not also to `ServerOptions`

* Make sure that the tests pass

I would recommend searching for math_mode and doing the exact same for heap_size_hint

Sure, thanks for comment! I've initially tried to put it into Draft: status by typing it in the title, but it seems to not affect the state of the PR on github... So thanks for changing the status.

kirill-kondrashov commented 6 months ago

@pankgeorg That's a nice suggestion but let's track that in a new issue and PR! This is just about a command line flag that was added in Julia 1.9

True, this is completely unrelated to the PR, it's mostly an offtopic response that addresses the intent of the PR, not the content of it. I'll open a new issue, and probably also a PR for memory management.

Hey, @pankgeorg thanks for the suggestion again!

Could you please elaborate a bit on what you mean by background task here? As far as I can track (with my poor understanding of the codebase as a disclaimer), for instance, in case of running the notebooks for exporting, in PlutoSliderServer.jl in the function run_directory which is called by export_directory, the new 'process' is created:

new = process(s; server_session, settings, output_dir, start_dir, progress)

process is implemented in Actions.jl. It triggers Pluto.jl/src/webserver/SessionActions.jl open. Because it seems to be a separate process, there's no way to trigger GC.gc() somewhere 'in the background', outside of Pluto.jl.

fonsp commented 5 months ago

Thanks @kirill-kondrashov ! Let us know when the PR is ready

pankgeorg commented 5 months ago

@pankgeorg That's a nice suggestion but let's track that in a new issue and PR! This is just about a command line flag that was added in Julia 1.9

True, this is completely unrelated to the PR, it's mostly an offtopic response that addresses the intent of the PR, not the content of it. I'll open a new issue, and probably also a PR for memory management.

Hey, @pankgeorg thanks for the suggestion again!

Could you please elaborate a bit on what you mean by background task here? As far as I can track (with my poor understanding of the codebase as a disclaimer), for instance, in case of running the notebooks for exporting, in PlutoSliderServer.jl in the function run_directory which is called by export_directory, the new 'process' is created:

new = process(s; server_session, settings, output_dir, start_dir, progress)

process is implemented in Actions.jl. It triggers Pluto.jl/src/webserver/SessionActions.jl open. Because it seems to be a separate process, there's no way to trigger GC.gc() somewhere 'in the background', outside of Pluto.jl.

Indeed. Pluto does inject some code to the host process though, so it could also add a Timer(() -> GC.gc(), 3600.; interval=3600.), or something a little bit more complex. Pluto itself runs a webserver too, so Pluto.run() could also have a function that periodically does a full sweep.

The tricky part (which julia could/should do, but currently doesn't) is to track

  1. whether it's a good time to run the GC now (good time = not too stressed)
  2. whether there has been enough allocations to warrant a GC sweep.

I think the GC doesn't keep statistics of allocations around, so knowing [2] isn't very easy, at least not without consulting the kernel.

On the other hand, none of this should be Pluto's job in the first place right? This effort should most likely be another package.

kirill-kondrashov commented 5 months ago

Thanks @kirill-kondrashov ! Let us know when the PR is ready

Seems to pass the tests now

kirill-kondrashov commented 5 months ago

@pankgeorg That's a nice suggestion but let's track that in a new issue and PR! This is just about a command line flag that was added in Julia 1.9

True, this is completely unrelated to the PR, it's mostly an offtopic response that addresses the intent of the PR, not the content of it. I'll open a new issue, and probably also a PR for memory management.

Hey, @pankgeorg thanks for the suggestion again! Could you please elaborate a bit on what you mean by background task here? As far as I can track (with my poor understanding of the codebase as a disclaimer), for instance, in case of running the notebooks for exporting, in PlutoSliderServer.jl in the function run_directory which is called by export_directory, the new 'process' is created:

new = process(s; server_session, settings, output_dir, start_dir, progress)

process is implemented in Actions.jl. It triggers Pluto.jl/src/webserver/SessionActions.jl open. Because it seems to be a separate process, there's no way to trigger GC.gc() somewhere 'in the background', outside of Pluto.jl.

Indeed. Pluto does inject some code to the host process though, so it could also add a Timer(() -> GC.gc(), 3600.; interval=3600.), or something a little bit more complex. Pluto itself runs a webserver too, so Pluto.run() could also have a function that periodically does a full sweep.

The tricky part (which julia could/should do, but currently doesn't) is to track

1. whether it's a good time to run the GC _now_ (good time = not too stressed)

2. whether there has been enough allocations to warrant a GC sweep.

I think the GC doesn't keep statistics of allocations around, so knowing [2] isn't very easy, at least not without consulting the kernel.

On the other hand, none of this should be Pluto's job in the first place right? This effort should most likely be another package.

Makes sense! I think Pluto, indeed, shouldn't be the first place; PlutoSliderServer seemed to me as a suitable place to insert that procedure.

fonsp commented 5 months ago