Closed — Firionus closed this issue 1 year ago
Unfortunately, the high memory usage is flaky for me :bowl_with_spoon: :disappointed: I reproduced this multiple times on the day I opened the issue, but now I can't. Same versions of everything.
I toyed around with it some more, and I can now reproduce the issue when I add `show_value=true`:

```julia
begin
    using PlutoUI
    @bind x Slider(range(0, 1, length=10^8), show_value=true)
end
```
Executing the cell quickly leads to an out of memory condition and the notebook process is killed.
The Pluto terminal shows the following error afterwards:
```
Worker 2 terminated.
Unhandled Task ERROR: EOFError: read end of file
Stacktrace:
 [1] (::Base.var"#wait_locked#645")(s::Sockets.TCPSocket, buf::IOBuffer, nb::Int64)
   @ Base ./stream.jl:892
 [2] unsafe_read(s::Sockets.TCPSocket, p::Ptr{UInt8}, nb::UInt64)
   @ Base ./stream.jl:900
 [3] unsafe_read
   @ ./io.jl:724 [inlined]
 [4] unsafe_read(s::Sockets.TCPSocket, p::Base.RefValue{NTuple{4, Int64}}, n::Int64)
   @ Base ./io.jl:723
 [5] read!
   @ ./io.jl:725 [inlined]
 [6] deserialize_hdr_raw
   @ ~/.julia/juliaup/julia-1.7.2+0~x64/share/julia/stdlib/v1.7/Distributed/src/messages.jl:167 [inlined]
 [7] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
   @ Distributed ~/.julia/juliaup/julia-1.7.2+0~x64/share/julia/stdlib/v1.7/Distributed/src/process_messages.jl:165
 [8] process_tcp_streams(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
   @ Distributed ~/.julia/juliaup/julia-1.7.2+0~x64/share/julia/stdlib/v1.7/Distributed/src/process_messages.jl:126
 [9] (::Distributed.var"#99#100"{Sockets.TCPSocket, Sockets.TCPSocket, Bool})()
   @ Distributed ./task.jl:423

Distributed.ProcessExitedException(2)
Stacktrace:
 [1] worker_from_id(pg::Distributed.ProcessGroup, i::Int64)
   @ Distributed ~/.julia/juliaup/julia-1.7.2+0~x64/share/julia/stdlib/v1.7/Distributed/src/cluster.jl:1089
 [2] worker_from_id
   @ ~/.julia/juliaup/julia-1.7.2+0~x64/share/julia/stdlib/v1.7/Distributed/src/cluster.jl:1086 [inlined]
 [3] #remotecall_fetch#158
   @ ~/.julia/juliaup/julia-1.7.2+0~x64/share/julia/stdlib/v1.7/Distributed/src/remotecall.jl:496 [inlined]
 [4] remotecall_fetch
   @ ~/.julia/juliaup/julia-1.7.2+0~x64/share/julia/stdlib/v1.7/Distributed/src/remotecall.jl:496 [inlined]
 [5] remotecall_eval
   @ ~/.julia/juliaup/julia-1.7.2+0~x64/share/julia/stdlib/v1.7/Distributed/src/macros.jl:242 [inlined]
 [6] collect_soft_definitions(session_notebook::Tuple{Pluto.ServerSession, Pluto.Notebook}, modules::Set{Expr})
   @ Pluto.WorkspaceManager ~/.julia/packages/Pluto/AgKPu/src/evaluation/WorkspaceManager.jl:476
 [7] run_reactive_core!(session::Pluto.ServerSession, notebook::Pluto.Notebook, old_topology::Pluto.NotebookTopology, new_topology::Pluto.NotebookTopology, roots::Vector{Pluto.Cell}; deletion_hook::Function, user_requested_run::Bool, already_run::Vector{Pluto.Cell}, bond_value_pairs::Base.Iterators.Zip{Tuple{Vector{Symbol}, Vector{Any}}})
   @ Pluto ~/.julia/packages/Pluto/AgKPu/src/evaluation/Run.jl:216
 [8] run_reactive_core!
   @ ~/.julia/packages/Pluto/AgKPu/src/evaluation/Run.jl:70 [inlined]
 [9] (::Pluto.var"#255#259"{Bool, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, Pluto.ServerSession, Pluto.Notebook})()
   @ Pluto ~/.julia/packages/Pluto/AgKPu/src/evaluation/Run.jl:591
 [10] withtoken(f::Pluto.var"#255#259"{Bool, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, Pluto.ServerSession, Pluto.Notebook}, token::Pluto.Token)
   @ Pluto ~/.julia/packages/Pluto/AgKPu/src/evaluation/Tokens.jl:19
 [11] #254
   @ ~/.julia/packages/Pluto/AgKPu/src/evaluation/Run.jl:587 [inlined]
 [12] macro expansion
   @ ~/.julia/packages/Pluto/AgKPu/src/evaluation/Tokens.jl:58 [inlined]
 [13] (::Pluto.var"#225#226"{Pluto.var"#254#258"{Bool, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, Pluto.ServerSession, Pluto.Notebook}})()
   @ Pluto ./task.jl:423
```
But I guess that's just a result of the process shutting down, not the cause of the high memory usage.
Thanks! This is indeed caused by `show_value=true`, because we call `string.(values)` and send those rendered values to JavaScript to display: https://github.com/JuliaPluto/PlutoUI.jl/blob/47167739ec772924469d3528977e8d6a688b3319/src/Builtins.jl#L192

This is how we support something like `Slider([sin, cos, tan]; show_value=true)`.
I think the best solution here is to resample the slider values on the Julia side. E.g. if there are 1,000,000 possible values, then it is fine to just slide over every 1000th value. That would also fix #215.
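A minimal sketch of that idea (hypothetical code, not PlutoUI's actual implementation): keep only every k-th value so that at most roughly 1000 values remain.

```julia
# Hypothetical sketch: thin out slider values with a fixed stride.
values = range(0, 1, length=10^6)
stride = cld(length(values), 1000)   # ceiling division → a stride of 1000 here
coarse = collect(values[1:stride:end])
length(coarse)                       # 1000 values instead of 1_000_000
```

Note this naive strided version can drop the last element of `values`; the `downsample` function discussed below handles the endpoints explicitly.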
It would be helpful if someone wrote a function that does the following 💕
```julia
function downsample(x::AbstractVector, max_size::Integer)
    return something...
end

x1 = [1,2,3]
x2 = rand(500)

@test downsample(x1, 3) == [1,2,3]
@test downsample(x1, 30) == [1,2,3]
@test downsample(x1, 2) == [1,3]
@test downsample(x2, 500) == x2

y2 = downsample(x2, 400)
@test 250 <= length(y2) <= 400
@test y2[begin] == x2[begin]
@test y2[end] == x2[end]

x3 = rand(50_000_000)
# this should take less than 0.1ms
downsample(x3, 100)
```
@fonsp I created a Pluto notebook with my code for a `downsample(x::AbstractVector, max_size::Integer)` function, and all the tests pass. What are my next steps for contributing this code? I'm unclear where in PlutoUI's source this function would live, but I assume I will need to fork PlutoUI in order to add this function and create a pull request. Is that correct?
@wilsongoode That is great to hear! I've actually also been working on an algorithm but progress has been slow because I don't know much about discrete math and I've had other things in my life coming up. I've got a little interactive notebook of my version sitting here: https://gist.github.com/Firionus/ccd45b2daf8d305cd6f1ee9e4535e9e5
While it mostly does what it should, it sometimes misses better solutions allowed in the output length budget because it does not try out larger step sizes (see the "Problematic Examples" section and the TODO in the function). Would you mind sharing your solution in a Gist as well so we can compare our approaches?
@Firionus Your algorithm is much more involved than what I created. I'd say I took the naive approach: https://gist.github.com/wilsongoode/3fd25d5f3a731ecb1037bd4098eb1645
Wow! That is an incredibly concise nearest neighbor algorithm, I'm impressed.
I've been thinking some more about my approach. The basic idea is that when you downsample from an array of length `N` to something that has at most length `n_max`, you can sometimes achieve a really even downsampling, where almost all downsampling steps are the same size, by choosing an output length somewhat smaller than `n_max`.
For example, when `length(x) == 50_000_000` and `n_max = 100`, you can downsample to a length of 100 with a nearest neighbor algorithm like yours, but you'll get constant switching between steps of size 505050 and steps of size 505051. Wouldn't it be nice if all steps were the same size, creating an output that is more linear compared to a linear input? That would be possible with an output size of `n = 81`, resulting in 79 steps of size 625000 and only 1 step in the middle of size 624999.
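The arithmetic behind that `n = 81` example can be checked directly (a verification sketch of the numbers above, not part of any proposed implementation):

```julia
# Split N input steps into n_s small steps of size s and n_l large steps of size s + 1.
N = 50_000_000 - 1   # number of steps between the input samples
n = 81               # chosen output length, giving n - 1 = 80 output steps
steps = n - 1
s = N ÷ steps        # small step size: 624999
n_l = N - s * steps  # number of large steps of size s + 1 = 625000: that's 79
n_s = steps - n_l    # number of small steps of size s: just 1
@assert n_s * s + n_l * (s + 1) == N
```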
The problem is that I'm not sure how to find these "nice" numbers quickly in all cases. A problem statement might look something like this:
> Given integers `N >= 2`, `n_max >= 1`, `n_min >= 1` with `N > n_max >= n_min`, find a tuple of non-negative integers `(n_s, n_l, s)` subject to `s >= 1`, `N = n_s*s + n_l*(s+1)`, and `n_min <= n_s + n_l <= n_max`, such that either `n_s` or `n_l` is minimized. With multiple solutions of the same merit, prefer one where `n_s + n_l` is maximized.
Here `N = length(x) - 1` (the number of steps in the input) and `n_max` is the maximum number of steps in the output (so one smaller than the `max_size` in Fons's problem statement). `n_s` is the number of small steps and `n_l` is the number of large steps. `s` is the size of the small step, and the large step is one bigger.
Unfortunately I just don't have the chops in integer optimization to properly solve this but I have a feeling a general solution would be hard to find in short time.
Therefore, I'd suggest we stick with a simple nearest neighbor algorithm for now, where the output length is just `n_max`. That way, we don't end up with half-baked solutions in the code base that no one understands.
As I said, I really like the brevity of @wilsongoode's solution, but its performance is about half as fast as what I currently have. The main issue is that `round.(...)` allocates an array for the indices. This is easily fixed by changing to an array comprehension, bringing performance to the level of my solution at much shorter code length.
So, my suggestion based on @wilsongoode's solution is the following:
```julia
function downsample(x::AbstractVector, max_size::Integer)
    if max_size >= length(x)
        x
    else
        [x[round(Int, i)] for i in
            range(firstindex(x), lastindex(x), length=max_size)]
    end
end
```
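As a quick sanity check (my own test run, not part of the package), the comprehension version passes the test cases Fons suggested earlier in the thread:

```julia
using Test

# Comprehension-based nearest neighbor downsampling, as suggested above.
function downsample(x::AbstractVector, max_size::Integer)
    if max_size >= length(x)
        x
    else
        [x[round(Int, i)] for i in
            range(firstindex(x), lastindex(x), length=max_size)]
    end
end

x1 = [1, 2, 3]
x2 = rand(500)

@test downsample(x1, 3) == [1, 2, 3]
@test downsample(x1, 30) == [1, 2, 3]
@test downsample(x1, 2) == [1, 3]
@test downsample(x2, 500) == x2

y2 = downsample(x2, 400)
@test length(y2) == 400            # this version always outputs exactly max_size elements
@test y2[begin] == x2[begin]       # endpoints are preserved
@test y2[end] == x2[end]
```

Note that unlike the earlier spec (which allowed `250 <= length(y2) <= 400`), this version always returns exactly `max_size` elements when it downsamples at all.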
As for contributing this change, I think we just need to put something like `values = downsample(values, max_size)` before https://github.com/JuliaPluto/PlutoUI.jl/blob/47167739ec772924469d3528977e8d6a688b3319/src/Builtins.jl#L161
I guess `max_size` should be configurable in the constructor for `Slider` and documented. A default of 1_000 wouldn't be completely wrong. And a test that hits the downsampling in CI would probably be a good idea as well; for example, I can't test right now whether the range syntax I used is supported in v1.4. The suggested tests from Fons should also be put somewhere. Unfortunately I have no idea right now how tests are handled in this package, so you're gonna have to go looking a bit.
> I assume I will need to fork PlutoUI in order to add this function and create a pull request. Is that correct?
Yeah. I'd leave that to you, @wilsongoode, if that is okay? Feel free to just open it as draft and mention this issue if you want me to take a look at your code.
When creating a slider with inappropriately high resolution, the memory usage becomes very high, slowing down the computer or even killing the Julia process.
**Steps to Reproduce**
Observe the memory usage of the Julia process going up massively; on my PC it reached around 12 GB before the process was killed.
**Expected Behavior**
Maybe a word on what my use case was: I was toying around with the plot of an ODE solution and wanted to continuously vary a parameter and some boundary conditions. Without much thinking, I set the slider step size to some arbitrary small number like 1e-6 and had no idea why my notebook was suddenly running so slow.
**System Info**