JuliaPluto / Malt.jl

Simple multiprocessing for Julia
https://juliapluto.github.io/Malt.jl/
MIT License
47 stars 8 forks source link

BUG: MethodError: Cannot `convert` an object of type NotString to an object of type String #65

Closed schlichtanders closed 11 months ago

schlichtanders commented 11 months ago

I just run into a Malt bug. I cannot reproduce it, but luckily the bug is obvious once seen

(to get this stacktrace I had to do quite a lot...)

MethodError: Cannot `convert` an object of type ArgumentError to an object of type String

Closest candidates are:
  convert(::Type{S}, !Matched::CategoricalArrays.CategoricalValue) where S<:Union{AbstractChar, AbstractString, Number}
   @ CategoricalArrays ~/.julia/packages/CategoricalArrays/0yLZN/src/value.jl:92
  convert(::Type{T}, !Matched::RCall.RObject{S}) where {T, S<:RCall.Sxp}
   @ RCall ~/.julia/packages/RCall/gOwEW/src/convert/base.jl:1
  convert(::Type{String}, !Matched::String)
   @ Base essentials.jl:298
  ...

Stacktrace:
  [1] Malt.RemoteException(worker::Malt.Worker, message::ArgumentError)
    @ Malt ~/.julia/packages/Malt/vfPGf/src/Malt.jl:31
  [2] unwrap_worker_result(worker::Malt.Worker, result::Malt.WorkerResult)
    @ Malt ~/.julia/packages/Malt/vfPGf/src/Malt.jl:50
  [3] _wait_for_response(worker::Malt.Worker, msg_id::UInt64)
    @ Malt ~/.julia/packages/Malt/vfPGf/src/Malt.jl:325
  [4] _send_receive
    @ ~/.julia/packages/Malt/vfPGf/src/Malt.jl:336 [inlined]
  [5] #remote_call_wait#43
    @ ~/.julia/packages/Malt/vfPGf/src/Malt.jl:422 [inlined]
  [6] remote_call_wait
    @ ~/.julia/packages/Malt/vfPGf/src/Malt.jl:421 [inlined]
  [7] remote_eval_wait
    @ ~/.julia/packages/Malt/vfPGf/src/Malt.jl:491 [inlined]
  [8] remote_eval_wait(w::Malt.Worker, expr::Expr)
    @ Malt ~/.julia/packages/Malt/vfPGf/src/Malt.jl:492
  [9] eval_format_fetch_in_workspace(session_notebook::Tuple{Pluto.ServerSession, Pluto.Notebook}, expr::Expr, cell_id::Base.UUID; ends_with_semicolon::Bool, function_wrapped_info::Nothing, forced_expr_id::Nothing, known_published_objects::Vector{String}, user_requested_run::Bool, capture_stdout::Bool)
    @ Pluto.WorkspaceManager ~/.julia/packages/Pluto/ifg58/src/evaluation/WorkspaceManager.jl:431
 [10] eval_format_fetch_in_workspace
    @ ~/.julia/packages/Pluto/ifg58/src/evaluation/WorkspaceManager.jl:406 [inlined]
 [11] run_single!(session_notebook::Tuple{Pluto.ServerSession, Pluto.Notebook}, cell::Pluto.Cell, reactive_node::Pluto.ReactiveNode, expr_cache::Pluto.ExprAnalysisCache; user_requested_run::Bool, capture_stdout::Bool)
    @ Pluto ~/.julia/packages/Pluto/ifg58/src/evaluation/Run.jl:490
 [12] run_reactive_core!(session::Pluto.ServerSession, notebook::Pluto.Notebook, old_topology::Pluto.NotebookTopology, new_topology::Pluto.NotebookTopology, roots::Vector{Pluto.Cell}; save::Bool, deletion_hook::typeof(Pluto.WorkspaceManager.move_vars), user_requested_run::Bool, already_run::Vector{Pluto.Cell}, bond_value_pairs::Base.Iterators.Zip{Tuple{Vector{Symbol}, Vector{Any}}})
    @ Pluto ~/.julia/packages/Pluto/ifg58/src/evaluation/Run.jl:381
 [13] run_reactive_core!(session::Pluto.ServerSession, notebook::Pluto.Notebook, old_topology::Pluto.NotebookTopology, new_topology::Pluto.NotebookTopology, roots::Vector{Pluto.Cell}; save::Bool, deletion_hook::typeof(Pluto.WorkspaceManager.move_vars), user_requested_run::Bool, already_run::Vector{Pluto.Cell}, bond_value_pairs::Base.Iterators.Zip{Tuple{Vector{Symbol}, Vector{Any}}}) (repeats 2 times)
    @ Pluto ~/.julia/packages/Pluto/ifg58/src/evaluation/Run.jl:429
 [14] run_reactive_core!
    @ ~/.julia/packages/Pluto/ifg58/src/evaluation/Run.jl:54 [inlined]
 [15] (::Pluto.var"#342#348"{Bool, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, Pluto.ServerSession, Pluto.Notebook, Bool, Val{:py}})()
    @ Pluto ~/.julia/packages/Pluto/ifg58/src/evaluation/Run.jl:675
 [16] withtoken(f::Pluto.var"#342#348"{Bool, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, Pluto.ServerSession, Pluto.Notebook, Bool, Val{:py}}, token::Pluto.Token)
    @ Pluto ~/.julia/packages/Pluto/ifg58/src/evaluation/Tokens.jl:19
 [17] #341
    @ ~/.julia/packages/Pluto/ifg58/src/evaluation/Run.jl:639 [inlined]
 [18] macro expansion
    @ ~/.julia/packages/Pluto/ifg58/src/evaluation/Tokens.jl:58 [inlined]
 [19] (::Pluto.var"#333#334"{Pluto.var"#341#347"{Bool, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, Pluto.ServerSession, Pluto.Notebook, Bool, Val{:py}}})()
    @ Pluto ./task.jl:514

Luckily it points to https://github.com/JuliaPluto/Malt.jl/blob/34e6226b80ee5f8a16ea0fb4ba5b41d0f00c6284/src/Malt.jl#L50

which implicitly assumes that result.value is of type String, which apparently is not always the case. Some extra handling should be setup.

schlichtanders commented 11 months ago

I think I understood where it comes from

  1. deserialization might fail, and as result return a value identical to the error message (which itself is a bad thing - it should also include the catch_backtrace()) https://github.com/JuliaPluto/Malt.jl/blob/34e6226b80ee5f8a16ea0fb4ba5b41d0f00c6284/src/worker.jl#L80

  2. then the Malt message id is changed to Serialization Failure id https://github.com/JuliaPluto/Malt.jl/blob/34e6226b80ee5f8a16ea0fb4ba5b41d0f00c6284/src/worker.jl#L87

  3. then a special handler just rewrites directly to an Exception Failure https://github.com/JuliaPluto/Malt.jl/blob/34e6226b80ee5f8a16ea0fb4ba5b41d0f00c6284/src/worker.jl#L152-L159

  4. and voilá, the original exception happens.

Pangoraw commented 11 months ago

Thank you for the report and investigation. One simple fix could be something like that:

diff --git a/src/worker.jl b/src/worker.jl
index 0aa8240..98d3d0f 100644
--- a/src/worker.jl
+++ b/src/worker.jl
@@ -77,7 +77,7 @@ function serve(server::Sockets.TCPServer)
         msg_data, success = try
             deserialize(io), true
         catch err
-            err, false
+            sprint(showerror, err) * sprint(Base.show_backtrace, catch_backtrace()), false
         finally
             _discard_until_boundary(io)
         end

Edit: your suggestion from https://github.com/JuliaPluto/Malt.jl/issues/66 is actually better!

schlichtanders commented 11 months ago

After fixing it slightly myself, it turns out that the ArgumentError is quite special

Remote exception from Malt.Worker on port 9487 with PID 487:
ArgumentError("array must be non-empty")

Looking through the history for "array must be non-empty", the only places I could find is pop! and popfirst! (the version without default). https://github.com/JuliaLang/julia/blob/8e5136fa2979885081cd502d2210633dff1d2a1a/base/array.jl#L1330-L1332

function pop!(a::Vector)
    if isempty(a)
        throw(ArgumentError("array must be non-empty"))
    end

Unfortunately, I couldn't find any meaningfull place inside Malt where pop! is used. I now switched to Malt DistributedWorker mode and there the error does not appear. So maybe also this ArgumentError is another a Malt bug.

Pangoraw commented 11 months ago

Unfortunately, I couldn't find any meaningfull place inside Malt where pop! is used.

are you using any custom deserialization ?

schlichtanders commented 11 months ago

yes, I have fixed PythonCall's serialization and deserialization to use "dill" instead of "pickle". But maybe it would also happen when using the default PythonCall serialization...

I am also using plain Julia and RCall - and both do work well in Malt, only the PythonCall examples fail immediately because of whatever this underlying error is. With DistributedWorker everything works fine without problems in all three cases.

schlichtanders commented 11 months ago

I probably won't have time debugging this further as the Distributed backend works without problems.

I guess a simple PythonCall test would show it. Here are the serialization method overwrites which I defined in my custom Plutowrapper:

# Fix server Side Python to use dill instead of pickle
function PythonCall.serialize_py(s, x::Py)
    if PythonCall.pyisnull(x)
        PythonCall.serialize(s, nothing)
    else
        b = @pyconst(pyimport("dill").dumps)(x)
        PythonCall.serialize(s, PythonCall.pybytes_asvector(b))
    end
end

function PythonCall.deserialize_py(s)
    v = PythonCall.deserialize(s)
    if v === nothing
        PythonCall.pynew()
    else
        @pyconst(pyimport("dill").loads)(pybytes(v))
    end
end

Maybe this is what breaks Malt, but not Distributed.