JuliaLang / IJulia.jl

Julia kernel for Jupyter
MIT License

Kernel exception: usage_request #1091

Open marius311 opened 1 year ago

marius311 commented 1 year ago

On one particular cluster's JupyterLab, I keep getting these messages periodically in random cells:

KERNEL EXCEPTION
KeyError: key "usage_request" not found

Stacktrace:
 [1] getindex(h::Dict{String, Function}, key::String)
   @ Base ./dict.jl:484
 [2] eventloop(socket::ZMQ.Socket)
   @ IJulia ~/.julia/packages/IJulia/6TIq1/src/eventloop.jl:8
 [3] (::IJulia.var"#14#17")()
   @ IJulia ./task.jl:514
Julia installed from binaries, IJulia v1.24.0, Jupyterlab 3.6.5

```
Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 128 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver3)
  Threads: 129 on 128 virtual cores
Environment:
  LD_LIBRARY_PATH = /global/u1/m/marius/.julia/artifacts/fac7e6d8fc4c5775bf5118ab494120d2a0db4d64/lib:/global/u1/m/marius/.julia/artifacts/7661e5a9aa217ce3c468389d834a4fb43b0911e8/lib:/global/u1/m/marius/.julia/artifacts/d00220164876dea2cb19993200662745eed5e2db/lib:/global/u1/m/marius/.julia/juliaup/julia-1.9.2+0.x64.linux.gnu/bin/../lib/julia:/global/u1/m/marius/.julia/artifacts/49d9387d0153ebcfb578e03cd2c58ddff2ef980b/lib:/global/u1/m/marius/.julia/artifacts/416d108e827d01dce771c4cbee18f8dcff37a3b3/lib:/global/u1/m/marius/.julia/artifacts/51cb236ffdb7e1ed1b9d44f14c81f2b84bc46520/lib:/global/u1/m/marius/.julia/juliaup/julia-1.9.2+0.x64.linux.gnu/bin/../lib/julia:/global/u1/m/marius/.julia/juliaup/julia-1.9.2+0.x64.linux.gnu/bin/../lib:/opt/cray/pe/python/3.9.13.1/lib:/opt/cray/pe/gcc-libs:/global/u1/m/marius/lib:/opt/cray/pe/papi/7.0.0.1/lib64:/opt/cray/pe/gcc/11.2.0/snos/lib64:/opt/cray/libfabric/1.15.2.0/lib64:/opt/cray/libfabric/1.15.2.0/lib64
  JULIA_MPI_TRANSPORT = TCP
  JULIA_REVISE_POLL = 1
  JULIA_MPI_PATH =
  JULIA_PROJECT = @.
  JULIA_MPIEXEC = srun
  JULIA_NO_VERIFY_HOSTS = github.com
  JULIA_NUM_THREADS = 1
  JULIA_PYTHONCALL_EXE = /global/u1/m/marius/.cache/pypoetry/virtualenvs/muse-3g-lO5pQ0cA-py3.8/bin/python
  JULIA_CONDAPKG_BACKEND = Null
```

stevengj commented 1 year ago

Hmm, I don't see usage_request in the Jupyter protocol docs. @minrk, is this something new?

There was a PR to add a usage_request message, but it was apparently rejected: https://github.com/jupyterlab/jupyterlab/pull/11285

Looks like it might be a kernel extension? https://github.com/jupyter-server/jupyter-resource-usage

I guess we should simply ignore messages we don't understand?

stevengj commented 1 year ago

For example, we could replace https://github.com/JuliaLang/IJulia.jl/blob/74e3e09d266f92af8f80737b9b18bf97217203dd/src/eventloop.jl#L8

with

invokelatest(get(handlers, msg.header["msg_type"], default_handler), socket, msg)

where default_handler does nothing.
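
To illustrate the idea, here is a minimal, self-contained sketch of dispatching with a fallback. It is not IJulia's actual code: the `Msg` struct, the `handled` log, the `dispatch` function, and the single registered handler are hypothetical stand-ins, and only the `get(handlers, ..., default_handler)` lookup mirrors the line proposed above.

```julia
# Hypothetical stand-in for a Jupyter message; the real IJulia type differs.
struct Msg
    header::Dict{String,Any}
end

# Records which message types reached a registered handler (illustration only).
handled = String[]

# Table of known handlers, keyed by Jupyter message type.
handlers = Dict{String,Function}(
    "execute_request" => (socket, msg) -> push!(handled, msg.header["msg_type"]),
)

# Fallback for message types with no registered handler: do nothing.
default_handler(socket, msg) = nothing

# `get` returns the fallback instead of throwing a KeyError for unknown keys.
function dispatch(socket, msg::Msg)
    handler = get(handlers, msg.header["msg_type"], default_handler)
    return Base.invokelatest(handler, socket, msg)
end

dispatch(nothing, Msg(Dict{String,Any}("msg_type" => "execute_request")))
dispatch(nothing, Msg(Dict{String,Any}("msg_type" => "usage_request")))  # silently ignored
```

With `handlers[key]` the second call would raise the `KeyError` from the report; with `get` it falls through to the no-op.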

stevengj commented 1 year ago

We could additionally add support for the https://github.com/jupyter-server/jupyter-resource-usage extension, I guess, similar to how it is handled in ipykernel: https://github.com/ipython/ipykernel/blob/6bf3fe9e44f1caf4bc371f700ac0c2e9c9c3bd84/ipykernel/kernelbase.py#L996-L1027

Though that would seem to require a Julia equivalent of the Python psutil package, and I'm not sure that exists?

marius311 commented 1 year ago

I don't know how these things work, but as a first step maybe just output it to the Jupyter log instead of the cell's stderr? (Or, of course, your suggestion of doing nothing.) I don't actually use that extension, although it looks like the cluster comes with it preinstalled, so that seems like a good theory.

JBlaschke commented 1 year ago

Some more context: we see this on jupyter.nersc.gov -- it's intermittent, and seems to vary over time. Because of this, I thought it might be related to other issues we're having at NERSC. However, based on this new information, I am going to revise my theory:

It's possible that this kernel extension was added/changed recently. It makes sense to have something like this on shared nodes. I therefore suggest that:

Pinging @rcthomas

stevengj commented 1 year ago

#1092 should work around the immediate issue by ignoring unknown requests.

Responding to usage_request messages properly is much more difficult, since we don't have an analogue of psutil, and pulling out the relevant information/code seems to be quite complicated to do cross-platform (https://github.com/giampaolo/psutil/issues/2296).
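
For context, a few of the simpler figures can be read from Julia's Base without a psutil equivalent. The sketch below is only an illustration under that assumption: the function name `basic_usage_info` and the dictionary keys are hypothetical and do not follow any reply format defined by jupyter-resource-usage. It also leaves out the harder parts, such as current per-process CPU and memory usage including child processes.

```julia
# Collect the host/process figures that Base exposes directly.
# Key names are illustrative only, not a protocol definition.
function basic_usage_info()
    return Dict{String,Any}(
        "hostname"          => gethostname(),
        "pid"               => Int(getpid()),
        "cpu_count"         => Sys.CPU_THREADS,           # logical CPU threads
        "host_total_memory" => Int(Sys.total_memory()),   # bytes
        "host_free_memory"  => Int(Sys.free_memory()),    # bytes
        "kernel_peak_rss"   => Int(Sys.maxrss()),         # peak (not current) resident set size, bytes
    )
end

info = basic_usage_info()
```

Note that `Sys.maxrss()` reports the peak resident set size rather than current usage, which is one reason this falls short of a proper reply.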

ytdHuang commented 1 year ago

May I ask whether there will be a patch release soon to fix this KERNEL EXCEPTION? The error messages keep popping up in JupyterLab, which is a bit annoying lol.

etejedor commented 8 months ago

Is https://github.com/JuliaLang/IJulia.jl/pull/1092 ready to be merged? It would be nice to have a patch release that at least silences these messages. Many thanks!