fonsp / Pluto.jl

🎈 Simple reactive notebooks for Julia
https://plutojl.org/
MIT License
5k stars 296 forks source link

Error in fallback path for catching logging errors #3012

Closed danielwe closed 1 month ago

danielwe commented 2 months ago

GPUCompiler.jl implements slightly tweaked loggers for safe use in generated functions:

https://github.com/JuliaGPU/GPUCompiler.jl/blob/27d4b190243f9a4440e8c0bfda876af89dd1140e/src/utils.jl#L55-L73

Pluto.jl fails to display these logs, instead erroring with ArgumentError: Base.TTY(RawFD(4294967295) invalid status, 0 bytes waiting) is not initialized.

MWE notebook (this does not require a GPU, GPUCompiler is only used via Enzyme): https://gist.github.com/danielwe/c30299b9c11bb0cc5ce182913b640da3

In the REPL this emits a warning:

julia> include("pluto_gpucompiler_error.jl");
┌ Warning: Using fallback BLAS replacements for (["drot_64_"]), performance may be degraded
└ @ Enzyme.Compiler ~/.julia/packages/GPUCompiler/Y4hSX/src/utils.jl:59
dx = [-0.8660254037844386]

In Pluto it errors:

Screencast from 08-25-2024 11:33:19 AM.webm

wsmoses commented 2 months ago

cc @vchuravy didn't michel or someone else hit this?

danielwe commented 2 months ago

The displayed error comes from printing to stderr in the catch block of handle_message, here: https://github.com/fonsp/Pluto.jl/blob/aaa835ba8ba3318f46231af354fb58dd29cc91c7/src/runner/PlutoRunner/src/io/logging.jl#L116

So I guess there's actually two issues here, one from failing to display the log, sending us to the catch block, and one from failing to print to stderr in the catch block

danielwe commented 2 months ago

As an experiment, I inserted error() inside the try block in handle_message, such that every logging attempt ends up in the catch block. The result is that any attempt at logging inside a notebook, like @info "hello", produces the same error as above: ArgumentError: Base.TTY(RawFD(4294967295) invalid status, 0 bytes waiting) is not initialized.

In other words, the catch block is broken independently of GPUCompiler. The const original_stderr = stderr apparently isn't a valid, writable IO when handle_message is called.

Then there's the more fundamental issue of why GPUCompiler logs end up in the catch block in the first place. But fixing the catch block would make this a lot easier to debug.

danielwe commented 2 months ago

The incompatibility with GPUCompiler logs is due to the yield() calls around the logger redirection, introduced in #2026. Removing these fixes the problem. The point of GPUCompiler's special logging macros is to avoid task switches, so explicitly yielding in handle_message defeats their purpose.

Not sure how to reconcile the need to avoid task switching here with the reason for introducing #2026 in the first place.

The issue with the catch block seems to go away if original_stderr is replaced with stderr. The error is then printed in the REPL where Pluto was started, which I assume was the intention. (This is if you land in the catch block from an ordinary log. If you get there from a GPUCompiler log you still get an error, because println contains yield points... but fixing this corner case seems like a low priority, this should occur rarely if ever once the main issue is fixed.)

danielwe commented 2 months ago

Actually, that last parenthetical issue can be fixed by using Core.stderr instead of stderr. Prints to the former are apparently yield-free. To implement this with color if available, leave the catch block as-is and change the definitions of original_std*** to the following:

const original_stdout = IOContext(Core.stdout, :color => get(stdout, :color, false))
const original_stderr = IOContext(Core.stderr, :color => get(stderr, :color, false))
danielwe commented 2 months ago

Since https://github.com/JuliaGPU/GPUCompiler.jl/pull/622 was merged and released, GPUCompiler logs don't crash anymore, so I renamed this issue to refer specifically to the error in the logging fallback path which is fixed by #3013