JuliaDocs / LiveServer.jl

Simple development server with live-reload capability for Julia.
MIT License
143 stars 26 forks source link

Silent crash in servedocs #143

Closed fredrikekre closed 2 years ago

fredrikekre commented 2 years ago

I often see silent exits from LiveServer.servedocs. This reproduces it for me:

  1. Clone this repo
  2. Instantiate the doc environment:
    julia --project=docs -e 'using Pkg; Pkg.develop(path=pwd())'
  3. Run julia --project=docs -e 'using LiveServer; LiveServer.servedocs()'

Now, in the opened browser, clicking around and loading new pages, after a LiveServer just silently returns.

I didn't have time to investigate further, but probably any of the trys in the package swallows the error (Setting LiveServer.VERBOSE[] = true didn't give any more info).

fredrikekre commented 2 years ago
┌ Error: ws_tracker error
│   exception =
│    IOError: write: broken pipe (EPIPE)
│    Stacktrace:
│      [1] uv_write(s::Sockets.TCPSocket, p::Ptr{UInt8}, n::UInt64)
│        @ Base ./stream.jl:1061
│      [2] unsafe_write(s::Sockets.TCPSocket, p::Ptr{UInt8}, n::UInt64)
│        @ Base ./stream.jl:1115
│      [3] unsafe_write
│        @ ./io.jl:683 [inlined]
│      [4] write(s::Sockets.TCPSocket, a::Vector{UInt8})
│        @ Base ./io.jl:706
│      [5] writeframe(io::HTTP.ConnectionPool.Connection, x::HTTP.WebSockets.Frame)
│        @ HTTP.WebSockets ~/.julia/packages/HTTP/XJG1J/src/WebSockets.jl:0
│      [6] pong(ws::HTTP.WebSockets.WebSocket, data::Vector{UInt8})
│        @ HTTP.WebSockets ~/.julia/packages/HTTP/XJG1J/src/WebSockets.jl:549
│      [7] pong
│        @ ~/.julia/packages/HTTP/XJG1J/src/WebSockets.jl:547 [inlined]
│      [8] ws_tracker(ws::HTTP.WebSockets.WebSocket)
│        @ LiveServer ~/dev/LiveServer/src/server.jl:358
│      [9] upgrade(f::typeof(LiveServer.ws_tracker), http::HTTP.Streams.Stream{HTTP.Messages.Request, HTTP.ConnectionPool.Connection}; suppress_close_error::Bool, maxframesize::Int64, maxfragmentation::Int64, kw::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
│        @ HTTP.WebSockets ~/.julia/packages/HTTP/XJG1J/src/WebSockets.jl:439
│     [10] upgrade
│        @ ~/.julia/packages/HTTP/XJG1J/src/WebSockets.jl:420 [inlined]
│     [11] (::LiveServer.var"#15#19"{HTTP.Handlers.var"#1#2"{LiveServer.var"#14#18"{typeof(identity), Bool, Bool, LiveServer.SimpleWatcher}}})(http::HTTP.Streams.Stream{HTTP.Messages.Request, HTTP.ConnectionPool.Connection})
│        @ LiveServer ~/dev/LiveServer/src/server.jl:436
│     [12] #invokelatest#2
│        @ ./essentials.jl:729 [inlined]
│     [13] invokelatest
│        @ ./essentials.jl:727 [inlined]
│     [14] handle_connection(f::Function, c::HTTP.ConnectionPool.Connection, listener::HTTP.Servers.Listener{Nothing, Sockets.TCPServer}, readtimeout::Int64, access_log::Nothing)
│        @ HTTP.Servers ~/.julia/packages/HTTP/XJG1J/src/Servers.jl:446
│     [15] (::HTTP.Servers.var"#16#17"{LiveServer.var"#15#19"{HTTP.Handlers.var"#1#2"{LiveServer.var"#14#18"{typeof(identity), Bool, Bool, LiveServer.SimpleWatcher}}}, HTTP.Servers.Listener{Nothing, Sockets.TCPServer}, Set{HTTP.ConnectionPool.Connection}, Int64, Nothing, Base.Semaphore, HTTP.ConnectionPool.Connection})()
│        @ HTTP.Servers ./task.jl:482
└ @ LiveServer ~/dev/LiveServer/src/server.jl:374

@quinnj can you have a look? I never saw this pre HTTP 1.0, so maybe something was incorrect in the upgrade?

The fe/debug branch on my fork as some more logging enabled by default.

quinnj commented 2 years ago

The backtrace suggests we're trying to write to a closed socket; if I remember correctly, we were aggressively swallowing errors like this before (in LiveServer.jl). We probably just need to review where is the appropriate layer to catch this and be handled. I'm on vacation for the next few weeks, but I can try to take a look at some point (just FYI in case it takes a few weeks).

tlienart commented 2 years ago

I recall issues with web sockets closing; is there a way to ensure that web sockets are not closed with HTTP.jl until the program has finished running (or has crashed)?

We use this:

https://github.com/tlienart/LiveServer.jl/blob/a196908e59dfbd6dba28b2e3f7f42b38d81e0f12/src/server.jl#L358-L361

which, clearly, is not an ideal way of doing things (if anything the 1.0s might be too long; we could just try decreasing that)

I'm not certain that's the source of the bug though. However that's where the error is thrown from (in OP's stack trace there's the ws_tracker error which comes from the catch below the try shown in the snippet.

What I wonder is if you had an existing web socket, interrupted LiveServer (or it crashed), then re-start LiveServer; does it somehow see the "old" websocket and then see it as timed out? (just guesses here).

tlienart commented 2 years ago

Right so it seems that it's the pong that fails at a time where it shouldn't, possibly because the ws has an issue or doesn't exist before the pong. Not sure what would be a good way to dig deeper into this. I'm not entirely sure but it might actually happen on the first pong. Given the HTTP Websockets.pong code seems to have changed over 1.0 this may be the issue (I think it was previously at WebSockets.jl with write_pong and is now in HTTP.WebSockets but the code is completely different)

In any case if people have suggestions as to how to keep the WS alive "cleanly", surely this code can be refactored and hopefully made less buggy.

j-fu commented 2 years ago

Seeing the same bug in the same situation...

mortenpi commented 2 years ago

The websockets are there for the live reload functionality, right? So as a workaround: is there an easy way to just disable the whole functionality? If you never set up the websocket, I presume it would stop crashing?

j-fu commented 2 years ago

... live reloading is the whole point of having this package - at least for me. Not supporting it anymore would make me stick with v0.7.4 which is the last not using HTTP v1.0. Which is easy with Pkg environments...

tlienart commented 2 years ago

@quinnj when you're back and have time, it'd be great to have your input on this as we've had multiple reports of issues with this unfortunately 😕 thanks!

tlienart commented 2 years ago

ok this is actually very annoying and crippling the functionality of LiveServer, so I'll revert the changes into a 0.10 release that relies on HTTP 0.9 until this is fully figured out. I'm aware this is probably a subtle bug that will take time to figure out but since the previous version worked fine, let's stick with it until we know that the next one works fine too. There's a handful of packages that rely on LiveServer being fairly solid most of the time so I don't think it's reasonable to keep stuff as they are now.

tlienart commented 2 years ago

--> done in 0.9.2 for those following this issue, kindly let me know if that works, there should be all the recent changes in, apart from the HTTP1 stuff.

quinnj commented 2 years ago

Sorry for the delay; got back this week and have been trying to catch up on things. I didn't quite follow how the releases went down here, I didn't seem to have a 0.9.1 tag in my local repo, so I branched off 0.9.0 and made this commit which fixes the original issue reported here. The issue was that we were opening the websocket connection, and writing pongs to it, which was fine for telling the browser to keep the connection alive, but when the file changed, we did a send, then close, but our async ws_tracker would then still attempt to send a pong after we already told the browser to close the connection. A bad fix would be to just catch the write error for that pong, but a better fix (as implemented in the commit referenced above), is to start an async task to read control messages from the browser. That way, if the browser initiates the close for whatever reason, we'll get that message and handle the end-of-life of the websocket much more gracefully.

@tlienart, can I inconvenience you to cherry-pick that commit wherever it needs to go? Alternatively, if you wanted to help me understand how the revert went down, I can try to figure that out, but I figured it might be easier for you to undo whatever you did and then cherry-pick the commit.

tlienart commented 2 years ago

Thanks a lot for looking into this @quinnj what I did is kind of similar to what you did I think (though probably less well executed)

I think the easiest here is that I cherry pick from your commit on a separate branch, experiment a bit with it, get others to try it, and once we're all convinced things are back in working order, I'll do a 0.10 release 😅

thanks a lot for the explanation with the pong stuff, makes sense to me

svilupp commented 2 years ago

I'm not sure if it helps, but an observation that I haven't seen mentioned:

I can reproduce the error/behaviour (liveserver silently shutting down) by simply pressing the back button on the previewed site in Chrome.

@Error (LiveServer.ws_tracker): ws_tracker error │ │ (Tuple{Base.IOError, Ve) exception = (Base.IOError("write: broken pipe (EPIPE)", -32), │ Union[Ptr @0x000000010ebc86c6, Ptr │ @0x00000001101ddd9c, Ptr @0x000 ╰──────────────────────────────────────────────── Thu, 18 Aug 2022 09:31:00

Versions

Solution: Upgrading to LiveServer#master solves the issue.

EDIT: I wasn't able to try out the http1b branch as I was getting Pkg add errors.

tlienart commented 2 years ago

thanks for the report @svilupp ; yes the latest LiveServer (release 0.9.2) should not have such issues, it's based on HTTP 0.9 and has been fairly solid. We're trying to figure out how to clean things up with HTTP1 in #146 but it's not trivial. For now sticking to LS 0.9.2 is the right way to go :)

tlienart commented 2 years ago

should be closed now with the release of 1.0 🤞