beyondcode / laravel-websockets

Websockets for Laravel. Done right.
https://beyondco.de/docs/laravel-websockets
MIT License

Crashed in production, supervisord process was still running #325

Closed — ven0ms99 closed this issue 3 years ago

ven0ms99 commented 4 years ago

So basically it stopped working, even though the supervisord process was still showing up as running (for 60 days). After I restarted the process with supervisord it worked again. What could have happened, and how can I avoid this in the future?

tomcoonen commented 4 years ago

Do you have any logs available? Without that it will be a bit hard to diagnose.

rennokki commented 4 years ago

Bump

Tofandel commented 4 years ago

@Hillcow It happened to me too. Your issue is likely that the server ran out of memory. When that happens, all requests fail because PHP cannot allocate more memory, but the process is still considered running (it doesn't exit if the failed allocation happens inside a try/catch), so supervisor will not restart it.

Solution 1: increase the allowed memory for the WS server. This is a missing option in this lib that could be exposed via ini_set, but it can also be done when launching the command with php -d memory_limit=1G (doesn't work on all setups).

Solution 2: make sure to catch the allocation-failure exception and exit the process when that happens, so supervisor restarts it.
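
A minimal sketch of Solution 1, assuming it runs early in the long-lived server process before the event loop starts (the 1G value is illustrative):

```php
// Raise the memory limit for the long-running WebSocket process.
// Roughly equivalent to the `php -d memory_limit=1G` launch mentioned above.
ini_set('memory_limit', '1G');
```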

rennokki commented 4 years ago

@Tofandel I assume Solution 2 can be adapted directly in this package by using a signal listener: https://github.com/reactphp/event-loop#addsignal

The bad thing is that it needs the pcntl extension, so Windows is not favored for this kind of change.
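
For reference, a minimal sketch of the addSignal API linked above (requires pcntl, as noted; stopping the loop here is illustrative):

```php
use React\EventLoop\Factory;

$loop = Factory::create();

// Exit cleanly when the process receives SIGTERM, so a stopping or failing
// server actually terminates and supervisord can restart it.
$loop->addSignal(SIGTERM, function (int $signal) use ($loop) {
    $loop->stop();
});
```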

Tofandel commented 4 years ago

The weird thing is that the loop kept running when out of memory. That's a fatal error, so it shouldn't be caught by a try/catch, but somehow the process still managed to continue, and in the logger's error handler you can see the caught "Unable to allocate xxx bytes" messages multiple times, once after each request.

yurik94 commented 4 years ago

We resolved it by restarting the process every week. Very ugly, but it's the only working solution for us, as memory stayed below the max limit. We also used the pcntl extension since, by default, the worst event loop was in use.

rennokki commented 4 years ago

Ugly indeed, but we gotta find what causes this huge increase in memory. Probably some memory leaks caused by closed connections? 🤔

Tofandel commented 4 years ago

@rennokki I don't really see how that would solve the issue, as it would listen for external signals (e.g. SIGTERM or SIGINT) and not internal unrecoverable errors.

Yes, there likely is some memory leak in this package. I would suggest looking into the statistics part: last week I completely disabled the statistics by modifying some code, because I didn't need them (or the table), and there is no memory increase over time anymore. Though I also added a gc_collect_cycles() every time a connection closes, so it might also be a cyclic-reference issue: if the garbage collector is not called often enough and you get a high volume of requests before garbage collection (e.g. 200 requests in 5 minutes), the memory just blows up.

There are a lot of circular references (e.g. ->app = app() in all the logger singletons), so I wouldn't be surprised.
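
A rough sketch of the gc_collect_cycles() workaround described above, assuming a Ratchet-style connection handler (the surrounding cleanup is elided, not this package's exact code):

```php
use Ratchet\ConnectionInterface;

public function onClose(ConnectionInterface $connection)
{
    // ... existing channel/statistics cleanup ...

    // Force a collection pass so objects kept alive only by circular
    // references (closed connections, loggers, etc.) are actually freed.
    gc_collect_cycles();
}
```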

rennokki commented 4 years ago

I understand that PHP cannot throw exceptions once the memory limit is hit, because... well, it's the memory limit. So maybe we can check with a timer whether the total RAM reached a certain percentage in the last x minutes and restart the server.

Again, if this seems to be due to statistics, I understand. There was a PR merged for 1.6.1 that fixes a decorator for websockets by cloning self. Perhaps the same fix is needed on the other decorators.

Tofandel commented 4 years ago

@rennokki Yeah, that was me. :P Actually, I think the singleton should be dropped for loggers, at least for the message logger, and a trait or abstract class should be used instead. It's a big BC break, but the websocket logger is causing other issues by decorating the class.

One of them is interface implementation: Ratchet checks your class for certain interfaces and provides features accordingly (e.g. supported protocols via the WsServerInterface interface), and by decorating you lose the interface and thus the feature. I think logging should be done within the class, by extending onMessage and onError, rather than wrapping the class in another class.

For memory, you can indeed add a loop ticker every 30 seconds and check whether you are at more than 95% of ini_get('memory_limit'); if so, stop the server.
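
A minimal sketch of that watchdog, assuming a ReactPHP loop ($loop) is in scope where the server is started (the 30-second interval and 95% threshold are the numbers mentioned above):

```php
// Convert an ini-style size such as "128M" or "1G" to bytes.
$toBytes = function (string $value): int {
    $number = (int) $value;

    switch (strtoupper(substr($value, -1))) {
        case 'G': return $number * 1024 ** 3;
        case 'M': return $number * 1024 ** 2;
        case 'K': return $number * 1024;
        default:  return $number;
    }
};

$limit = ini_get('memory_limit');

if ($limit !== '-1') { // -1 means unlimited, nothing to watch
    $limitBytes = $toBytes($limit);

    $loop->addPeriodicTimer(30, function () use ($loop, $limitBytes) {
        if (memory_get_usage(true) > 0.95 * $limitBytes) {
            // Stop the loop so the process exits and supervisord restarts it.
            $loop->stop();
        }
    });
}
```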

rennokki commented 4 years ago

I can drop the decorators for 2.x. It is still in beta, and it needs to be thoroughly tested on this point.

rennokki commented 4 years ago

I basically managed to replicate it:

Click to see the logs ``` New connection opened for app key key. Connection id 91478157.695732491 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"91478157.695732491\",\"activity_timeout\":30}"} Saving statistics... Using: 17.68 MB 1: connection id 91478157.695732491 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 91478157.695732491 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Connection id 91478157.695732491 closed. New connection opened for app key key. Connection id 806028557.614130089 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"806028557.614130089\",\"activity_timeout\":30}"} 1: connection id 806028557.614130089 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 806028557.614130089 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Saving statistics... Using: 17.82 MB Connection id 806028557.614130089 closed. New connection opened for app key key. Connection id 180972669.616525848 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"180972669.616525848\",\"activity_timeout\":30}"} 1: connection id 180972669.616525848 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 180972669.616525848 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Saving statistics... Using: 17.91 MB Connection id 180972669.616525848 closed. New connection opened for app key key. Connection id 133946535.732795730 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"133946535.732795730\",\"activity_timeout\":30}"} 1: connection id 133946535.732795730 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 133946535.732795730 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Connection id 133946535.732795730 closed. Saving statistics... Using: 17.97 MB New connection opened for app key key. Connection id 803104692.328573343 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"803104692.328573343\",\"activity_timeout\":30}"} 1: connection id 803104692.328573343 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 803104692.328573343 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Saving statistics... Using: 18.09 MB Saving statistics... Using: 18.18 MB Connection id 803104692.328573343 closed. Saving statistics... Using: 18.24 MB New connection opened for app key key. Connection id 169524850.746570909 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"169524850.746570909\",\"activity_timeout\":30}"} 1: connection id 169524850.746570909 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 169524850.746570909 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Saving statistics... Using: 18.36 MB Saving statistics... Using: 18.44 MB Connection id 169524850.746570909 closed. Saving statistics... Using: 18.5 MB New connection opened for app key key. 
Connection id 137718526.920963224 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"137718526.920963224\",\"activity_timeout\":30}"} 1: connection id 137718526.920963224 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 137718526.920963224 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Saving statistics... Using: 18.62 MB Saving statistics... Using: 18.71 MB Saving statistics... Using: 18.8 MB Saving statistics... Using: 18.9 MB Saving statistics... Using: 19 MB Saving statistics... Using: 19.09 MB Saving statistics... Using: 19.17 MB Saving statistics... Using: 19.26 MB Saving statistics... Using: 19.35 MB Saving statistics... Using: 19.44 MB Saving statistics... Using: 19.53 MB Saving statistics... Using: 19.62 MB Saving statistics... Using: 19.71 MB Saving statistics... Using: 19.8 MB Saving statistics... Using: 19.89 MB Saving statistics... Using: 19.97 MB Saving statistics... Using: 20.06 MB Saving statistics... Using: 20.16 MB Saving statistics... Using: 20.25 MB Saving statistics... Using: 20.33 MB Saving statistics... Using: 20.43 MB Saving statistics... Using: 20.53 MB Saving statistics... Using: 20.62 MB Saving statistics... Using: 20.71 MB Saving statistics... Using: 20.8 MB Saving statistics... Using: 20.88 MB Saving statistics... Using: 20.97 MB Saving statistics... Using: 21.06 MB Saving statistics... Using: 21.15 MB Saving statistics... Using: 21.27 MB Saving statistics... Using: 21.36 MB 1: connection id 137718526.920963224 received message: {"event":"pusher:ping","data":{}}. Connection id 137718526.920963224 sending message {"event":"pusher:pong"} Saving statistics... Using: 21.45 MB Saving statistics... Using: 21.54 MB Saving statistics... Using: 21.63 MB Saving statistics... Using: 21.71 MB Saving statistics... Using: 21.8 MB Saving statistics... Using: 21.89 MB Saving statistics... Using: 21.98 MB Saving statistics... Using: 22.07 MB Saving statistics... Using: 22.16 MB Saving statistics... Using: 22.25 MB Saving statistics... Using: 22.34 MB Saving statistics... Using: 22.42 MB Saving statistics... Using: 22.51 MB Saving statistics... Using: 22.6 MB ```
rennokki commented 4 years ago

Managed to solve the issue by working on the save() function from the statistics manager and adding a $this->statistics = []; after the foreach loop, but it still looks like it keeps something in memory; then it drops back to normal after a while.

Click to see the log ``` Starting the WebSocket server on port 6001... Saving statistics... Using: 16.63 MB Saving statistics... Using: 16.63 MB Saving statistics... Using: 16.63 MB New connection opened for app key key. Connection id 205785305.210932098 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"205785305.210932098\",\"activity_timeout\":30}"} Saving statistics... Using: 17.68 MB 1: connection id 205785305.210932098 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 205785305.210932098 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Saving statistics... Using: 17.81 MB Saving statistics... Using: 17.81 MB Saving statistics... Using: 17.81 MB Saving statistics... Using: 17.81 MB Connection id 205785305.210932098 closed. New connection opened for app key key. Connection id 726284151.278333402 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"726284151.278333402\",\"activity_timeout\":30}"} 1: connection id 726284151.278333402 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 726284151.278333402 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Saving statistics... Using: 17.91 MB Saving statistics... Using: 17.91 MB Saving statistics... Using: 17.91 MB Saving statistics... Using: 17.91 MB Connection id 726284151.278333402 closed. New connection opened for app key key. Connection id 939177083.989187003 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"939177083.989187003\",\"activity_timeout\":30}"} 1: connection id 939177083.989187003 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 939177083.989187003 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Saving statistics... Using: 18 MB Saving statistics... Using: 18 MB Saving statistics... Using: 18 MB Connection id 939177083.989187003 closed. Saving statistics... Using: 18.06 MB New connection opened for app key key. Connection id 28250624.310040980 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"28250624.310040980\",\"activity_timeout\":30}"} 1: connection id 28250624.310040980 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 28250624.310040980 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... 
Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB 1: connection id 28250624.310040980 received message: {"event":"pusher:ping","data":{}}. Connection id 28250624.310040980 sending message {"event":"pusher:pong"} Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.24 MB Saving statistics... Using: 18.15 MB Saving statistics... Using: 18.15 MB Saving statistics... Using: 18.15 MB Saving statistics... Using: 18.15 MB Saving statistics... Using: 18.07 MB Saving statistics... Using: 18.07 MB Saving statistics... Using: 18.07 MB Connection id 28250624.310040980 closed. Saving statistics... Using: 18.13 MB Saving statistics... Using: 18.04 MB Saving statistics... Using: 18.04 MB Saving statistics... Using: 18.04 MB Saving statistics... Using: 17.95 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB ```
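
For reference, a rough sketch of the save() change described above (assumed shape of the statistics logger's save() method; the persistence step is elided):

```php
public function save()
{
    foreach ($this->statistics as $appId => $statistic) {
        // ... persist/post each collected statistic entry ...
    }

    // Release the collected entries so they aren't kept alive between saves.
    $this->statistics = [];
}
```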
Tofandel commented 4 years ago

@rennokki Make sure to call gc_collect_cycles() before displaying the memory, or you might see inaccurate results because of circular references. Then check if memory is still increasing.

rennokki commented 4 years ago

@Tofandel https://github.com/beyondcode/laravel-websockets/pull/475/commits/ed96e24f6a83bdeb7cc9d57a24610edbcd790410 seems to fix the auto-incrementing of memory, but even after the gc_collect_cycles(), the memory still increases incrementally on each re-connection using Echo.

1-second tick for both statistics and the memory count. Refreshed the page on each increase:

16.63 -> 17.73 -> 17.82 -> 17.91 -> 18.00 -> 18.09 -> 18.15 -> (closed the connection; no active connections at this point) -> 18.12 -> 18.04 -> 17.95 -> 17.77 -> 17.68

The base start is 16.63, and 17.68 is the last value; it's been like 3 minutes and it still stays at 17.68. So it seems like you are right: there is some cycle that keeps building up every time a new connection comes in. Maybe I can do this at scale and try to initiate multiple connections.

rennokki commented 4 years ago

Switching 2.x to \BeyondCode\LaravelWebSockets\Statistics\Logger\NullStatisticsLogger::class stops any leakage, so it's not the singleton itself; it's the code within it that breaks things. 🤔
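
A sketch of that switch, assuming the statistics logger is bound via config/websockets.php (the exact key may differ between 1.x and the 2.x beta):

```php
<?php

// config/websockets.php (illustrative excerpt; other keys omitted)
return [
    'statistics' => [
        // Swap the default HTTP statistics logger for the no-op logger to rule
        // the statistics path out as the source of the leak.
        'logger' => \BeyondCode\LaravelWebSockets\Statistics\Logger\NullStatisticsLogger::class,
    ],
];
```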

rennokki commented 4 years ago

This call (in the statistics logger) messes up the memory:

$this->browser
    ->post(
        action([WebSocketStatisticsEntriesController::class, 'store']),
        ['Content-Type' => 'application/json'],
        stream_for(json_encode($postData))
    );
rennokki commented 4 years ago

@Tofandel Seems like the controller caused the issue. Removed it in https://github.com/beyondcode/laravel-websockets/pull/475/commits/99b55411c1666a9518fbcaf1e0f80c87be33fb86 and it seems like the issue is gone, although memory still increments by 0.01 MB after some time; roughly every 5th or 6th connection adds 0.01 MB out of nowhere. 🤔

Tofandel commented 4 years ago

@rennokki That seems like a normal increase. Some Laravel helpers keep a local cache to look up singletons/controllers, convert strings, etc.; that is likely what you are seeing.

rennokki commented 4 years ago

You won't believe it, but broadcast(new StatisticsUpdated(...)) is causing the issue. :/ Now I'm really confused. Disabling it turns the memory consumption back to normal. More specifically, this: https://github.com/beyondcode/laravel-websockets/pull/475/files#diff-ed8111d433470eea09c12aede397e6e2R108-R110

Disabling the decorators has nothing to do with the memory leak, I can confirm that.

rennokki commented 4 years ago

I replaced the event broadcasting that caused the issue with an automatic polling function in the dashboard. It keeps polling the /statistics endpoint for refreshes and allows disabling polling or refreshing on demand.


Tofandel commented 4 years ago

Isn't the memory leak caused by the broadcast function? If so, the memory leak is within Laravel and the issue is more serious.

rennokki commented 4 years ago

👀

rennokki commented 4 years ago

Just posted https://github.com/laravel/framework/issues/33952

Tofandel commented 4 years ago

After all, this package hooks into the broadcast function to send websocket messages, correct? In that case it might be wise to do some xdebugging and see where that function goes and what data it attaches, as the leak can either be in Laravel or just in the broadcasting part of this lib.

Edit: I'm looking into it

Tofandel commented 4 years ago

@rennokki I figured it out: the event you removed doesn't implement the ShouldBroadcastNow interface, so it is added to a null queue and somehow kept in memory, yet broadcast right away for some reason. Something weird is going on; in any case, changing ShouldBroadcast to ShouldBroadcastNow should fix it.
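
A sketch of the suggested change (class body trimmed; only the interface swap matters here):

```php
use Illuminate\Contracts\Broadcasting\ShouldBroadcastNow;

// Implementing ShouldBroadcastNow instead of ShouldBroadcast makes the event
// bypass the queue entirely and broadcast immediately, so nothing is queued.
class StatisticsUpdated implements ShouldBroadcastNow
{
    // ... payload and broadcastOn() as before ...
}
```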

rennokki commented 4 years ago

@Tofandel That means that, although the queue driver is sync by default and the event does get triggered, the method still causes a memory leak.

rennokki commented 4 years ago

@Tofandel The memory leak still persists, even with ShouldBroadcastNow. Am I missing something?

Tofandel commented 4 years ago

@rennokki Debugging a little more. By the way, there's an unused variable there; not that it's related:

    public function broadcastOn()
    {
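        // Note: $channelName below is assigned but never used.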
        $channelName = Str::after(DashboardLogger::LOG_CHANNEL_PREFIX.'statistics', 'private-');

        return new PrivateChannel(
            Str::after(DashboardLogger::LOG_CHANNEL_PREFIX.'statistics', 'private-')
        );
    }
rennokki commented 4 years ago

@Tofandel The problem is a bit tricky. Until we get a confirmation that https://github.com/laravel/framework/issues/33952 is caused by Laravel via the broadcast() method, I consider the changes from https://github.com/beyondcode/laravel-websockets/pull/475 to remove the broadcaster and add the auto-polling to be completely valid.

It is a bit weird to say the leak is from this package when the problems start with calling the broadcast() method. 🤔

rennokki commented 4 years ago

Gonna merge it for 2.x and see what happens afterward.

rennokki commented 4 years ago

@Tofandel So it seems like Laravel classified the issue as a bug 🤔 https://github.com/laravel/framework/issues/33952

francislavoie commented 4 years ago

@rennokki FYI, you should not be broadcasting inside of ReactPHP. That ends up using the synchronous Redis client (predis/phpredis) instead of the ReactPHP Redis client, which is async, non-blocking, and safe to use in the ReactPHP server loop.
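
For illustration, a sketch of publishing via the async clue/reactphp-redis client on the same loop (the connection string, channel name, and $payload are placeholders):

```php
use Clue\React\Redis\Factory;

$redis = (new Factory($loop))->createLazyClient('redis://127.0.0.1:6379');

// publish() returns a promise instead of blocking the event loop.
$redis->publish('private-websockets-dashboard-statistics', json_encode($payload))
    ->then(null, function (\Exception $e) {
        // Log and keep the loop alive rather than blocking or crashing.
        echo 'Redis publish failed: ' . $e->getMessage() . PHP_EOL;
    });
```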

rennokki commented 4 years ago

@mpociot pointed this out, and it is going to be removed before 2.x gets released.

rennokki commented 3 years ago

The issue was caused by the Ignition package; see the quick fix and follow-up from the Ignition team: https://github.com/beyondcode/laravel-websockets/issues/379#issuecomment-736716734

francislavoie commented 3 years ago

@rennokki I don't actually see a fix anywhere? Following all the links, all I see are MRs that were closed/rejected for various reasons, none merged.

rennokki commented 3 years ago

@francislavoie This specific comment: https://github.com/laravel/framework/issues/33952#issuecomment-693350705

ven0ms99 commented 3 years ago

Hey there, for me the problem unfortunately still persists. Laravel WebSockets randomly stops working, and supervisor does not restart the process because it does not crash. I've looked through the supervisor logs, Nginx logs, and Laravel logs, but there is no sign of any failure.