beyondcode / laravel-websockets

Websockets for Laravel. Done right.
https://beyondco.de/docs/laravel-websockets
MIT License

Randomly "Failed connect to pusher" When Sending Message While Server Short Spike #623

Closed kelvin195eclipse closed 7 months ago

kelvin195eclipse commented 3 years ago

Hi,

I have been using this package in my web app, running with Laravel Queue and Supervisord, since June, and everything was working great. My app announces lottery results and runs live from 7 to 8 pm every Wednesday and on weekends. During these periods, the app gets a constant high volume of visitors and requests (~5-10 users per second).

However, as the number of users has grown recently, I've been facing intermittent "Failed connect to pusher" problems since last month. I checked all the logs and tried to identify the error. There seems to be a pattern: it happens mostly during short load spikes on the server (load average above 1; the server has a 6-core CPU).

What is actually happening in the background that causes these random failures, and what approaches can I try to resolve them?

Thank you!

rennokki commented 3 years ago

Make sure you have enough file descriptors set: https://beyondco.de/docs/laravel-websockets/faq/deploying#open-connection-limit

Also check this: http://socketo.me/docs/deploy#ulimit
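
As a rough way to inspect these limits on a Linux host, the sketch below reads the open-file limit of the current process and counts the descriptors a given process holds. The helper names (`fd_limits`, `count_open_fds`, `max_open_files_line`) are made up for this example, and it assumes a Linux `/proc` filesystem. Limits are per process, so the value that matters is the one of the running websockets server, not of the shell you happen to be logged into.

```python
import os
import resource


def fd_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds(pid):
    """Count open file descriptors of a process via /proc (Linux only).

    Returns None if /proc/<pid>/fd cannot be read (non-Linux system,
    missing process, or insufficient permissions).
    """
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None


def max_open_files_line(pid):
    """Return the 'Max open files' line of /proc/<pid>/limits, or None."""
    try:
        with open(f"/proc/{pid}/limits") as fh:
            for line in fh:
                if line.startswith("Max open files"):
                    return line.strip()
    except OSError:
        return None
    return None


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"this process: soft limit {soft}, hard limit {hard}")
    print(f"this process: open descriptors {count_open_fds(os.getpid())}")
    # Replace os.getpid() with the PID of the websockets server process
    # to see the limit that actually applies to it.
    print(max_open_files_line(os.getpid()))
```

If the descriptor count of the websockets process gets close to its soft limit during peak time, the limit is a plausible suspect; if it stays far below, the cause is more likely elsewhere.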

kelvin195eclipse commented 3 years ago

Thank you for the lead!

I did some research and found a command to check the file descriptor status of my server:

[Image: Screenshot 2020-12-02 at 3 19 29 PM]

Is this the right place to check whether the file descriptor limit is the root cause?

Thanks again!

kelvin195eclipse commented 3 years ago

Update:

My app has just been through another peak session, and the "Failed connect to pusher" problem happened 12 times in total. From what I observed in the app, I can now confirm that the clients did get updated after some delay (even when the job was marked as failed afterwards), which means the failed jobs can be ignored. By the way, I also logged the response status code in PusherBroadcaster.php, and the status was 0 for those failed jobs.

Here are the logged items (all messages were in fact delivered successfully):

[2020-12-02 19:54:46][20274] Processing: App\Events\UpdateResult
[2020-12-02 19:55:12][20274] Processed: App\Events\UpdateResult
[2020-12-02 19:55:18][20275] Processing: App\Events\UpdateResult
[2020-12-02 19:55:48][20275] Failed: App\Events\UpdateResult
[2020-12-02 19:55:56][20276] Processing: App\Events\UpdateResult
[2020-12-02 19:56:26][20276] Failed: App\Events\UpdateResult
[2020-12-02 19:56:40][20277] Processing: App\Events\UpdateResult
[2020-12-02 19:57:10][20277] Failed: App\Events\UpdateResult
[2020-12-02 19:57:14][20278] Processing: App\Events\UpdateResult
[2020-12-02 19:57:44][20278] Failed: App\Events\UpdateResult
[2020-12-02 19:57:54][20279] Processing: App\Events\UpdateResult
[2020-12-02 19:58:23][20279] Processed: App\Events\UpdateResult
[2020-12-02 19:58:27][20280] Processing: App\Events\UpdateResult
[2020-12-02 19:58:57][20280] Failed: App\Events\UpdateResult
[2020-12-02 19:59:08][20281] Processing: App\Events\UpdateResult
[2020-12-02 19:59:38][20281] Failed: App\Events\UpdateResult
[2020-12-02 19:59:38][20282] Processing: App\Events\UpdateResult
[2020-12-02 20:00:04][20283] Processing: App\Events\UpdateResult
[2020-12-02 20:00:08][20282] Failed: App\Events\UpdateResult
[2020-12-02 20:00:20][20284] Processing: App\Events\UpdateResult
[2020-12-02 20:00:34][20283] Failed: App\Events\UpdateResult
[2020-12-02 20:00:50][20284] Failed: App\Events\UpdateResult
[2020-12-02 20:04:09][20285] Processing: App\Events\UpdateResult
[2020-12-02 20:04:27][20285] Processed: App\Events\UpdateResult
[2020-12-02 20:04:45][20286] Processing: App\Events\UpdateResult
[2020-12-02 20:04:59][20287] Processing: App\Events\UpdateResult
[2020-12-02 20:05:06][20288] Processing: App\Events\UpdateResult
[2020-12-02 20:05:11][20286] Processed: App\Events\UpdateResult
[2020-12-02 20:05:15][20289] Processing: App\Events\UpdateResult
[2020-12-02 20:05:24][20287] Processed: App\Events\UpdateResult
[2020-12-02 20:05:36][20288] Failed: App\Events\UpdateResult
[2020-12-02 20:05:45][20289] Failed: App\Events\UpdateResult
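
To make the timing in this log easier to read, here is a small sketch that pairs each "Processing" entry with its "Processed" or "Failed" entry and reports how long the job took. The function name `job_durations` and the `SAMPLE` constant are made up for this example; `SAMPLE` holds a few entries copied from the log above, and the regular expression assumes exactly that line format.

```python
import re
from datetime import datetime

# Matches "[timestamp][job id] State:" as it appears in the queue worker log.
LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]"
    r"\[(?P<job>\d+)\] (?P<state>Processing|Processed|Failed):"
)

# A few entries copied from the log above (raw string keeps the backslashes).
SAMPLE = r"""
[2020-12-02 19:54:46][20274] Processing: App\Events\UpdateResult
[2020-12-02 19:55:12][20274] Processed: App\Events\UpdateResult
[2020-12-02 19:55:18][20275] Processing: App\Events\UpdateResult
[2020-12-02 19:55:48][20275] Failed: App\Events\UpdateResult
[2020-12-02 19:59:38][20282] Processing: App\Events\UpdateResult
[2020-12-02 20:00:04][20283] Processing: App\Events\UpdateResult
[2020-12-02 20:00:08][20282] Failed: App\Events\UpdateResult
[2020-12-02 20:00:34][20283] Failed: App\Events\UpdateResult
[2020-12-02 20:04:09][20285] Processing: App\Events\UpdateResult
[2020-12-02 20:04:27][20285] Processed: App\Events\UpdateResult
"""


def job_durations(log_text):
    """Map job id -> (final state, seconds from Processing to final state)."""
    started = {}
    result = {}
    for m in LINE_RE.finditer(log_text):
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        job, state = m["job"], m["state"]
        if state == "Processing":
            started[job] = ts
        elif job in started:
            elapsed = ts - started.pop(job)
            result[job] = (state, int(elapsed.total_seconds()))
    return result


if __name__ == "__main__":
    for job, (state, seconds) in job_durations(SAMPLE).items():
        print(f"job {job}: {state} after {seconds}s")
```

In the log as posted, every job marked "Failed" ends exactly 30 seconds after it started, while the successful ones take between 18 and 29 seconds. That pattern is consistent with a 30-second timeout somewhere in the broadcast path, and a status code of 0 usually means no HTTP response came back at all. This is an inference from the timestamps; which component enforces the timeout is not established in this thread.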

Thank you!

rennokki commented 3 years ago

@kelvin195eclipse If you run with Supervisor, please add a stdout_logfile as shown here: https://laravel.com/docs/8.x/queues#configuring-supervisor and, after the peak hour passes, check whether any errors appear. To me, it looks like the websockets server crashes under heavy load, and since the logging you have comes from Horizon, it does not show anything relevant yet. We need to know what the websockets server is actually doing and whether it crashes.
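
For illustration, a Supervisor program entry with output logging could look roughly like the fragment below. The program name, paths, and user are placeholders made up for this example and need to be adapted; `websockets:serve` is the artisan command the package provides, and `stdout_logfile` and `redirect_stderr` are standard Supervisor options.

```ini
; Hypothetical example: adjust the program name, paths, and user to your setup.
[program:websockets]
command=php /var/www/app/artisan websockets:serve
numprocs=1
autostart=true
autorestart=true
user=www-data
; Send stderr to the same file as stdout so that crashes are captured.
redirect_stderr=true
stdout_logfile=/var/www/app/storage/logs/websockets.log
```

With `autorestart=true`, a crashed server comes back on its own, so the log file is often the only place where the crash is visible afterwards.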