Server get stalled: SequelizeConnectionAcquireTimeoutError

JohnXLivingston commented 1 year ago

Describe the current behavior

Hello,

One of my Peertube server crashed 2 times (with 6 days between the two crashes).

In logs, I can see these kind of logs:

Jan 21 23:35:27 server0 peertube[433710]: [videos.john-livingston.fr:443] 2023-01-21 23:35:27.949 #033[31merror#033[39m: Cannot execute UpdateVideosScheduler scheduler. {
Jan 21 23:35:28 server0 peertube[433710]:   "err": {
Jan 21 23:35:28 server0 peertube[433710]:     "stack": "SequelizeConnectionAcquireTimeoutError: Operation timeout\n    at ConnectionManager.getConnection (/var/www/peertube/versions/pee
rtube-v5.0.1/node_modules/sequelize/lib/dialects/abstract/connection-manager.js:204:15)\n    at async /var/www/peertube/versions/peertube-v5.0.1/node_modules/sequelize/lib/sequelize.js:
304:26\n    at async PostgresQueryInterface.select (/var/www/peertube/versions/peertube-v5.0.1/node_modules/sequelize/lib/dialects/abstract/query-interface.js:407:12)\n    at async Func
tion.findAll (/var/www/peertube/versions/peertube-v5.0.1/node_modules/sequelize/lib/model.js:1134:21)\n    at async Function.findOne (/var/www/peertube/versions/peertube-v5.0.1/node_mod
ules/sequelize/lib/model.js:1228:12)",
Jan 21 23:35:28 server0 peertube[433710]:     "message": "Operation timeout",
Jan 21 23:35:28 server0 peertube[433710]:     "name": "SequelizeConnectionAcquireTimeoutError",
Jan 21 23:35:28 server0 peertube[433710]:     "parent": {
Jan 21 23:35:28 server0 peertube[433710]:       "stack": "Error: Operation timeout\n    at Timeout.<anonymous> (/var/www/peertube/versions/peertube-v5.0.1/node_modules/sequelize-pool/lib/Deferred.js:17:25)\n    at listOnTimeout (internal/timers.js:557:17)\n    at processTimers (internal/timers.js:500:7)",
Jan 21 23:35:28 server0 peertube[433710]:       "message": "Operation timeout"
Jan 21 23:35:28 server0 peertube[433710]:     },
Jan 21 23:35:28 server0 peertube[433710]:     "original": {
Jan 21 23:35:28 server0 peertube[433710]:       "stack": "Error: Operation timeout\n    at Timeout.<anonymous> (/var/www/peertube/versions/peertube-v5.0.1/node_modules/sequelize-pool/lib/Deferred.js:17:25)\n    at listOnTimeout (internal/timers.js:557:17)\n    at processTimers (internal/timers.js:500:7)",
Jan 21 23:35:28 server0 peertube[433710]:       "message": "Operation timeout"
Jan 21 23:35:28 server0 peertube[433710]:     }
Jan 21 23:35:28 server0 peertube[433710]:   }
Jan 21 23:35:28 server0 peertube[433710]: }

All postgresql request are triggering timeouts.

systemctl status postgresql says everything is ok. Restarting the postgresql server don't change anything. Peertube only comes back when I restart Peertube (so it seems the issue is on the Peertube side).

I can't see any special error log.

The log just before the first timeouts are:

Jan 21 23:32:25 server0 peertube[433710]: [videos.john-livingston.fr:443] 2023-01-21 23:32:25.082 #033[32minfo#033[39m: Importing torrent video https://framatube.org/lazy-static/torrent
s/d6f5c3ee-0180-4390-8e8c-a061c270d99f-480.torrent
Jan 21 23:32:25 server0 peertube[433710]: [videos.john-livingston.fr:443] 2023-01-21 23:32:25.145 #033[32minfo#033[39m: Updating 1 good actor follows and 0 bad actor follows scores in c
ache. {
Jan 21 23:32:25 server0 peertube[433710]:   "badInboxes": []
Jan 21 23:32:25 server0 peertube[433710]: }
Jan 21 23:32:25 server0 peertube[433710]: [videos.john-livingston.fr:443] 2023-01-21 23:32:25.945 #033[32minfo#033[39m: Cleaning video viewers.
Jan 21 23:32:42 server0 peertube[433710]: [videos.john-livingston.fr:443] 2023-01-21 23:32:42.832 #033[32minfo#033[39m: Receiving inbox requests for 1 activities by https://peertube-bla
blalinux.be/accounts/peertube.
Jan 21 23:32:42 server0 peertube[433710]: [videos.john-livingston.fr:443] 2023-01-21 23:32:42.835 #033[32minfo#033[39m: 162.19.74.111 - - [21/Jan/2023:22:32:42 +0000] "POST /inbox HTTP/
1.0" 204 - "-" "PeerTube/5.0.1 (+https://peertube-blablalinux.be)"
Jan 21 23:33:06 server0 peertube[433710]: [videos.john-livingston.fr:443] 2023-01-21 23:33:06.731 #033[32minfo#033[39m: Creating job to send file cache of https://videos.john-livingston
.fr/redundancy/videos/d6f5c3ee-0180-4390-8e8c-a061c270d99f/480.
Jan 21 23:33:06 server0 peertube[433710]: [videos.john-livingston.fr:443] 2023-01-21 23:33:06.737 #033[32minfo#033[39m: Duplicated https://framatube.org/videos/watch/d6f5c3ee-0180-4390-
8e8c-a061c270d99f - 480 -> https://videos.john-livingston.fr/redundancy/videos/d6f5c3ee-0180-4390-8e8c-a061c270d99f/480.
Jan 21 23:33:06 server0 peertube[433710]: [videos.john-livingston.fr:443] 2023-01-21 23:33:06.747 #033[32minfo#033[39m: Processing ActivityPub unicast in job 63637.
Jan 21 23:33:06 server0 peertube[433710]: [videos.john-livingston.fr:443] 2023-01-21 23:33:06.787 #033[32minfo#033[39m: Duplicating https://framatube.org/videos/watch/d6f5c3ee-0180-4390
-8e8c-a061c270d99f - 720 in videos redundancy with "trending" strategy.
Jan 21 23:33:06 server0 peertube[433710]: [videos.john-livingston.fr:443] 2023-01-21 23:33:06.787 #033[32minfo#033[39m: Importing torrent video https://framatube.org/lazy-static/torrent
s/d6f5c3ee-0180-4390-8e8c-a061c270d99f-720.torrent
Jan 21 23:33:06 server0 peertube[433710]: [videos.john-livingston.fr:443] 2023-01-21 23:33:06.849 #033[32minfo#033[39m: Updating 1 good actor follows and 0 bad actor follows scores in c
ache. {
Jan 21 23:33:06 server0 peertube[433710]:   "badInboxes": []
Jan 21 23:33:06 server0 peertube[433710]: }
Jan 21 23:33:58 server0 peertube[433710]: [videos.john-livingston.fr:443] 2023-01-21 23:33:58.199 #033[32minfo#033[39m: Receiving inbox requests for 1 activities by https://peertube-bla
blalinux.be/accounts/peertube.
Jan 21 23:33:58 server0 peertube[433710]: [videos.john-livingston.fr:443] 2023-01-21 23:33:58.207 #033[32minfo#033[39m: 162.19.74.111 - - [21/Jan/2023:22:33:58 +0000] "POST /inbox HTTP/
1.0" 204 - "-" "PeerTube/5.0.1 (+https://peertube-blablalinux.be)"
Jan 21 23:34:25 server0 peertube[433710]: [videos.john-livingston.fr:443] 2023-01-21 23:34:25.945 #033[32minfo#033[39m: Cleaning video viewers.

And for the second crash:

Jan 28 23:51:46 server0 peertube[2205]: [videos.john-livingston.fr:443] 2023-01-28 23:51:46.597 #033[32minfo#033[39m: Receiving inbox requests for 1 activities by https://mstdn.social/users/lupolone.
Jan 28 23:51:46 server0 peertube[2205]: [videos.john-livingston.fr:443] 2023-01-28 23:51:46.604 #033[32minfo#033[39m: 23.88.106.117 - - [28/Jan/2023:22:51:46 +0000] "POST /inbox HTTP/1.0" 204 - "-" "http.rb/5.1.0 (Mastodon/4.0.2; +https://mstdn.social/)"
Jan 28 23:51:46 server0 peertube[2205]: [videos.john-livingston.fr:443] 2023-01-28 23:51:46.654 #033[32minfo#033[39m: Remote account https://mstdn.social/users/lupolone removed.
Jan 28 23:51:47 server0 peertube[2205]: [videos.john-livingston.fr:443] 2023-01-28 23:51:47.088 #033[32minfo#033[39m: Creating job to send file cache of https://videos.john-livingston.fr/redundancy/videos/9db9f3f1-9b54-44ed-9e91-461d262d2205/360.
Jan 28 23:51:47 server0 peertube[2205]: [videos.john-livingston.fr:443] 2023-01-28 23:51:47.090 #033[32minfo#033[39m: Duplicated https://framatube.org/videos/watch/9db9f3f1-9b54-44ed-9e91-461d262d2205 - 360 -> https://videos.john-livingston.fr/redundancy/videos/9db9f3f1-9b54-44ed-9e91-461d262d2205/360.
Jan 28 23:51:47 server0 peertube[2205]: [videos.john-livingston.fr:443] 2023-01-28 23:51:47.098 #033[32minfo#033[39m: Processing ActivityPub unicast in job 64150.
Jan 28 23:51:47 server0 peertube[2205]: [videos.john-livingston.fr:443] 2023-01-28 23:51:47.119 #033[32minfo#033[39m: Duplicating https://framatube.org/videos/watch/9db9f3f1-9b54-44ed-9e91-461d262d2205 - 240 in videos redundancy with "most-views" strategy.
Jan 28 23:51:47 server0 peertube[2205]: [videos.john-livingston.fr:443] 2023-01-28 23:51:47.120 #033[32minfo#033[39m: Importing torrent video https://framatube.org/lazy-static/torrents/9db9f3f1-9b54-44ed-9e91-461d262d2205-240.torrent
Jan 28 23:51:47 server0 peertube[2205]: [videos.john-livingston.fr:443] 2023-01-28 23:51:47.189 #033[32minfo#033[39m: Updating 1 good actor follows and 0 bad actor follows scores in cache. {
Jan 28 23:51:47 server0 peertube[2205]:   "badInboxes": []
Jan 28 23:51:47 server0 peertube[2205]: }
Jan 28 23:51:57 server0 peertube[2205]: [videos.john-livingston.fr:443] 2023-01-28 23:51:57.353 #033[32minfo#033[39m: Cleaning video viewers.

I don't know if it is relevant, but it seems to happen when dealing with redundancy.

This is a standard Peertube installation, on Debian bullseye. Last Peertube version (5.0.1). It was working for a very long time, without any issue. I did not change anything on the server recently (except installing debian security updates)

Steps to reproduce

No idea.

Describe the expected behavior

No response

Additional information

PeerTube instance:
- Version: 5.0.1
- NodeJS version: v14.21.2

JohnXLivingston commented 1 year ago

Again this night. At 2:22:09, just after a redundancy job. I'll deactivate redundancy, and check if it happens again.

JohnXLivingston commented 1 year ago

It did not happen again since I removed the redundancy.

I checked postgresql logs. There were errors, but not at time of crashes:

2023-01-25 09:40:51.629 CET [23961] peertube@peertube_prod ERREUR:  n'a pas pu sérialiser un accès à cause d'une mise à jour en parallèle
2023-01-25 09:40:51.629 CET [23961] peertube@peertube_prod CONTEXTE :  instruction SQL « SELECT 1 FROM ONLY "public"."videoFile" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x »
2023-01-25 09:40:51.629 CET [23961] peertube@peertube_prod INSTRUCTION :  INSERT INTO "videoRedundancy" ("id","expiresOn","fileUrl","url","strategy","videoFileId","actorId","createdAt","updatedAt") VALUES (DEFAULT,$1,$2,$3,$4,$5,$6,$7,$8) RETURNING "id","expiresOn","fileUrl","url","strategy","videoFileId","videoStreamingPlaylistId","actorId","createdAt","updatedAt";
2023-01-27 15:20:45.505 CET [75996] peertube@peertube_prod ERREUR:  n'a pas pu sérialiser un accès à cause d'une mise à jour en parallèle
2023-01-27 15:20:45.505 CET [75996] peertube@peertube_prod CONTEXTE :  instruction SQL « SELECT 1 FROM ONLY "public"."videoFile" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x »
2023-01-27 15:20:45.505 CET [75996] peertube@peertube_prod INSTRUCTION :  INSERT INTO "videoRedundancy" ("id","expiresOn","fileUrl","url","strategy","videoFileId","actorId","createdAt","updatedAt") VALUES (DEFAULT,$1,$2,$3,$4,$5,$6,$7,$8) RETURNING "id","expiresOn","fileUrl","url","strategy","videoFileId","videoStreamingPlaylistId","actorId","createdAt","updatedAt";

(crashes were at Jan 21 & 28)

I have no clue of what happened. Redundancy was working fine for months.

Chocobozzz commented 1 year ago

Can you enable opentelemetry and display the "libuv active requests" graph when you server crashes?

Also SequelizeConnectionAcquireTimeoutError errors with the activeRequests property in the logs may help me (this property is not always logged depending where it was raised)

JohnXLivingston commented 1 year ago

Can you enable opentelemetry and display the "libuv active requests" graph when you server crashes?

For now, I have no opentelemetry server available to monitor this instance. Can i just configure the exporters, and open their endpoints the next time the server crashes?

Also SequelizeConnectionAcquireTimeoutError errors with the activeRequests property in the logs may help me (this property is not always logged depending where it was raised)

Not sure to understand. What can I do to help? rgrep activeRequests /var/www/peertube/storage/logs does not return any result.

Chocobozzz commented 1 year ago

Did you have the problem again without redundancy?

JohnXLivingston commented 1 year ago

Did you have the problem again without redundancy?

No. The issue never happened again. I will enable redundancy again, and let you know if it starts again.

JohnXLivingston commented 1 year ago

And the server crashed again... :(

Chocobozzz commented 1 year ago

Can you provide again server logs?

JohnXLivingston commented 1 year ago

Can you provide again server logs?

Logs contains dump of RSA Signatures. I don't know if these information are private. Are you ok if I send you the logs by encrypted mail?

Chocobozzz commented 1 year ago

sure

JohnXLivingston commented 1 year ago

Mail sent.

JohnXLivingston commented 1 year ago

Just thinking of something. When i first got this error, i read some web pages that said that using promises with Sequelize could cause this kind of issue if exception are not catched properly.

Today i'm chatting with someone that got timeout errors during redundancy process.

Could it be a timeout triggered on the got library that is not properly handled? (i see in the code that there is a default 30 second timeout for file download).

Edit: it seems there is a specific 3 hour timeout for redundancy (not 30 seconds). But, i'm not sure, i think this timeout only applies for HLS videos. So if the video is not using hls, maybe there is a 30 seconds timeout, that is too short for long files ?

JohnXLivingston commented 7 months ago

I'm closing this issue. I enabled again the redundancy, and this time there was no crash.

Chocobozzz / PeerTube