Closed: JohnXLivingston closed this issue 7 months ago
It happened again last night, at 2:22:09, just after a redundancy job. I'll deactivate redundancy and check whether it happens again.
It has not happened again since I disabled redundancy.
I checked the PostgreSQL logs. There were errors, but not at the time of the crashes:
2023-01-25 09:40:51.629 CET [23961] peertube@peertube_prod ERROR: could not serialize access due to concurrent update
2023-01-25 09:40:51.629 CET [23961] peertube@peertube_prod CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."videoFile" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
2023-01-25 09:40:51.629 CET [23961] peertube@peertube_prod STATEMENT: INSERT INTO "videoRedundancy" ("id","expiresOn","fileUrl","url","strategy","videoFileId","actorId","createdAt","updatedAt") VALUES (DEFAULT,$1,$2,$3,$4,$5,$6,$7,$8) RETURNING "id","expiresOn","fileUrl","url","strategy","videoFileId","videoStreamingPlaylistId","actorId","createdAt","updatedAt";
2023-01-27 15:20:45.505 CET [75996] peertube@peertube_prod ERROR: could not serialize access due to concurrent update
2023-01-27 15:20:45.505 CET [75996] peertube@peertube_prod CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."videoFile" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
2023-01-27 15:20:45.505 CET [75996] peertube@peertube_prod STATEMENT: INSERT INTO "videoRedundancy" ("id","expiresOn","fileUrl","url","strategy","videoFileId","actorId","createdAt","updatedAt") VALUES (DEFAULT,$1,$2,$3,$4,$5,$6,$7,$8) RETURNING "id","expiresOn","fileUrl","url","strategy","videoFileId","videoStreamingPlaylistId","actorId","createdAt","updatedAt";
(The crashes were on Jan 21 and 28.)
I have no clue what happened. Redundancy had been working fine for months.
Can you enable OpenTelemetry and display the "libuv active requests" graph when your server crashes?
Also, SequelizeConnectionAcquireTimeoutError errors with the activeRequests property in the logs may help me (this property is not always logged, depending on where the error was raised).
For now, I have no OpenTelemetry server available to monitor this instance. Can I just configure the exporters and open their endpoints the next time the server crashes?
I'm not sure I understand the point about SequelizeConnectionAcquireTimeoutError and activeRequests. What can I do to help? Running rgrep activeRequests /var/www/peertube/storage/logs does not return any result.
Did you have the problem again without redundancy?
No. The issue never happened again. I will enable redundancy again, and let you know if it starts again.
And the server crashed again... :(
Can you provide the server logs again?
The logs contain dumps of RSA signatures. I don't know whether this information is private. Is it OK if I send you the logs by encrypted mail?
sure
Mail sent.
I just thought of something. When I first got this error, I read some web pages saying that using promises with Sequelize can cause this kind of issue if exceptions are not caught properly.
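To make that hypothesis concrete, here is a minimal sketch of how an uncaught failure could drain a connection pool. Everything in it is hypothetical: TinyPool, AcquireTimeoutError, leakyJob and safeJob are illustrative names, not Sequelize's or PeerTube's actual code, and the pool fails immediately when exhausted instead of waiting for an acquire timeout.

```typescript
// Hypothetical, minimal stand-in for a connection pool (NOT Sequelize's real API).
class AcquireTimeoutError extends Error {}

class TinyPool {
  private inUse = 0;
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  // A real pool would wait up to an acquire timeout; here exhaustion fails at once.
  acquire(): void {
    if (this.inUse >= this.max) {
      throw new AcquireTimeoutError("no free connection in the pool");
    }
    this.inUse += 1;
  }

  release(): void {
    if (this.inUse > 0) this.inUse -= 1;
  }

  get free(): number {
    return this.max - this.inUse;
  }
}

// Leaky pattern: if work() throws, release() is never reached.
function leakyJob(pool: TinyPool, work: () => void): void {
  pool.acquire();
  work();
  pool.release();
}

// Safe pattern: the connection is returned even when work() throws.
function safeJob(pool: TinyPool, work: () => void): void {
  pool.acquire();
  try {
    work();
  } finally {
    pool.release();
  }
}

// Simulated failure, e.g. a download that times out in the middle of a job.
function failingDownload(): void {
  throw new Error("simulated download timeout");
}

// Runs `job` `times` times against a fresh pool, ignoring each failure
// (as a job queue that only logs errors would), and returns the number
// of connections still free afterwards.
function freeAfter(
  job: (pool: TinyPool, work: () => void) => void,
  times: number,
  poolSize: number
): number {
  const pool = new TinyPool(poolSize);
  for (let i = 0; i < times; i++) {
    try {
      job(pool, failingDownload);
    } catch (_err) {
      // Error is swallowed here; the leak (if any) has already happened.
    }
  }
  return pool.free;
}
```

In this sketch, freeAfter(leakyJob, 5, 5) leaves no free connection, so every later acquire fails until the process restarts, while freeAfter(safeJob, 5, 5) leaves all five free. That would match the symptom of PeerTube only recovering after its own restart, but whether PeerTube actually has such a code path is not confirmed here.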
Today I was chatting with someone who got timeout errors during the redundancy process.
Could it be a timeout triggered in the got library that is not properly handled? (I see in the code that there is a default 30-second timeout for file downloads.)
Edit: it seems there is a specific 3-hour timeout for redundancy (not 30 seconds). But I'm not sure; I think this timeout only applies to HLS videos. So if the video is not using HLS, maybe there is a 30-second timeout, which is too short for long files?
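As a rough check on whether 30 seconds could be too short, here is a small calculation of the sustained throughput a download would need to finish before a given timeout. It assumes the timeout covers the whole download (which is not confirmed), and the 1 GB file size is a made-up example, not a value from this instance.

```typescript
// Minimum sustained throughput, in megabits per second, needed to finish
// downloading `fileSizeBytes` before `timeoutSeconds` elapses.
function minThroughputMbps(fileSizeBytes: number, timeoutSeconds: number): number {
  if (timeoutSeconds <= 0) {
    throw new RangeError("timeoutSeconds must be positive");
  }
  const bits = fileSizeBytes * 8;
  return bits / timeoutSeconds / 1_000_000;
}

// Hypothetical file size, chosen only for illustration.
const ONE_GB = 1_000_000_000;

// Requirement under a 30-second timeout versus a 3-hour timeout.
const with30s = minThroughputMbps(ONE_GB, 30);
const with3h = minThroughputMbps(ONE_GB, 3 * 60 * 60);

console.log(`30 s timeout: ${with30s.toFixed(1)} Mbit/s`);
console.log(`3 h timeout: ${with3h.toFixed(2)} Mbit/s`);
```

Under these assumptions, a 1 GB file needs roughly 267 Mbit/s sustained to fit in 30 seconds, but less than 1 Mbit/s to fit in 3 hours, so a 30-second limit would plausibly fail on large files between ordinary servers.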
I'm closing this issue. I enabled redundancy again, and this time there was no crash.
Describe the current behavior
Hello,
One of my PeerTube servers crashed twice (with 6 days between the two crashes).
In the logs, I can see this kind of entry:
All PostgreSQL requests are timing out.
systemctl status postgresql says everything is OK. Restarting the PostgreSQL server doesn't change anything. PeerTube only comes back when I restart PeerTube (so the issue seems to be on the PeerTube side). I can't see any specific error log.
The logs just before the first timeouts are:
And for the second crash:
I don't know if it is relevant, but it seems to happen when dealing with redundancy.
This is a standard PeerTube installation on Debian bullseye, running the latest PeerTube version (5.0.1). It had been working for a very long time without any issue. I did not change anything on the server recently (except installing Debian security updates).
Steps to reproduce
No idea.
Describe the expected behavior
No response
Additional information