ElvishArtisan / rivendell

A full-featured radio automation system targeted for use in professional broadcast and media environments
208 stars 64 forks source link

rdlogmanager times out on ripcd connection while generating logs #987

Open fpeyraud opened 3 weeks ago

fpeyraud commented 3 weeks ago

Hi, I've noticed on Rivendell 4.3.0int4 on ubuntu 24.04 and Linux Mint 22 that rdlogmanager often gets connection timeout to ripcd whie generating logs. At best, it slows down the log generation quite a lot. At worst log generation fails right in the middle, which causes really bad effect on playout when nobody checked that the log looks correct before getting on air.

The logs look like this :

2024-10-06T21:01:37.082834+02:00 rdactive rdlogmanager: connection to ripcd(8) timed out, attempting reconnect
2024-10-06T21:01:37.083081+02:00 rdactive rd-generate-log-err[241290]: rdlogmanager: connection to ripcd(8) restored
2024-10-06T21:01:37.083148+02:00 rdactive rd-generate-log-err[241290]: rdlogmanager: connection to ripcd(8) timed out, attempting reconnect
2024-10-06T21:01:37.083420+02:00 rdactive ripcd: added new connection 62
2024-10-06T21:01:50.633016+02:00 rdactive rdlogmanager: connection to ripcd(8) restored 
2024-10-06T21:01:50.633500+02:00 rdactive rd-generate-log-err[241290]: rdlogmanager: connection to ripcd(8) restored
2024-10-06T21:01:50.633749+02:00 rdactive ripcd: closed connection 47
2024-10-06T21:01:50.634027+02:00 rdactive ripcd: cleaned up connection 47
2024-10-06T21:03:54.838561+02:00 rdactive rdlogmanager: connection to ripcd(8) timed out, attempting reconnect
2024-10-06T21:03:54.839015+02:00 rdactive rd-generate-log-err[241290]: rdlogmanager: connection to ripcd(8) timed out, attempting reconnect
2024-10-06T21:04:41.126508+02:00 rdactive ripcd: closed connection 62
2024-10-06T21:04:41.126946+02:00 rdactive ripcd: added new connection 47
2024-10-06T21:04:41.127475+02:00 rdactive ripcd: cleaned up connection 62
2024-10-06T21:04:45.807312+02:00 rdactive rdlogmanager: connection to ripcd(8) restored
2024-10-06T21:04:45.807548+02:00 rdactive rd-generate-log-err[241290]: rdlogmanager: connection to ripcd(8) restored 

the rd-generate-log is a wrapper script which avoids concurrent log generation. The log shows its stderr output.

The issue has been seen on various hardware and distributions Has anyone noticed the same ?

romainmp commented 3 weeks ago

Same here on Mint 22 with 4.3.0 from Paravel packages (dated 2024-06-26).

Log generation took almost an hour, and during this time, rdlogmanager and ripcd kept playing Hide & Seek :

2024-10-08T09:17:07.295007+02:00 fondcombe rdlogmanager: connection to ripcd(8) timed out, attempting reconnect
2024-10-08T09:21:12.557218+02:00 fondcombe rdlogmanager: connection to ripcd(8) restored
2024-10-08T09:21:12.557477+02:00 fondcombe rdlogmanager: connection to ripcd(8) timed out, attempting reconnect
2024-10-08T09:25:15.817789+02:00 fondcombe rdlogmanager: connection to ripcd(8) restored
2024-10-08T09:25:15.823207+02:00 fondcombe rdlogmanager: connection to ripcd(8) timed out, attempting reconnect
2024-10-08T09:28:47.297880+02:00 fondcombe rdlogmanager: connection to ripcd(8) restored
2024-10-08T09:28:47.302967+02:00 fondcombe rdlogmanager: connection to ripcd(8) timed out, attempting reconnect
2024-10-08T09:32:22.981067+02:00 fondcombe rdlogmanager: connection to ripcd(8) restored
2024-10-08T09:32:22.981727+02:00 fondcombe rdlogmanager: connection to ripcd(8) timed out, attempting reconnect
2024-10-08T09:35:34.637006+02:00 fondcombe rdlogmanager: connection to ripcd(8) restored
2024-10-08T09:37:54.868641+02:00 fondcombe rdlogmanager: connection to ripcd(8) timed out, attempting reconnect
2024-10-08T09:42:03.095001+02:00 fondcombe rdlogmanager: connection to ripcd(8) restored
2024-10-08T09:42:03.103375+02:00 fondcombe rdlogmanager: connection to ripcd(8) timed out, attempting reconnect
2024-10-08T09:46:18.062151+02:00 fondcombe rdlogmanager: connection to ripcd(8) restored
2024-10-08T09:46:18.062457+02:00 fondcombe rdlogmanager: connection to ripcd(8) timed out, attempting reconnect
2024-10-08T09:50:31.615559+02:00 fondcombe rdlogmanager: connection to ripcd(8) restored
2024-10-08T09:50:31.615935+02:00 fondcombe rdlogmanager: connection to ripcd(8) timed out, attempting reconnect
2024-10-08T09:54:53.813713+02:00 fondcombe rdlogmanager: connection to ripcd(8) restored
2024-10-08T09:54:53.813986+02:00 fondcombe rdlogmanager: connection to ripcd(8) timed out, attempting reconnect
2024-10-08T09:58:38.121260+02:00 fondcombe rdlogmanager: connection to ripcd(8) restored
2024-10-08T09:58:38.129697+02:00 fondcombe rdlogmanager: connection to ripcd(8) timed out, attempting reconnect
2024-10-08T10:00:55.597774+02:00 fondcombe rdlogmanager: connection to ripcd(8) restored
2024-10-08T10:01:05.080475+02:00 fondcombe rdlogmanager: connection to ripcd(8) timed out, attempting reconnect
2024-10-08T10:01:05.081387+02:00 fondcombe rdlogmanager: connection to ripcd(8) restored
alastairtech commented 3 weeks ago

I have been seeing this issue intermittently on both ARM and AMD64 hardware as well when generating logs. I've recently moved my database to a new host/hardware. Not sure that helped. Happy to contribute logs and data as needed.

alastairtech commented 3 days ago

I also want to add to this database error, which may or may not be related. We normally generate logs in week blocks and can be guaranteed a couple of days will be incomplete with this error.

QSqlQuery::value: not positioned on a valid record for query: select `CREATE_LOG_PRIV` from `USERS` where `LOGIN_NAME`="programsHB-"