geichelberger opened 5 months ago
Can you give more details? The worker should in fact not fail when Opencast is down. I regularly look at three long-running Tobira systems, and on those the worker has never failed because of an unavailable Opencast: it just prints errors to the log and recovers automatically. So I need more details to reproduce your error state. What Tobira version are you using? What exactly are you doing?
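The "log the error and recover" behavior described above comes down to catching errors per sync iteration instead of letting them propagate out of the process. Below is a minimal, language-agnostic sketch in Python, not Tobira's actual implementation. `run_sync_loop`, `sync_once`, the interval, and the injectable `sleep` are all hypothetical names chosen for illustration:

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger("worker-sketch")


def run_sync_loop(
    sync_once: Callable[[], None],
    interval_secs: float = 30.0,
    max_iterations: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `sync_once` repeatedly, logging failures instead of exiting.

    `max_iterations=None` loops forever, like a long-running worker.
    Returns the number of failed iterations (handy for tests).
    """
    failures = 0
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        try:
            sync_once()
        except Exception as err:  # e.g. Opencast answered 503 or is unreachable
            failures += 1
            log.error("error synchronizing with Opencast: %s", err)
        # Wait before the next attempt, whether this one succeeded or not.
        sleep(interval_secs)
    return failures
```

Injecting `sleep` lets the loop be exercised without real waiting. A failed iteration only costs one interval, so the worker picks up again on its own once Opencast is back.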
Log:
Jun 04 06:42:47 oc-presentation-01.xyz systemd[1]: Started Tobira Worker.
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: 2024-06-04 06:42:47.155 INFO tobira > Starting Tobira ~~ cli_args=["/opt/tobira/tobira", "worker", "-c", "/etc/tobira/config.toml"]
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: 2024-06-04 06:42:47.155 INFO tobira > Loaded config ~~ source_file="/etc/tobira/config.toml"
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: 2024-06-04 06:42:47.155 INFO tobira > Starting Tobira worker ...
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: 2024-06-04 06:42:47.171 INFO tobira::db > Connected to DB! ~~ server_version="15.3" user="tobira" session_user="tobira" schema="tobira" database="tobira"
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: 2024-06-04 06:42:47.176 INFO tobira::db::migrations > All migrations are already applied: database schema is up to date.
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: 2024-06-04 06:42:47.246 INFO tobira::search > Connected to MeiliSearch at 'https://oc-index-02.xyz:7700'
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: 2024-06-04 06:42:47.287 ERROR tobira > error synchronizing with Opencast
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: >
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: > Caused by:
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: > 0: failed to fetch API version
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: > 1: API returned unexpected HTTP code 503 Service Unavailable (for 'https://xyz/tobira/version', authenticating as 'admin')
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: ▶▶▶ Error: error synchronizing with Opencast
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: Caused by:
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: ‣ failed to fetch API version
Jun 04 06:42:47 oc-presentation-01.xyz tobira[157784]: ‣ API returned unexpected HTTP code 503 Service Unavailable (for 'https://xyz/tobira/version', authenticating as 'admin')
Jun 04 06:42:47 oc-presentation-01.xyz systemd[1]: tobira-worker.service: Main process exited, code=exited, status=1/FAILURE
Jun 04 06:42:47 oc-presentation-01.xyz systemd[1]: tobira-worker.service: Failed with result 'exit-code'.
Jun 04 06:42:47 oc-presentation-01.xyz systemd[1]: tobira-worker.service: Scheduled restart job, restart counter is at 5.
Jun 04 06:42:47 oc-presentation-01.xyz systemd[1]: Stopped Tobira Worker.
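The log shows the worker exiting with status 1 because the request to `/tobira/version` got `503 Service Unavailable`, a status that usually signals a temporary condition. One option is to classify such failures before deciding whether to give up. The following is a hypothetical retry policy sketched in Python (`is_transient` and `RETRYABLE_STATUSES` are invented for illustration). It retries when no response arrived at all or on 429/5xx, and still fails fast on codes like 401 or 404, which point to misconfiguration rather than an outage:

```python
from typing import Optional

# Hypothetical policy: HTTP statuses that suggest "try again later".
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def is_transient(status: Optional[int]) -> bool:
    """Return True if a failed Opencast request is worth retrying.

    `status` is the HTTP status code, or None if no response arrived
    (connection refused, DNS failure, timeout), which is typical of a
    network outage or an Opencast restart.
    """
    if status is None:
        return True
    return status in RETRYABLE_STATUSES
```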
Oh, so you are saying the worker cannot be started while Opencast is down? But a running worker does not go down with Opencast, right?
Sorry, I should have been a little bit more precise.
If Opencast becomes unreachable, the Tobira worker crashes, and because the restart attempts fail as well, the systemd service ends up in a failed state. This can be caused by network outages or by Opencast being updated.
The expected behavior would be for the worker to handle the error instead of exiting and, importantly, to keep running.
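One way to get this behavior at startup is to wrap the initial check (here, fetching the API version) in a retry loop with exponential backoff. The worker then waits for Opencast instead of exiting with status 1 and relying on systemd restarts. The following is a minimal Python sketch under that assumption. `retry_with_backoff` and its parameters are hypothetical. For brevity it retries on every exception, whereas a real worker would likely retry only errors that look temporary:

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("worker-sketch")


def retry_with_backoff(
    attempt: Callable[[], T],
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `attempt` until it succeeds, sleeping between failures.

    The delay doubles after each failure (1, 2, 4, ... seconds) and is
    capped at `max_delay`, so the process survives an outage of any length.
    """
    delay = base_delay
    while True:
        try:
            return attempt()
        except Exception as err:  # hypothetical: treat every error as transient
            log.warning("Opencast not reachable (%s), retrying in %.0fs", err, delay)
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

The cap bounds how long the worker stays idle after Opencast comes back. Passing `sleep` as a parameter keeps the function testable without real waiting.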