SURFscz / SBS

Samenwerking Beheer Systeem ↣ Collaboration Management System
Apache License 2.0

SBS is down during parse_idp_metadata cronjob #1417

Closed baszoetekouw closed 1 month ago

baszoetekouw commented 1 month ago

The problem seems to be that all (8) gunicorn threads are refreshing the metadata at the same time. During the refresh, a thread cannot handle other requests, so SBS goes down.

Could we change these scheduled processes so that they only run on one of the threads? For the other scheduled jobs (suspension etc.) it also doesn't make sense to run them 8 times in parallel.

And specifically for the parse_idp_metadata cronjob, would it be possible to run this once and then share the data with the other threads? Or maybe replace it with a simple shell/XSLT script that outputs a simple JSON file and have SBS read that? That would also improve startup time.
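A rough sketch of what such a standalone converter could look like, written in Python rather than shell/XSLT. It assumes the input is a standard SAML metadata aggregate and that the JSON should map each `shibmd:Scope` to its `mdui:DisplayName` values per language; that output layout and the function names are illustrative assumptions, not what SBS currently does.

```python
import json
import xml.etree.ElementTree as ET

# Standard SAML metadata namespaces.
NS = {
    "md": "urn:oasis:names:tc:SAML:2.0:metadata",
    "mdui": "urn:oasis:names:tc:SAML:metadata:ui",
    "shibmd": "urn:mace:shibboleth:metadata:1.0",
}
# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def parse_idp_metadata(xml_text):
    """Return {scope: {lang: display_name}} for every IdP in the metadata."""
    root = ET.fromstring(xml_text)
    result = {}
    for entity in root.iter("{%s}EntityDescriptor" % NS["md"]):
        idp = entity.find("md:IDPSSODescriptor", NS)
        if idp is None:
            continue  # not an IdP (e.g. a service provider)
        names = {
            el.get(XML_LANG, "en"): (el.text or "").strip()
            for el in idp.findall("md:Extensions/mdui:UIInfo/mdui:DisplayName", NS)
        }
        for scope in idp.findall("md:Extensions/shibmd:Scope", NS):
            if scope.text:
                result[scope.text.strip()] = names
    return result


def write_idp_metadata_json(xml_text, json_path):
    """Hypothetical helper: parse the XML once and dump the result as JSON."""
    with open(json_path, "w") as f:
        json.dump(parse_idp_metadata(xml_text), f)
```

Run from an ordinary system cron entry, something like this would take the parsing out of the gunicorn processes entirely; SBS would only need to load the resulting JSON file.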

oharsta commented 1 month ago

Apply the same solution for the non-database-related cron jobs as for the cron jobs that do use the database: a locking solution so that only one thread runs the actual job. The thread that is granted the atomic lock for refreshing the metadata writes the results to a file; the threads that are not granted the lock do not run the job, but reset the last-modified timestamp of the metadata file. The file is read lazily on demand, for example when the metadata is not loaded yet.

oharsta commented 1 month ago

May 22 04:00:00 app1-tf1 gunicorn[428500]: INFO  [apscheduler.executors.default] Running job "parse_idp_metadata (trigger: cron[day='*', hour='4'], next run at: 2024-05-22 04:00:00 UTC)" (scheduled at 2024-05-22 04:00:00+00:00)
May 22 04:00:00 app1-tf1 gunicorn[428500]: INFO  [scheduler] Resetting idp_metadata as no lock could be obtained
May 22 04:00:00 app1-tf1 gunicorn[428500]: INFO  [apscheduler.executors.default] Job "parse_idp_metadata (trigger: cron[day='*', hour='4'], next run at: 2024-05-22 04:00:00 UTC)" executed successfully

May 22 04:00:00 app1-tf1 gunicorn[428502]: INFO:apscheduler.executors.default:Running job "parse_idp_metadata (trigger: cron[day='*', hour='4'], next run at: 2024-05-23 04:00:00 UTC)" (scheduled at 2024-05-22 04:00:00+00:00)
May 22 04:00:00 app1-tf1 gunicorn[428502]: INFO:scheduler:Resetting idp_metadata as no lock could be obtained
May 22 04:00:00 app1-tf1 gunicorn[428502]: INFO:apscheduler.executors.default:Job "parse_idp_metadata (trigger: cron[day='*', hour='4'], next run at: 2024-05-23 04:00:00 UTC)" executed successfully

May 22 04:00:00 app1-tf1 gunicorn[428504]: INFO:apscheduler.executors.default:Running job "parse_idp_metadata (trigger: cron[day='*', hour='4'], next run at: 2024-05-22 04:00:00 UTC)" (scheduled at 2024-05-22 04:00:00+00:00)
May 22 04:00:00 app1-tf1 gunicorn[428504]: INFO:scheduler:Resetting idp_metadata as no lock could be obtained
May 22 04:00:00 app1-tf1 gunicorn[428504]: INFO:apscheduler.executors.default:Job "parse_idp_metadata (trigger: cron[day='*', hour='4'], next run at: 2024-05-22 04:00:00 UTC)" executed successfully

May 22 04:00:00 app1-tf1 gunicorn[428503]: INFO  [apscheduler.executors.default] Running job "parse_idp_metadata (trigger: cron[day='*', hour='4'], next run at: 2024-05-23 04:00:00 UTC)" (scheduled at 2024-05-22 04:00:00+00:00)
May 22 04:00:00 app1-tf1 gunicorn[428503]: INFO  [scheduler] Start running parse_idp_metadata job
May 22 04:00:03 app1-tf1 gunicorn[428503]: INFO  [scheduler] Finished running parse_idp_metadata job in 3162 ms
May 22 04:00:03 app1-tf1 gunicorn[428503]: INFO  [scheduler] Writing idp_metadata to /tmp/idp_metadata.json
May 22 04:00:03 app1-tf1 gunicorn[428503]: INFO  [apscheduler.executors.default] Job "parse_idp_metadata (trigger: cron[day='*', hour='4'], next run at: 2024-05-23 04:00:00 UTC)" executed successfully

To test: open the organisation admin page in the organisation detail several times for an organisation that has units, and view the details of an ORG manager (this triggers a call to https://test.sram.surf.nl/api/organisations/identity_provider_display_name?lang=en&user_id=24052). Then check whether the idp_metadata is read from the /tmp cache:

May 22 14:18:01 app1-tf1 gunicorn[428502]: INFO:scheduler:Reading idp_metadata from /tmp/idp_metadata.json
May 22 14:18:29 app1-tf1 gunicorn[428504]: INFO:scheduler:Reading idp_metadata from /tmp/idp_metadata.json
May 22 14:18:52 app1-tf1 gunicorn[428500]: INFO  [scheduler] Reading idp_metadata from /tmp/idp_metadata.json

mrvanes commented 1 month ago

OK