Closed. baszoetekouw closed this issue 1 month ago.
Apply the same solution for non-database-related cron jobs as for the cron jobs that do use the database: apply a locking solution so that only one thread runs the actual job. The thread that is granted the atomic lock for refreshing the metadata writes the results to a file; the threads that are not granted the lock do nothing except reset the last-modified timestamp of the metadata file. The file is then read lazily, on demand, e.g. when the metadata has not been loaded yet.
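A minimal sketch of this locking scheme, using a non-blocking `fcntl.flock` as the atomic lock (file paths, function names, and the mtime-bump detail are illustrative, not the actual SBS code):

```python
import fcntl
import json
import os

METADATA_FILE = "/tmp/idp_metadata.json"
LOCK_FILE = "/tmp/idp_metadata.lock"


def refresh_idp_metadata(parse_metadata):
    """Run by every worker's cron job; only the lock winner does the work."""
    with open(LOCK_FILE, "w") as lock:
        try:
            # Atomic and non-blocking: exactly one worker obtains the lock.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Losers do nothing except bump the file's mtime, so their
            # lazy reader notices the on-disk copy is newer than the one
            # in memory and reloads it on the next request.
            if os.path.exists(METADATA_FILE):
                os.utime(METADATA_FILE)
            return False
        try:
            data = parse_metadata()  # the expensive refresh
            tmp_path = METADATA_FILE + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(data, f)
            os.replace(tmp_path, METADATA_FILE)  # atomic publish
            return True
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

Writing to a temporary file and `os.replace`-ing it into place means readers never see a half-written JSON file.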
May 22 04:00:00 app1-tf1 gunicorn[428500]: INFO [apscheduler.executors.default] Running job "parse_idp_metadata (trigger: cron[day='*', hour='4'], next run at: 2024-05-22 04:00:00 UTC)" (scheduled at 2024-05-22 04:00:00+00:00)
May 22 04:00:00 app1-tf1 gunicorn[428500]: INFO [scheduler] Resetting idp_metadata as no lock could be obtained
May 22 04:00:00 app1-tf1 gunicorn[428500]: INFO [apscheduler.executors.default] Job "parse_idp_metadata (trigger: cron[day='*', hour='4'], next run at: 2024-05-22 04:00:00 UTC)" executed successfully
May 22 04:00:00 app1-tf1 gunicorn[428502]: INFO:apscheduler.executors.default:Running job "parse_idp_metadata (trigger: cron[day='*', hour='4'], next run at: 2024-05-23 04:00:00 UTC)" (scheduled at 2024-05-22 04:00:00+00:00)
May 22 04:00:00 app1-tf1 gunicorn[428502]: INFO:scheduler:Resetting idp_metadata as no lock could be obtained
May 22 04:00:00 app1-tf1 gunicorn[428502]: INFO:apscheduler.executors.default:Job "parse_idp_metadata (trigger: cron[day='*', hour='4'], next run at: 2024-05-23 04:00:00 UTC)" executed successfully
May 22 04:00:00 app1-tf1 gunicorn[428504]: INFO:apscheduler.executors.default:Running job "parse_idp_metadata (trigger: cron[day='*', hour='4'], next run at: 2024-05-22 04:00:00 UTC)" (scheduled at 2024-05-22 04:00:00+00:00)
May 22 04:00:00 app1-tf1 gunicorn[428504]: INFO:scheduler:Resetting idp_metadata as no lock could be obtained
May 22 04:00:00 app1-tf1 gunicorn[428504]: INFO:apscheduler.executors.default:Job "parse_idp_metadata (trigger: cron[day='*', hour='4'], next run at: 2024-05-22 04:00:00 UTC)" executed successfully
May 22 04:00:00 app1-tf1 gunicorn[428503]: INFO [apscheduler.executors.default] Running job "parse_idp_metadata (trigger: cron[day='*', hour='4'], next run at: 2024-05-23 04:00:00 UTC)" (scheduled at 2024-05-22 04:00:00+00:00)
May 22 04:00:00 app1-tf1 gunicorn[428503]: INFO [scheduler] Start running parse_idp_metadata job
May 22 04:00:03 app1-tf1 gunicorn[428503]: INFO [scheduler] Finished running parse_idp_metadata job in 3162 ms
May 22 04:00:03 app1-tf1 gunicorn[428503]: INFO [scheduler] Writing idp_metadata to /tmp/idp_metadata.json
May 22 04:00:03 app1-tf1 gunicorn[428503]: INFO [apscheduler.executors.default] Job "parse_idp_metadata (trigger: cron[day='*', hour='4'], next run at: 2024-05-23 04:00:00 UTC)" executed successfully
Go multiple times to the organisation admin page, in the organisation detail for an organisation which has units, and view the details of an ORG manager (this triggers a call to https://test.sram.surf.nl/api/organisations/identity_provider_display_name?lang=en&user_id=24052 ). Then check if the idp_metadata is read from the /tmp cache:
May 22 14:18:01 app1-tf1 gunicorn[428502]: INFO:scheduler:Reading idp_metadata from /tmp/idp_metadata.json
May 22 14:18:29 app1-tf1 gunicorn[428504]: INFO:scheduler:Reading idp_metadata from /tmp/idp_metadata.json
May 22 14:18:52 app1-tf1 gunicorn[428500]: INFO [scheduler] Reading idp_metadata from /tmp/idp_metadata.json
OK
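The lazy on-demand read shown in the logs above can be sketched roughly like this (a hedged illustration, assuming a per-worker module-level cache keyed on the file's mtime; names are hypothetical):

```python
import json
import os

# Per-worker in-memory cache; reloaded whenever the file on disk is newer.
_cache = {"mtime": None, "data": None}


def get_idp_metadata(path="/tmp/idp_metadata.json"):
    """Lazily load the metadata, rereading only when the file has changed."""
    mtime = os.path.getmtime(path)
    if _cache["data"] is None or mtime != _cache["mtime"]:
        with open(path) as f:
            _cache["data"] = json.load(f)
        _cache["mtime"] = mtime
    return _cache["data"]
```

Because losing threads only bump the file's mtime, each worker picks up the winner's refreshed data on its next request without ever parsing the metadata itself.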
The problem seems to be that all (8) gunicorn threads refresh the metadata at the same time. During the refresh a thread cannot handle other requests, so SBS goes down.
Could we change these scheduled jobs such that they are only run on one of the threads? Also for the other scheduled jobs (suspension etc.) it doesn't make sense to run them 8 times in parallel.
And specifically for the parse_idp_metadata cron job: would it be possible to run it once and then share the data with the other threads? Or maybe replace it with a simple shell/XSLT script that outputs a simple JSON file and have SBS read that? That would also improve startup time.
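The pre-processing idea could also be a small standalone Python script run from cron or at deploy time. A hedged sketch: the namespaces are the standard SAML metadata / MDUI ones, but exactly which fields SBS needs from the feed is an assumption here.

```python
import json
import xml.etree.ElementTree as ET

NS = {
    "md": "urn:oasis:names:tc:SAML:2.0:metadata",
    "mdui": "urn:oasis:names:tc:SAML:metadata:ui",
}


def metadata_to_json(xml_text):
    """Reduce a SAML metadata feed to a small entityID -> display-name map."""
    root = ET.fromstring(xml_text)
    result = {}
    for ed in root.iter(f"{{{NS['md']}}}EntityDescriptor"):
        entity_id = ed.get("entityID")
        for dn in ed.iter(f"{{{NS['mdui']}}}DisplayName"):
            lang = dn.get("{http://www.w3.org/XML/1998/namespace}lang", "en")
            result.setdefault(entity_id, {})[lang] = dn.text
    return result
```

SBS would then only have to `json.load` the resulting file at startup, instead of parsing the full metadata XML in every worker.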