Closed by mmascher 1 year ago
I did some initial investigation but could not figure out the root cause. I might need more time to debug, possibly tomorrow morning. I want to check whether something changes when these constraints are applied here:
https://github.com/glideinWMS/glideinwms/blob/master/factory/glideFactoryLib.py#L278-L285
The function is called here:
https://github.com/glideinWMS/glideinwms/blob/master/factory/glideFactoryEntry.py#L1686
I plan to put a breakpoint before and after the call and check whether the idle pilots get filtered out (maybe the credential id of the proxy is used to submit, while the filter is done based on the credential id of the scitoken?).
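To make the hypothesis concrete, here is a minimal sketch of how such a mismatch would hide the idle pilots. It is not the actual glideFactoryLib code: the job records, the `credential_id` key, and the values `proxy_id` and `scitoken_id` are all made up for illustration.

```python
# Hypothetical sketch: NOT the glideinWMS implementation.
# The "credential_id" key and the id values below are illustrative assumptions.

def filter_by_credential(jobs, credential_id):
    """Keep only the jobs whose credential id matches the requested one."""
    return [job for job in jobs if job["credential_id"] == credential_id]


def count_idle(jobs):
    """Count the jobs that are still waiting in the queue."""
    return sum(1 for job in jobs if job["status"] == "idle")


# Pilots that were submitted and tagged with the proxy credential id.
queue = [
    {"id": 1, "status": "idle", "credential_id": "proxy_id"},
    {"id": 2, "status": "idle", "credential_id": "proxy_id"},
    {"id": 3, "status": "running", "credential_id": "proxy_id"},
]

# Filtering on the same id that was used at submission finds the idle pilots.
idle_same_id = count_idle(filter_by_credential(queue, "proxy_id"))

# Filtering on a different id (e.g. the scitoken one) hides all of them.
idle_other_id = count_idle(filter_by_credential(queue, "scitoken_id"))

print(idle_same_id, idle_other_id)
```

If the breakpoint shows the same pattern (pilots present before the filter, none after), that would support this explanation.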
@mmascher I see you use v3.9.2. Did you check whether PR #242 fixes this? It addresses a different context, but the symptoms are similar. It was merged in 3.10.0.
I confirm that this fixed the issue. Thanks @mambelli !
Describe the bug
The production CERN factory crashed when the entry type was changed to auth_type="scitoken". The reason is that the factory does not see the idle pilots and keeps submitting until the machine runs out of memory.

To Reproduce
Enable auth_type="scitoken" on one of the entries and then submit using a frontend that has both proxy and scitoken. You should see the schedd status becoming all 0 in the entry logs after the change.
Before the change:
and after the change:

Expected behavior
The schedd information should be detected and the frontend limit of 100 should be applied.
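As a rough illustration of why an all-zero schedd status leads to runaway submission, the sketch below simulates a few submission cycles. It is a simplified model, not the factory's real logic: it assumes the limit of 100 caps idle pilots, and `max_per_cycle` and the function names are hypothetical.

```python
# Simplified model: NOT the glideinWMS submission logic.
# Assumes the limit of 100 caps idle pilots; max_per_cycle is a made-up parameter.

def pilots_to_submit(idle_seen, idle_limit, max_per_cycle):
    """Submit only what is missing to reach the limit, capped per cycle."""
    return max(0, min(idle_limit - idle_seen, max_per_cycle))


def simulate(cycles, idle_limit, max_per_cycle, factory_sees_queue):
    """Return how many pilots are queued after the given number of cycles."""
    queued = 0
    for _ in range(cycles):
        # When the schedd status reads all 0, the factory believes nothing is idle.
        idle_seen = queued if factory_sees_queue else 0
        queued += pilots_to_submit(idle_seen, idle_limit, max_per_cycle)
    return queued


healthy = simulate(cycles=10, idle_limit=100, max_per_cycle=20, factory_sees_queue=True)
broken = simulate(cycles=10, idle_limit=100, max_per_cycle=20, factory_sees_queue=False)

print(healthy, broken)
```

In this model the healthy case stops growing once the limit is reached, while the broken case keeps adding pilots every cycle, which matches the memory exhaustion described above.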