glideinWMS / glideinwms

The glideinWMS Project
http://tinyurl.com/glideinwms
Apache License 2.0

Limits not respected when authentication type is scitoken #306

Closed by mmascher 1 year ago

mmascher commented 1 year ago

Describe the bug: The production CERN factory crashed when an entry's authentication type was changed to auth_type="scitoken". The factory no longer sees the idle pilots, so it keeps submitting until the machine runs out of memory.

To Reproduce: Enable auth_type="scitoken" on one of the entries, then submit using a frontend that has both a proxy and a scitoken. After the change, the schedd status in the entry logs should drop to all zeros:

Before the change:

[2023-06-13 17:00:30,179] INFO: Client CMSG-v1_0.main-arc (secid: CMSG-v1_0_cmspilot) requesting 100 glideins, max running 1713, idle lifetime 82800, remove excess 'IDLE', remove_excess_margin 5
[2023-06-13 17:00:30,182] INFO: Client CMSG-v1_0.main-arc (secid: CMSG-v1_0_cmspilot) schedd status {2: 291, 1002: 100, 1: 100}

and after the change:

[2023-06-13 17:41:54,831] INFO: Client CMSG-v1_0.main-arc (secid: CMSG-v1_0_cmspilot) requesting 100 glideins, max running 1703, idle lifetime 82800, remove excess 'IDLE', remove_excess_margin 5
[2023-06-13 17:41:54,838] INFO: Client CMSG-v1_0.main-arc (secid: CMSG-v1_0_cmspilot) schedd status {1: 0}

Expected behavior: The schedd information should be detected, and the frontend limit of 100 should be applied.
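To make the failure mode concrete, here is a minimal sketch of how an idle-based limit behaves when the schedd status reads as zero. It is not the glideinWMS implementation: the function `glideins_to_submit` and its parameters are hypothetical, and the only assumption taken from HTCondor is that job status 1 means idle and 2 means running. The inputs are the schedd status dictionaries and limits from the two log excerpts above.

```python
# Hypothetical sketch, NOT glideinWMS code: illustrates why an empty
# schedd status leads to unbounded submission.
IDLE = 1     # HTCondor JobStatus: Idle
RUNNING = 2  # HTCondor JobStatus: Running


def glideins_to_submit(schedd_status, requested_idle, max_running):
    """Return how many new glideins a limit-respecting factory would submit."""
    idle = schedd_status.get(IDLE, 0)
    running = schedd_status.get(RUNNING, 0)
    # Top up the idle pool to the requested level, never below zero.
    room_idle = max(requested_idle - idle, 0)
    # Do not exceed the total allowed number of glideins.
    room_total = max(max_running - (running + idle), 0)
    return min(room_idle, room_total)


# Values taken from the log lines before and after the change.
before = glideins_to_submit({2: 291, 1002: 100, 1: 100}, 100, 1713)
after = glideins_to_submit({1: 0}, 100, 1703)
```

With the status seen before the change, the 100 idle glideins already satisfy the request, so nothing more is submitted. With the all-zero status, the same calculation asks for another 100 glideins on every cycle, because the ones already queued are never counted.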

mmascher commented 1 year ago

I did some initial investigation but could not figure out the root cause. I might need more time to debug, possibly tomorrow morning. I want to check whether something changes when these constraints are applied here:

https://github.com/glideinWMS/glideinwms/blob/master/factory/glideFactoryLib.py#L278-L285

The function is called here:

https://github.com/glideinWMS/glideinwms/blob/master/factory/glideFactoryEntry.py#L1686

I plan to put a breakpoint before and after the call and check whether the idle pilots get filtered out (maybe the credential id of the proxy is used to submit, while the filter is based on the credential id of the scitoken?).
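The hypothesis in the parenthesis can be sketched as follows. This is only an illustration of the suspected mismatch, not glideinWMS code and not a confirmed root cause: the function `count_by_status`, the `credential_id` field, and the id values `"proxy_id"` and `"scitoken_id"` are all hypothetical.

```python
# Hypothetical sketch of the suspected failure mode; names and fields
# are illustrative and do not come from glideinWMS.
from collections import Counter


def count_by_status(jobs, credential_id):
    """Count jobs per status, keeping only those tagged with credential_id."""
    return Counter(
        job["status"] for job in jobs if job["credential_id"] == credential_id
    )


# 100 idle pilots (status 1), all submitted under the proxy's credential id.
jobs = [{"status": 1, "credential_id": "proxy_id"} for _ in range(100)]

matching = count_by_status(jobs, "proxy_id")       # filter uses the same id
mismatched = count_by_status(jobs, "scitoken_id")  # filter uses a different id
```

If the query filters on a different credential id than the one recorded at submission, every pilot is filtered out and the idle count reads as zero, which would match the symptom in the entry logs.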

mambelli commented 1 year ago

@mmascher I see you are using v3.9.2. Did you check whether PR #242 fixes this? The context is different, but the symptoms are similar. It was merged in 3.10.0.

mmascher commented 1 year ago

I confirm that this fixed the issue. Thanks, @mambelli!