Describe the bug
The CERN production factory crashed twice in the past week. The crash appears to happen while the entry log files are being rotated. No alarm fired because the Python processes keep running.
To Reproduce
Hard; it only happens from time to time. Maybe increase the log rotation frequency and observe?
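While waiting for the problem to reappear, the same exception can be triggered artificially. The sketch below is only an assumption about the failure mode, not a confirmed root cause: it uses the standard library `RotatingFileHandler` rather than the glideinwms handler, and the function name `reproduce_closed_stream_error` and the file name `entry.log` are made up for illustration. It closes the stream behind the handler's back and then asks the handler whether it should roll over.

```python
import logging
import logging.handlers
import os
import tempfile


def reproduce_closed_stream_error():
    """Return the error text raised when a rollover check hits a closed stream.

    Hypothetical helper: it simulates something closing the log file while
    the handler still holds a reference to it.
    """
    log_path = os.path.join(tempfile.mkdtemp(), "entry.log")
    handler = logging.handlers.RotatingFileHandler(
        log_path, maxBytes=1024, backupCount=1
    )
    record = logging.LogRecord(
        "repro", logging.INFO, "repro.py", 0, "hello", None, None
    )
    handler.emit(record)  # opens the file and writes one line

    # Simulate the suspected condition: the stream is closed, but the
    # handler still points at it (it is not reset to None).
    handler.stream.close()

    try:
        handler.shouldRollover(record)
    except ValueError as err:
        return str(err)
    finally:
        # Drop the dead stream so that close() does not try to flush it.
        handler.stream = None
        handler.close()
    return None


if __name__ == "__main__":
    print(reproduce_closed_stream_error())
```

If this matches what happens in the factory, the open question is what closes the stream between two rollover checks.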
Screenshots
[2023-09-27 14:49:59,951] DEBUG: cleanupSupport:37: Forked cleanup PIDS [123125, 123126, 123127, 123128]
[2023-09-27 14:56:55,733] DEBUG: glideFactoryEntryGroup:308: Setting parallel_workers limit of 8
[2023-09-27 15:00:56,094] WARNING: glideFactoryEntryGroup:415: Error occurred while trying to find and do work.
[2023-09-27 15:00:56,095] ERROR: glideFactoryEntryGroup:416: Exception:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/glideinwms/factory/glideFactoryEntryGroup.py", line 412, in iterate_one
    do_advertize, factory_in_downtime, glideinDescript, frontendDescript, group_name, my_entries
  File "/usr/lib/python3.6/site-packages/glideinwms/factory/glideFactoryEntryGroup.py", line 344, in find_and_perform_work
    logSupport.roll_all_logs()
  File "/usr/lib/python3.6/site-packages/glideinwms/lib/logSupport.py", line 289, in roll_all_logs
    handler.check_and_perform_rollover()
  File "/usr/lib/python3.6/site-packages/glideinwms/lib/logSupport.py", line 283, in check_and_perform_rollover
    if self.shouldRollover(None, empty_record=True):
  File "/usr/lib/python3.6/site-packages/glideinwms/lib/logSupport.py", line 186, in shouldRollover
    self.stream.seek(0, 2) # due to non-posix-compliant Windows feature
ValueError: I/O operation on closed file.
[2023-09-27 15:00:56,215] DEBUG: glideFactoryEntryGroup:418: Group Work done: {}
389 is making logging more robust and providing more troubleshooting info. This issue can be closed and a new one will be opened if this happens again.
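For reference, one possible defensive approach is sketched below. This is not the change made in 389. It assumes the standard library `RotatingFileHandler` and its `shouldRollover(record)` signature (the glideinwms handler in the traceback takes an extra `empty_record` argument), and the class name `ReopeningRotatingFileHandler` is hypothetical. The idea is to discard a stream that was closed elsewhere so that the parent class reopens the file instead of raising `ValueError`.

```python
import logging
import logging.handlers


class ReopeningRotatingFileHandler(logging.handlers.RotatingFileHandler):
    """Hypothetical handler that recovers when its stream was closed elsewhere."""

    def shouldRollover(self, record):
        stream = self.stream
        if stream is not None and getattr(stream, "closed", False):
            # Forget the dead stream; the parent class reopens the file
            # whenever self.stream is None.
            self.stream = None
        return super().shouldRollover(record)
```

A guard like this hides the symptom only. Logging a warning whenever the stream is found closed would still be needed to track down what closes it.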