Open amaltaro opened 1 month ago
Relevant JIRA ticket: https://its.cern.ch/jira/browse/CMSVOC-598
Just a quick update: 6 out of 8 nodes are now set to 12GB of RAM. The other 2 nodes are currently in use, and we cannot make this change until we can drain those agents/nodes. Further details are in the ticket above.
Impact of the new feature
WMAgent
Is your feature request related to a problem? Please describe.
With the migration to Alma9, we also started seeing vm_kill and condor_schedd restarts every now and then. When we discussed these with the SI team (Marco M.), he suggested increasing the production WMAgent HTCondor spool area, which is currently 8GB in size.
Describe the solution you'd like
Follow up with the VoC and gradually increase the /mnt/ramdisk partition from 8GB to 12GB. Nodes that are not in use can be modified right away, while those that are active will have to wait until we can stop services.
Describe alternatives you've considered
None
Additional context
The most recent condor_schedd restart and vm_kill occurred on Oct/22/2024, on vocms0282.
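
While the change is rolled out node by node, it may help to have a quick way to confirm whether a given agent already has the larger /mnt/ramdisk. The snippet below is only a minimal sketch of such a check, not something taken from WMCore or the VoC tooling: the function names `meets_target` and `check_partition` are hypothetical, the 2% tolerance is arbitrary, and it assumes that "8GB" and "12GB" refer to GiB.

```python
import shutil

GIB = 1024 ** 3  # assumption: the sizes quoted in this issue are GiB


def meets_target(total_bytes, target_gib=12, tolerance=0.02):
    """Return True if total_bytes is at least target_gib, within a small tolerance."""
    return total_bytes >= target_gib * GIB * (1 - tolerance)


def check_partition(path="/mnt/ramdisk", target_gib=12):
    """Report the size of the filesystem holding `path` and whether it meets the target.

    Note: shutil.disk_usage reports on whichever filesystem contains `path`,
    so if the ramdisk is not mounted the numbers describe the parent filesystem.
    """
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (bytes)
    return {
        "path": path,
        "total_gib": round(usage.total / GIB, 2),
        "used_gib": round(usage.used / GIB, 2),
        "resized": meets_target(usage.total, target_gib),
    }
```

Calling `check_partition()` on an agent node would return a small dictionary whose `resized` field indicates whether the partition has reached the 12GB target; nodes still at 8GB would report `False`.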