dmwm / WMCore

Core workflow management components for CMS.
Apache License 2.0

Increase HTCondor spool ramdisk partition from 8GB to 12GB #12156

Open amaltaro opened 2 hours ago

amaltaro commented 2 hours ago

Impact of the new feature
WMAgent

Is your feature request related to a problem? Please describe.
Since the migration to Alma9, we have been seeing occasional vm_kill events and condor_schedd restarts. After discussing these with the SI team (Marco M.), he suggested increasing the HTCondor spool area on the production WMAgent nodes, which is currently sized at 8GB.

Describe the solution you'd like
Follow up with the VoC and gradually increase the /mnt/ramdisk partition from 8GB to 12GB. Nodes that are not in use can be modified right away, while active nodes will have to wait until their services can be stopped.
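Because the resize is rolled out node by node, it may help to confirm the partition size on each node after it is changed. The following is only a minimal sketch, not part of WMCore: the helper names `spool_size_gib` and `meets_target` are hypothetical, and it assumes that "12GB" is meant as 12 GiB and that /mnt/ramdisk is its own mount point.

```python
import os
import shutil

GIB = 1024 ** 3  # assumption: the "8GB"/"12GB" figures are binary gigabytes


def spool_size_gib(mount_point="/mnt/ramdisk"):
    """Return the total size, in GiB, of the filesystem holding mount_point."""
    return shutil.disk_usage(mount_point).total / GIB


def meets_target(total_gib, target_gib=12.0, tolerance_gib=0.1):
    """Return True if the size reaches the target, allowing small rounding slack."""
    return total_gib >= target_gib - tolerance_gib


if __name__ == "__main__":
    path = "/mnt/ramdisk"
    if not os.path.ismount(path):
        print(f"{path} is not a mount point on this node")
    else:
        size = spool_size_gib(path)
        status = "OK" if meets_target(size) else "still below 12 GiB"
        print(f"{path}: {size:.1f} GiB ({status})")
```

Running this on a node before and after the change should show which nodes still need to be resized.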

Describe alternatives you've considered
None

Additional context
The most recent condor_schedd restart and vm_kill occurred on Oct/22/2024, on vocms0282.

amaltaro commented 2 hours ago

Relevant JIRA ticket: https://its.cern.ch/jira/browse/CMSVOC-598