GoogleCloudPlatform / slurm-gcp

Apache License 2.0

Remove extra logging receivers and add additional labels to cloud ops … #212

Closed abbas1902 closed 3 days ago

abbas1902 commented 1 week ago

The main changes:

A. Labels containing the cluster name and (VM) hostname are added to each log. They start as placeholders in the cloud ops configuration and are updated at boot time by setup.py.

B. The logging path is added to the log structure as another label, and the overall log names are generalized. Since there is no longer a need for 6-7 different logNames, they were reduced to only 3: one for the Slurm daemons (called slurm_daemon), one for everything else Slurm-related (I'm thinking of putting this under the umbrella of just slurm to match legacy), and a third for the setup logs.

This makes it easy to run log queries filtered at the hostname or cluster level.
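As a rough illustration of the kind of filtering these labels enable, here is a minimal sketch that builds a Cloud Logging query string. The label keys `cluster_name` and `hostname` and the helper name `build_log_filter` are assumptions made for this example; the PR does not state the exact keys, so adjust them to match the configuration.

```python
def build_log_filter(log_name, cluster=None, hostname=None):
    """Build a Cloud Logging query narrowed to one log and optional labels.

    The label keys used here (cluster_name, hostname) are hypothetical.
    """
    # log_id() matches on the short log name, e.g. slurm_daemon
    clauses = [f'log_id("{log_name}")']
    if cluster:
        clauses.append(f'labels.cluster_name="{cluster}"')
    if hostname:
        clauses.append(f'labels.hostname="{hostname}"')
    return " AND ".join(clauses)


# Cluster-level query, then the same query narrowed to a single host
cluster_query = build_log_filter("slurm_daemon", cluster="mycluster")
host_query = build_log_filter(
    "slurm_daemon", cluster="mycluster", hostname="mycluster-controller"
)
```

The resulting string could then be pasted into the Logs Explorer or passed to a logging client as its filter.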

How was this tested?

Deployed a VM instance and added the code change to the cluster directly.
a. Ensured the code does not run if the user does not have cloud-ops-agent running as a service.

Manually changed the cloud-ops configuration to reflect the proposed changes.
b. Verified that the configuration is valid and the placeholders show as expected.

Ran the script changes as a single Python function.
c. Verified that the function edits the configuration and restarts the service as expected, with updated label values.
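For readers who want a picture of the behavior checked in a. through c., the following is a minimal sketch and not the actual setup.py code. The placeholder tokens, the function names, and the default service name are all assumptions for illustration; the service check and restart are passed in as callables so the logic can be exercised without systemd.

```python
import subprocess
from pathlib import Path

# Hypothetical placeholder tokens; the real configuration may use other names.
CLUSTER_PLACEHOLDER = "CLUSTER_NAME_PLACEHOLDER"
HOSTNAME_PLACEHOLDER = "HOSTNAME_PLACEHOLDER"


def fill_placeholders(text, cluster, hostname):
    """Substitute the label placeholders with real values."""
    text = text.replace(CLUSTER_PLACEHOLDER, cluster)
    return text.replace(HOSTNAME_PLACEHOLDER, hostname)


def service_is_active(name):
    """Return True if systemd reports the unit as active."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", name], check=False
    )
    return result.returncode == 0


def restart_service(name):
    """Restart the unit so it picks up the edited configuration."""
    subprocess.run(["systemctl", "restart", name], check=True)


def update_ops_agent_labels(
    config_path,
    cluster,
    hostname,
    service="google-cloud-ops-agent",  # assumed unit name
    is_active=service_is_active,
    restart=restart_service,
):
    """Fill in label values and restart the agent; return True if edited."""
    # (a) do nothing when the agent is not running as a service
    if not is_active(service):
        return False
    path = Path(config_path)
    original = path.read_text()
    updated = fill_placeholders(original, cluster, hostname)
    if updated == original:
        return False  # no placeholders left to replace
    # (c) write the new label values, then restart the agent
    path.write_text(updated)
    restart(service)
    return True
```

Injecting `is_active` and `restart` is a design choice for this sketch only; it allows the guard in a. and the edit-and-restart path in c. to be checked with stand-in functions on a machine that has no agent installed.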

As seen above, testing was by and large manual; the script changes can only be added along with the configuration changes.