calab-ntu / gpu-cluster

Eureka and Spock GPU clusters

Maintenance on December 3 #32

xuanweishan closed this issue 2 years ago

xuanweishan commented 3 years ago

To do:

xuanweishan commented 3 years ago

Procedure for changing the queue name

  1. Become the superuser: su
  2. Edit /var/spool/TORQUE/server_priv/nodes and append the queue names to the line of each target node. For example: eureka32 np=16 queue_name1 queue_name2
  3. Restart the PBS service so the change takes effect: systemctl restart pbs
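
Step 2 can be scripted when many nodes need the same change. The following is only a minimal sketch: `set_node_props` is a hypothetical helper (not part of TORQUE), and it assumes each node occupies one line of the form `name np=N [properties...]`. It keeps the node name and any `np=`/`gpus=` settings and replaces the remaining properties. The demo runs on a scratch file; on the cluster the target would be /var/spool/TORQUE/server_priv/nodes, edited as root and followed by the restart in step 3.

```shell
#!/bin/sh
# Hypothetical helper: replace the property list of one node in a
# TORQUE-style nodes file. Usage: set_node_props FILE NODE PROP...
set_node_props() {
    nodes_file=$1
    node=$2
    shift 2
    props=$*
    tmp=$(mktemp) || return 1
    # Keep the node name plus np=/gpus= settings, drop the old
    # properties, and append the new ones. Other lines pass through.
    awk -v node="$node" -v props="$props" '
        $1 == node {
            line = $1
            for (i = 2; i <= NF; i++)
                if ($i ~ /^(np|gpus)=/) line = line " " $i
            print line " " props
            next
        }
        { print }
    ' "$nodes_file" > "$tmp" && cat "$tmp" > "$nodes_file"
    rm -f "$tmp"
}

# Demo on a scratch copy instead of the live nodes file.
demo=$(mktemp)
printf 'eureka31 np=16 old_queue\neureka32 np=16 old_queue\n' > "$demo"
set_node_props "$demo" eureka32 queue_name1 queue_name2
cat "$demo"
```

Writing through a temporary file and copying the result back with `cat` keeps the original file's owner and permissions, which is likely preferable for a file under server_priv.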

Update the SSD firmware of the IB switch

  1. Run enable to gain root (privileged) access
  2. Back up the CLI configuration:
    1. configuration write
    2. configuration upload active <upload URL>
  3. Reset to factory defaults:
    reset factory
    # Warning - confirming will cause system reboot.
    # Type 'YES' to confirm reset: YES
    # Resetting and rebooting the system -- please wait...
  4. Restore the configuration:
    configuration fetch <download url>
    configuration switch-to <filename>
  5. Updating SSD Firmware