calab-ntu / gpu-cluster

Eureka and Spock GPU clusters

Eureka Maintenance #25

Closed xuanweishan closed 3 years ago

xuanweishan commented 3 years ago

Date: 9/6

ROUTINE:

  1. Check high-temperature nodes. Linked issue: Replace thermal paste
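
For the temperature check, a minimal sketch of how hot GPUs could be flagged on a node. It assumes the temperatures of interest are GPU temperatures as reported by `nvidia-smi`; the issue does not say whether GPU or CPU readings are meant. The 85 °C threshold and the function names are hypothetical and not taken from this issue.

```python
import subprocess

# Assumed threshold in degrees Celsius; not specified in the issue.
HOT_THRESHOLD_C = 85


def parse_gpu_temps(csv_text):
    """Parse 'index, temperature' CSV lines into {gpu_index: temp_c}."""
    temps = {}
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, temp = (field.strip() for field in line.split(","))
        temps[int(index)] = int(temp)
    return temps


def hot_gpus(temps, threshold=HOT_THRESHOLD_C):
    """Return the sorted GPU indices at or above the threshold."""
    return sorted(i for i, t in temps.items() if t >= threshold)


def read_local_gpu_temps():
    """Query nvidia-smi on the local node (requires the NVIDIA driver)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,temperature.gpu",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_temps(result.stdout)
```

On a node with the driver installed, `hot_gpus(read_local_gpu_temps())` would list the GPUs worth inspecting; running it on each node (for example over SSH) is left out of this sketch.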

NEW ISSUE:

  1. Add an additional power cable to eureka00. Linked issue: Add additional GPU power cable
  2. Replace the RAM modules on eureka00 and test those modules.
  3. Install GPU support on eureka00 and the computing nodes.
  4. Replace the broken NAS extension. Linked issue: Eater broken extension replacement
  5. Update the OS of the NAS.