calab-ntu/gpu-cluster
Eureka and Spock GPU clusters
issues
Record the electric meter reading on the first day of every month.
#56
xuanweishan
opened
6 months ago
0
Fix high temp nodes
#55
xuanweishan
closed
4 months ago
1
Set up LDAP server on tumaz
#54
xuanweishan
closed
8 months ago
1
Back up the whole system of the login node.
#53
xuanweishan
closed
8 months ago
1
Reaching the `work1` quota limit does not stop the related job and causes `eureka` to slow down.
#52
xuanweishan
opened
1 year ago
0
Utilize the PBS epilogue script to terminate the MPS server automatically
#51
xuanweishan
closed
1 year ago
0
Test GPU again on `18`, `25`, `26`
#50
xuanweishan
closed
8 months ago
0
Mail rejected by Gmail.
#49
xuanweishan
closed
8 months ago
0
NAS btrfs file system warning: 'WARNING: Qgroup data inconsistent, rescan recommended'
#48
xuanweishan
closed
8 months ago
0
Optimize openmpi
#47
xuanweishan
opened
1 year ago
1
Cable: can be purchased directly.
#46
xuanweishan
closed
1 year ago
0
Purchase procedure of components for new system.
#45
xuanweishan
closed
1 year ago
2
Purchase a new switch (1G or 2.5G) with 48 ports.
#44
xuanweishan
closed
1 year ago
6
Purchase a 10G Ethernet switch.
#43
xuanweishan
closed
1 year ago
3
Measure room temperature by thermometer
#42
xuanweishan
closed
1 year ago
0
Add a section in our System Log page to record the log of our NAS systems
#41
xuanweishan
opened
2 years ago
0
DOS_DNS issue
#40
xuanweishan
closed
2 years ago
1
Optimum `GPU/CPU` overlapping and `GPU` performance heavily rely on using large `MPI` ranks
#39
koarakawaii
opened
2 years ago
0
Installation steps and stress test
#38
xuanweishan
opened
2 years ago
5
Scan malware on both login node and computing nodes
#37
xuanweishan
closed
2 years ago
3
Purchase a new NAS
#36
xuanweishan
closed
1 year ago
5
Upgrade eureka I/O bandwidth
#35
xuanweishan
opened
2 years ago
1
Support module in eureka
#34
xuanweishan
closed
11 months ago
5
Update yt
#33
xuanweishan
opened
2 years ago
4
Maintenance on December 3rd
#32
xuanweishan
closed
2 years ago
1
Decide SSD spec
#31
xuanweishan
closed
2 years ago
1
Decide the spec of the water cooler before 11/5
#30
xuanweishan
closed
1 year ago
0
Maintenance on 10/15
#29
xuanweishan
closed
3 years ago
0
Decide the spec of PSU before 10/6
#28
xuanweishan
closed
2 years ago
2
Install fftw3
#27
hyschive
opened
3 years ago
0
Bash history timestamp
#26
hyschive
closed
3 years ago
1
Eureka Maintenance
#25
xuanweishan
closed
3 years ago
0
Queue system
#24
xuanweishan
closed
8 months ago
1
Problematic nodes
#23
xuanweishan
opened
3 years ago
0
Investigate the reason why du and df report different storage usage.
#22
xuanweishan
opened
3 years ago
1
Parallel yt cannot find ucx for pml
#21
koarakawaii
opened
3 years ago
0
Investigate the cause of the slow speed.
#20
xuanweishan
opened
3 years ago
0
GPU support needs to be bought before the end of July, if it's needed.
#19
xuanweishan
closed
3 years ago
0
Decide the model of RAM at the end of July.
#18
xuanweishan
closed
2 years ago
0
Login node should have RAID 1 for system disks.
#17
xuanweishan
closed
1 year ago
0
Replace broken motherboard
#16
xuanweishan
closed
3 years ago
2
Install 3080 Ti when it arrives and test it
#15
xuanweishan
closed
8 months ago
0
Add GPU temperature threshold
#14
hyschive
closed
3 years ago
1
Screen and input switch in machine room (909) did not work
#13
xuanweishan
closed
3 years ago
3
Monitor error messages in log files of all nodes
#12
xuanweishan
opened
3 years ago
0
Eater broken extension replacement
#11
xuanweishan
closed
2 years ago
0
GPU tilted due to overweight
#10
koarakawaii
closed
3 years ago
1
Disk high reconnection count
#9
xuanweishan
opened
3 years ago
4
Power outage
#8
hyschive
closed
3 years ago
0
Replace thermal paste
#7
xuanweishan
closed
2 years ago
1