cpnr / computing


Compute nodes in drain state #46

Closed jhgoh closed 6 months ago

jhgoh commented 6 months ago

Several compute nodes are in the drained state and cannot accept jobs.

$ sinfo -N -l                                                                                                                                                   
Fri May 24 11:54:03 2024                                                                                                                                        
NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON                                                                     
entei          1   normal*     drained 128   128:1:1 512000        0      1   (null) Kill task failed                                                           
ho-oh          1   normal*   allocated 64     64:1:1 256000        0      1   (null) none                                                                       
lapras         1      gpu1       mixed 128   128:1:1 256000        0      1   (null) none                                                                       
mewtwo         1      gpu2       mixed 12     12:1:1 128000        0      1   (null) none                                                                       
raikou         1   normal*     drained 128   128:1:1 512000        0      1   (null) Kill task failed                                                           
suicune        1   normal*     drained 128   128:1:1 512000        0      1   (null) Kill task failed    

Released the nodes from the drained state.

scontrol update nodename='entei,raikou,suicune' state=resume

sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*      up   infinite      4  alloc entei,ho-oh,raikou,suicune
gpu1         up   infinite      1    mix lapras
gpu2         up   infinite      1    mix mewtwo
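If this happens again, the drained node list can be built from the `sinfo -N -l` output instead of being typed by hand. This is a minimal sketch, not something that was run for this issue: `drained_nodes` is a hypothetical helper, and it assumes STATE is the fourth column, as in the output above. It also matches nodes that are still draining, so check the list before resuming anything. Resuming does not address whatever caused "Kill task failed", so it may be worth looking for leftover processes on the nodes first.

```shell
# Hypothetical helper: read `sinfo -N -l` output on stdin and print a
# comma-separated list of nodes whose STATE column starts with "drain".
# Assumes STATE is the 4th whitespace-separated field, as in the output above.
drained_nodes() {
    awk '$4 ~ /^drain/ { print $1 }' | sort -u | paste -sd, -
}

# Possible usage on the head node (left commented out so nothing is resumed
# without a manual check of the node list):
#   nodes=$(sinfo -N -l | drained_nodes)
#   echo "$nodes"
#   [ -n "$nodes" ] && scontrol update nodename="$nodes" state=resume
```

`sort -u` is there because a node that belongs to more than one partition appears once per partition in `sinfo -N` output.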

@slowmoyang