ES model, single-machine multi-GPU training
1. Run xparl start --port 8837 --cpu_num 48
2. Run fleetrun train.py
The following errors are reported:
[07-13 15:41:57 Thread-12 @client.py:301] ERR [xparl] lost connection with a job, current actor num: 19
[07-13 15:41:57 Thread-52 @client.py:301] ERR [xparl] lost connection with a job, current actor num: 18
[07-13 15:41:57 Thread-50 @client.py:301] ERR [xparl] lost connection with a job, current actor num: 17
[07-13 15:41:58 Thread-60 @client.py:301] ERR [xparl] lost connection with a job, current actor num: 16
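
To see how quickly the actors are dropping, the log lines above can be tallied with a small script. This is only a rough sketch: `LOG_PATTERN` and `parse_lost_connections` are hypothetical names that are not part of PARL, and the regular expression assumes the log format is exactly as in the four lines quoted here.

```python
import re

# Assumed format, inferred only from the four xparl client lines quoted above.
LOG_PATTERN = re.compile(
    r"\[(?P<date>\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<thread>Thread-\d+)"
    r" @client\.py:\d+\] ERR \[xparl\] lost connection with a job,"
    r" current actor num: (?P<actors>\d+)"
)


def parse_lost_connections(log_text):
    """Return (time, thread, remaining actor count) for each 'lost connection' line."""
    return [
        (m.group("time"), m.group("thread"), int(m.group("actors")))
        for m in LOG_PATTERN.finditer(log_text)
    ]


# The four lines from this report, copied verbatim.
SAMPLE_LOG = """\
[07-13 15:41:57 Thread-12 @client.py:301] ERR [xparl] lost connection with a job, current actor num: 19
[07-13 15:41:57 Thread-52 @client.py:301] ERR [xparl] lost connection with a job, current actor num: 18
[07-13 15:41:57 Thread-50 @client.py:301] ERR [xparl] lost connection with a job, current actor num: 17
[07-13 15:41:58 Thread-60 @client.py:301] ERR [xparl] lost connection with a job, current actor num: 16
"""

events = parse_lost_connections(SAMPLE_LOG)
for time, thread, actors in events:
    print(time, thread, actors)
```

On the excerpt above this lists four lost connections within about one second, with the actor count falling from 19 to 16.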