All data received by node 125 is 939524096
sim_finish on sent, Thread id: 140257080956416
All data sent from node 126 is 939524096
sim_finish on received, Thread id: 140257080956416
All data received by node 126 is 939524096
sim_finish on sent, Thread id: 140257080956416
All data sent from node 127 is 939524096
sim_finish on received, Thread id: 140257080956416
All data received by node 127 is 939524096
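The log tail above reports, for each node, the total bytes sent and received. For a run that completes, a quick sanity check is to confirm that every node's two totals match. The following is a minimal sketch that assumes the exact line wording shown above. The function name `check_totals` is hypothetical and is not part of the simulator.

```python
import re
from collections import defaultdict

# Patterns assume the exact wording of the log lines above.
SENT_RE = re.compile(r"All data sent from node (\d+) is (\d+)")
RECV_RE = re.compile(r"All data received by node (\d+) is (\d+)")


def check_totals(log_text):
    """Return {node: (sent, received)} for nodes whose totals differ.

    A total that never appears in the log is reported as None, so a node
    with only one of the two lines also shows up as a mismatch.
    """
    totals = defaultdict(lambda: [None, None])
    for line in log_text.splitlines():
        m = SENT_RE.search(line)
        if m:
            totals[int(m.group(1))][0] = int(m.group(2))
            continue
        m = RECV_RE.search(line)
        if m:
            totals[int(m.group(1))][1] = int(m.group(2))
    return {node: tuple(v) for node, v in totals.items() if v[0] != v[1]}
```

Usage: `check_totals(open("sim.log").read())` returns an empty dict when every node is balanced. The file name `sim.log` is only an example. If you pass a truncated tail like the one above, node 125 will show up with `sent=None` because its "sent" line was cut off.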
0
Yes, you can check the current memory usage by running the top command while the simulation is running. The hang is most likely caused by insufficient memory.
Hardware environment
Reproduce
Use the root user.
Result
It hung at layer num: 483. The monitor on Aliyun shows CPU usage of about 25%, but disk reads are abnormally high. Perhaps the simulation needs a huge amount of memory (far more than 8 GB), which forces the server to use swap.
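To check the swap hypothesis without watching top interactively, you could sample /proc/meminfo on Linux while the simulation runs. The following is a minimal sketch. It assumes a Linux kernel new enough to report `MemAvailable` (3.14 or later). The helper names `parse_meminfo`, `memory_pressure`, and `watch` are hypothetical.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value_in_kB}."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only lines of the form "Key:   <number> [kB]".
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def memory_pressure(text):
    """Return (MemAvailable, swap currently in use), both in kB."""
    f = parse_meminfo(text)
    return f["MemAvailable"], f["SwapTotal"] - f["SwapFree"]


def watch(interval_s=5.0, samples=12, path="/proc/meminfo"):
    """Print available memory and swap usage every interval_s seconds (Linux only)."""
    for _ in range(samples):
        with open(path) as fh:
            avail_kib, swap_kib = memory_pressure(fh.read())
        print(f"MemAvailable={avail_kib // 1024} MiB  SwapUsed={swap_kib // 1024} MiB")
        time.sleep(interval_s)
```

If `MemAvailable` stays near zero and `SwapUsed` keeps growing while the run appears stuck, that would be consistent with the high disk reads reported above. It would not, by itself, prove that memory is the only cause.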
UPD: On a 4C 16G server, the simulation stops at layer num: 599 (optimizer1) and never prints the final collective message.
do.sh:
Command:
Tail of logs:
do.sh:
Command:
Tail of logs: