jxh314 closed this issue 3 months ago
Modifying /etc/security/limits.conf is the long-term solution, so that the limits are correct after the next reboot. Did you reboot? Did you verify that ulimit -l was indeed showing the right values, even within mpirun?
You can check that by running mpirun <mpirun args> <some script printing ulimit -l>
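As a minimal sketch of the "script printing ulimit -l" mentioned above: the file name print_memlock.sh and the function name report_memlock are made up for illustration, and the script only assumes bash is available on every node.

```shell
#!/bin/bash
# print_memlock.sh (hypothetical file name): report the locked-memory limit
# that a process actually sees when it is launched by mpirun.

report_memlock() {
    # uname -n prints the node name; ulimit -l prints the max locked memory
    # for this shell, either a number (in KiB under bash) or "unlimited".
    printf '%s: ulimit -l = %s\n' "$(uname -n)" "$(ulimit -l)"
}

report_memlock
```

It would be launched as mpirun <mpirun args> bash ./print_memlock.sh, so that each rank prints one line. If limits.conf sets memlock to unlimited, as the NCCL troubleshooting guide suggests, any rank that still reports a small number is probably not picking up the new limit.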
Thanks a lot, it works!
Before setting this, the result of ulimit -l was only 8192.
Hello, when I run the following command to disable SHM on a single node and force the use of IB for testing, I encounter the following error:
Modifying the /etc/security/limits.conf configuration file as referenced in https://docs.nvidia.com/deeplearning/nccl/archives/nccl_2143/user-guide/docs/troubleshooting.html#infiniband also does not take effect. The log, topo and graph file are as follows:
single.log
dump_topo.txt
dump_graph.txt
By the way, the perftest is ok. Can you help with that?