Closed (hanbinhu closed this issue 3 years ago)
Command to run:
bfrun -np 4 python examples/pytorch_benchmark.py --dist-optimizer=allreduce
Commit aca0fec and later commits cause nccl_controller.cc to report "an illegal memory access was encountered", along with other kinds of issues.
This appears to be related to the NCCL version, but the exact cause is not yet known.
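Since the failure seems to depend on the NCCL version, it may help to attach version details when reproducing. The snippet below is only a sketch: `collect_env_report` and `format_report` are hypothetical helper names, not part of Bluefog. It reports the NCCL version that PyTorch was built with, which is an assumption-laden proxy and may differ from the NCCL library Bluefog itself was compiled against.

```python
import os


def collect_env_report():
    """Gather version details that may be relevant to an NCCL-related crash.

    Assumption: the values come from PyTorch, so the NCCL version shown is
    the one bundled with PyTorch, not necessarily the one Bluefog links to.
    """
    report = {
        "torch": None,
        "cuda": None,
        "nccl": None,
        # NCCL reads this variable to control its own log verbosity.
        "NCCL_DEBUG": os.environ.get("NCCL_DEBUG"),
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; leave the version fields empty.
        return report

    report["torch"] = torch.__version__
    report["cuda"] = torch.version.cuda  # None on CPU-only builds
    try:
        # Returns a tuple on recent PyTorch releases, an int on older ones.
        report["nccl"] = torch.cuda.nccl.version()
    except Exception as exc:  # e.g. a build without NCCL support
        report["nccl"] = "unavailable ({})".format(exc)
    return report


def format_report(report):
    """Render the report as one 'key: value' pair per line."""
    return "\n".join("{}: {}".format(k, v) for k, v in report.items())


if __name__ == "__main__":
    print(format_report(collect_env_report()))
```

Running this on each machine involved and pasting the output next to the failing commit hash would make it easier to compare working and failing setups.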
This may be related to issue #44.
This should now be completely resolved.