Bluefog-Lib / bluefog

Distributed and decentralized training framework for PyTorch over graph
https://bluefog-lib.github.io/bluefog/
Apache License 2.0

Benchmark Example issue #79

Open hanbinhu opened 3 years ago

hanbinhu commented 3 years ago

The following command fails with a TypeError raised inside bf.broadcast_optimizer_state. I am using torch=1.8.0+cu111.

bfrun -np 4 python /home/hanbinhu/bluefog/test/../examples/pytorch_benchmark.py --model=lenet --num-iters=1 --dist-optimizer=gradient_allreduce
Traceback (most recent call last):
  File "/home/hanbinhu/bluefog/test/../examples/pytorch_benchmark.py", line 138, in <module>
    bf.broadcast_optimizer_state(optimizer, root_rank=0)
  File "/home/hanbinhu/miniconda3/envs/bluefog/lib/python3.8/site-packages/bluefog/torch/utility.py", line 201, in broadcast_optimizer_state
    p = torch.Tensor([p])
TypeError: must be real number, not NoneType
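
The last frame shows that `torch.Tensor([p])` was called with `p` set to `None`, which is what raises the TypeError. Below is a minimal sketch of one possible guard. It assumes (this is not confirmed by the traceback) that `p` is an optimizer option value read from a param group. The helper `split_options` and the key `some_option` are hypothetical names used only for illustration and are not part of bluefog or PyTorch. The sketch is plain Python, so it can be run without torch installed.

```python
from numbers import Number


def split_options(param_group):
    """Partition a param-group-like dict into numeric options and skipped ones.

    Hypothetical helper: numeric values are the ones that could safely be
    wrapped with torch.Tensor([value]); everything else (for example None)
    is set aside so it never reaches the tensor constructor.
    """
    scalars, skipped = {}, {}
    for key, value in param_group.items():
        if key == "params":
            # Parameter references are not scalar options.
            continue
        # bool is a subclass of int, so True/False count as numeric here.
        if isinstance(value, Number):
            scalars[key] = value
        else:
            skipped[key] = value
    return scalars, skipped


# Illustrative param group; "some_option" stands in for whichever
# option turned out to be None in the failing run.
group = {
    "params": [0, 1],
    "lr": 0.01,
    "momentum": 0.9,
    "nesterov": False,
    "some_option": None,
}

scalars, skipped = split_options(group)
print("would be wrapped in a tensor:", scalars)
print("would be skipped:", skipped)
```

Whether skipping such values is the right behaviour for the broadcast, or whether they need to be sent some other way, depends on how `broadcast_optimizer_state` uses them afterwards.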