Closed vwxyzjn closed 4 years ago
Hey Costa,
Thanks for your interest and kind comments!
You are right that this example runs PolyBeast on the same machine. To run it across machines, you'd have to change the code a little bit, I'm afraid: right now it finds the environment servers via their pipe names, which look like `unix://path/to/a/file`. You'd instead have to use IP/port addresses like `127.0.0.1:12345`. That change would happen e.g. here: https://github.com/facebookresearch/torchbeast/blob/master/torchbeast/polybeast.py#L448
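For illustration, here's a minimal sketch of what generating such address-based server names could look like. The `server_addresses` helper and its port-numbering scheme are hypothetical, not part of torchbeast:

```python
# Hypothetical sketch: build "host:port" server addresses for environment
# servers spread across several machines, instead of local unix:// pipe names.
def server_addresses(hosts, base_port, servers_per_host):
    """Return one "host:port" address per environment server."""
    addresses = []
    for host in hosts:
        # Each machine runs several servers on consecutive ports.
        for i in range(servers_per_host):
            addresses.append(f"{host}:{base_port + i}")
    return addresses

# Two machines, two servers each; these addresses would replace
# the unix:// pipe names passed to the actors:
addrs = server_addresses(["10.0.0.1", "10.0.0.2"], 12345, 2)
# addrs == ["10.0.0.1:12345", "10.0.0.1:12346",
#           "10.0.0.2:12345", "10.0.0.2:12346"]
```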
Good luck :)
Hi, I think your implementation of IMPALA is really well done. The code is concise, clear, and understandable.
I do have a question regarding distributed training. In https://github.com/facebookresearch/torchbeast#running-polybeast, the instructions still seem to assume that the script will be run on a single machine. In the TF implementation, we can configure the multi-machine setting using `ClusterSpec`, as shown here: https://github.com/deepmind/scalable_agent/blob/6c0c8a701990fab9053fb338ede9c915c18fa2b1/experiment.py#L479. I was wondering if there's any way to do the same with `torchbeast`.
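For context, a `ClusterSpec` is just a mapping from job names to server addresses; a cluster configuration of that kind looks roughly like the sketch below (the job names and host:port addresses are placeholders, not the values used in scalable_agent):

```python
import tensorflow as tf

# Illustrative multi-machine cluster configuration: two actor machines
# and one learner machine, each identified by a "host:port" address.
cluster = tf.train.ClusterSpec({
    "actor": ["machine-a:8000", "machine-b:8000"],
    "learner": ["machine-c:9000"],
})
```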
Thanks a lot.