facebookresearch / torchbeast

A PyTorch Platform for Distributed RL
Apache License 2.0
734 stars, 113 forks

polybeast Docker image (GPU version) #19

Closed mschen97 closed 3 years ago

mschen97 commented 3 years ago

Thanks a lot for your team's work; it helps us a lot. However, we have run into some problems and hope to get your help.

We found it difficult to deploy polybeast on our machines, both with the Dockerfile and without Docker. This is partly due to our network problems :). Therefore, we searched Docker Hub for available polybeast images. The most downloaded one is https://hub.docker.com/r/torchbeast/ci-polybeast-cpu37/tags, whose author appears to be a member of your team. But after downloading it, we found that its contents are incomplete, and we don't know how to use it. Also, judging by its name, it is a CPU version, while we need a GPU one.

The other two images we found appear to have been uploaded by other users. We are now using the second one (the only GPU version), but we have run into a problem: the actor thread exits inexplicably after the task has been running for about 3 hours (parameter: timeout_ms=10000). As a result, the learner can't communicate with the server, the inference queue stays empty, and the task can't continue. We haven't found the cause yet, and we are worried that the image itself has a problem.

Therefore, we have a request: could your team provide a polybeast image (GPU version) and publish it on Docker Hub? If that is not convenient or cannot be done in the near future, we will try to find another way, but we still hope to get an image provided by you. Thank you very much!
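To narrow down why the actor thread exits, one option might be to wrap the thread's target so that any exception is logged and kept for inspection. The sketch below is generic Python and not torchbeast's API; `run_logged` and `flaky_actor` are hypothetical names, and the `TimeoutError` only stands in for whatever actually stops the real actor loop.

```python
import logging
import threading
import traceback

logging.basicConfig(level=logging.INFO)


def run_logged(name, target, errors):
    """Start `target` in a daemon thread and record why it stopped."""

    def wrapper():
        try:
            target()
            logging.info("thread %s returned normally", name)
        except Exception as exc:
            # Keep the exception so the main thread can inspect it later.
            errors[name] = exc
            logging.error("thread %s died:\n%s", name, traceback.format_exc())

    thread = threading.Thread(target=wrapper, name=name, daemon=True)
    thread.start()
    return thread


def flaky_actor():
    # Hypothetical stand-in for an actor loop that fails.
    raise TimeoutError("no reply within timeout_ms")


errors = {}
t = run_logged("actor-0", flaky_actor, errors)
t.join()
```

After the thread stops, `errors` holds the exception keyed by thread name, so an exit that looks silent at least leaves a traceback in the log.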