tscmoo closed this PR 2 years ago.
I'd suggest upping the version for this PR.
Hi, what's the status of this PR?
I'd love to merge it, but I noticed a significant regression in some training jobs and haven't had time to debug it yet. I promise to look into this ASAP.
I tried installing this branch, and the error reported in https://github.com/facebookresearch/moolib/issues/27 does indeed disappear. However, I also noticed that this version is quite slow to run. Take your time; it isn't blocking anything yet.
This brings tensorpipe up to date with its latest version. InfiniBand is now enabled by default. All of the code for handling CUDA tensors is present in the RPC, but CUDA is still disabled by default, because CUDA tensors are not yet supported in all-reduce and more testing is needed.
It's a fair bit of code. Among the fixes and changes: