Closed bohanfeng closed 1 year ago
I'm excited you're working to replicate the work! I'm happy to help however I can. Most of these details are in the paper itself, but here are the answers to your questions:
What GPU did you use for training? A V100.
How many GPUs did you use? 8 GPUs.
How long did the training process take? 1-2 days to see reasonable results; 5 days to full convergence.
Do you have any suggestions for accelerating training? I'm sure there are plenty of optimizations that could be made to train faster, some of which I've looked into and some of which I haven't. If you come up with anything, I'd be very interested to hear it. Some small things you could do:
- Use the new robofin/geometrout libraries, which have much faster CPU operations for point cloud sampling.
- Use PyTorch 2.0 and compile more of the tools (perhaps including the GPU point cloud sampling).
- Tune the optimizer better. This paper used Adam with a relatively low learning rate, but there are probably fancier, faster optimization setups.
- Pregenerate more of the data and store it on an SSD instead of generating the point clouds on the fly.
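The last suggestion (pregenerating data rather than sampling point clouds on the fly) can be sketched roughly as below. This is a minimal stdlib-only illustration, not the project's actual pipeline: `sample_point_cloud`, the file layout, and the cloud counts are all made up for the example.

```python
import os
import pickle
import random
import tempfile

def sample_point_cloud(n_points=1024):
    # Stand-in for an expensive on-the-fly sampling step (illustrative only;
    # the real codebase samples point clouds from meshes).
    return [(random.random(), random.random(), random.random()) for _ in range(n_points)]

def pregenerate(cache_dir, num_clouds, n_points=1024):
    # Pay the sampling cost once, up front, and write each cloud to fast
    # storage (e.g. an SSD) so the training loop only does cheap reads.
    for i in range(num_clouds):
        with open(os.path.join(cache_dir, f"cloud_{i}.pkl"), "wb") as f:
            pickle.dump(sample_point_cloud(n_points), f)

def load_cloud(cache_dir, i):
    # During training, a disk read replaces repeated sampling.
    with open(os.path.join(cache_dir, f"cloud_{i}.pkl"), "rb") as f:
        return pickle.load(f)

cache_dir = tempfile.mkdtemp()
pregenerate(cache_dir, num_clouds=2, n_points=8)
cloud = load_cloud(cache_dir, 0)
print(len(cloud))  # 8
```

In practice you would wrap `load_cloud` in a `torch.utils.data.Dataset` so the `DataLoader` workers read the cached files in parallel.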
If you want to discuss further, you can shoot me an email (email address on my website).
Hello, I'm interested in replicating your neural network training for the whole model in the paper. Could you please provide me with some details on your training setup? Specifically, I'm curious to know:
What GPU did you use for training? How many GPUs did you use? How long did the training process take? Do you have any suggestions for accelerating training?
I'm asking because I'm currently unable to train the model due to a lack of available GPU resources in my lab. I'll need to rent GPU resources online, so it would be helpful to have an estimate of the resources required and the potential cost. Thank you in advance for your help!