Closed bohanfeng closed 1 year ago
I'm excited you're working to replicate the work! I'm happy to help however I can. Most of these details are in the paper itself, but here are the answers to your questions:
What GPU did you use for training? A V100.
How many GPUs did you use? 8 GPUs.
How long did the training process take? 1-2 days to see reasonable results; 5 days to full convergence.
Do you have any suggestions for accelerating training? I'm sure there are plenty of optimizations that could be made to train faster, some of which I've looked into and some of which I haven't. If you come up with anything, I'd be very interested to hear it. Some small things you could do:
- Use the new robofin/geometrout libraries, which have much faster CPU operations for point cloud sampling.
- Use PyTorch 2.0 and compile more of the tools (perhaps including the GPU point cloud sampling).
- Tune the optimizer better. This paper used Adam with a relatively low learning rate, but there are probably fancier, faster optimization setups.
- Pregenerate more of the data and store it on an SSD instead of generating the point clouds on the fly.
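The last suggestion (pregenerating data rather than sampling point clouds on the fly) can be sketched roughly as below. This is a minimal stdlib-only illustration, not the project's actual pipeline: `sample_point_cloud`, the file layout, and the cloud counts are all made up for the example.

```python
import os
import pickle
import random
import tempfile

def sample_point_cloud(n_points=1024):
    # Stand-in for an expensive on-the-fly sampling step (illustrative only;
    # the real codebase samples point clouds from meshes).
    return [(random.random(), random.random(), random.random()) for _ in range(n_points)]

def pregenerate(cache_dir, num_clouds, n_points=1024):
    # Pay the sampling cost once, up front, and write each cloud to fast
    # storage (e.g. an SSD) so the training loop only does cheap reads.
    for i in range(num_clouds):
        with open(os.path.join(cache_dir, f"cloud_{i}.pkl"), "wb") as f:
            pickle.dump(sample_point_cloud(n_points), f)

def load_cloud(cache_dir, i):
    # During training, a disk read replaces repeated sampling.
    with open(os.path.join(cache_dir, f"cloud_{i}.pkl"), "rb") as f:
        return pickle.load(f)

cache_dir = tempfile.mkdtemp()
pregenerate(cache_dir, num_clouds=2, n_points=8)
cloud = load_cloud(cache_dir, 0)
print(len(cloud))  # 8
```

In practice you would wrap `load_cloud` in a `torch.utils.data.Dataset` so the `DataLoader` workers read the cached files in parallel.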
If you want to discuss further, you can shoot me an email (email address on my website).
Hello, I'm interested in replicating your neural network training for the whole model in the paper. Could you please provide me with some details on your training setup? Specifically, I'm curious to know:
What GPU did you use for training? How many GPUs did you use? How long did the training process take? Do you have any suggestions for accelerating training?
I'm asking because I'm currently unable to train the model due to a lack of available GPU resources in my lab. I'll need to rent GPU resources online, so it would be helpful to have an estimate of the resources required and the potential cost. Thank you in advance for your help!