Closed · BeckywithYaoji closed this issue 1 year ago
Hi @BeckywithYaoji,
This is very different from what we observed, so I would guess there is some issue in your setup. For reference, we used 8 V100 GPUs with 16 GB of memory each, and the training time was ~1.5 hours per epoch.
I would suggest making sure you are actually using the GPUs (you can check with the command nvidia-smi). Another thing to look into is the data loading time, to make sure there is no IO bottleneck. Let me know what you find out.
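To rule out an IO bottleneck, a rough check like the sketch below can separate the time spent waiting on the DataLoader from the time spent on the GPU. This is only a minimal sketch assuming a standard PyTorch `DataLoader`; `train_loader` is a placeholder, not part of our training code:

```python
import time
import torch

def profile_data_loading(train_loader, num_batches=50):
    """Rough IO-bottleneck check: time spent waiting on the DataLoader
    vs. time for a trivial GPU transfer, per batch."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    load_times, step_times = [], []

    t0 = time.perf_counter()
    for i, batch in enumerate(train_loader):
        t1 = time.perf_counter()  # batch has been produced by the loader here

        # Placeholder for the real forward/backward pass: just move the
        # first tensor found in the batch to the GPU and synchronize.
        tensors = batch if isinstance(batch, (list, tuple)) else [batch]
        for x in tensors:
            if torch.is_tensor(x):
                x.to(device, non_blocking=True)
                break
        if device.type == "cuda":
            torch.cuda.synchronize()
        t2 = time.perf_counter()

        load_times.append(t1 - t0)
        step_times.append(t2 - t1)
        t0 = time.perf_counter()
        if i + 1 >= num_batches:
            break

    print(f"mean data-loading wait per batch: {sum(load_times) / len(load_times):.3f}s")
    print(f"mean transfer time per batch:     {sum(step_times) / len(step_times):.3f}s")
```

If the data-loading wait dominates, that points to an IO bottleneck rather than a GPU problem, and things like increasing the DataLoader's `num_workers` or moving the dataset to faster local storage are worth trying.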
Closing because of inactivity. Please feel free to reopen.
Hello there, I hope you're doing well. I wanted to share my experience with your project. I've been training an RLBench task on 8 A100 GPUs, each with 40 GB of memory, using the default parameters you provide, and a single epoch takes around 12 hours. I'm wondering whether there might be an issue somewhere in my setup. I'd greatly appreciate your insights and guidance on this. Thank you for your time and effort in developing this project; looking forward to your response.