vaydingul closed this issue 1 year ago
Thank you for your interest in our work. Could you tell us how much GPU memory it costs and the corresponding batch size in your case? In ours, it takes about 8 GB of GPU memory at batch size 32. Our current plan is to release the code and dataset around September.
Hi,
Currently, I have access to 2 x Tesla V100 GPUs (2 x 32 GB), and I am able to use a batch size of 2 x 192 = 384. However, with this setup it takes ~4-5 hours to complete one epoch. The training set consists of 61 episodes (61 x 3000 = 183,000 data points). Do you think this is normal?
Therefore, one reasonable option seems to be freezing the image and measurement encoders while training the auto-regressive trajectory and multi-step control prediction module.
Did you do something like this?
Thanks :)
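For reference, the freezing idea mentioned above can be sketched in PyTorch roughly like this. The module names `image_encoder`, `measurement_encoder`, and `head` are placeholders for illustration, not the actual attribute names in either codebase:

```python
import torch
import torch.nn as nn

# Stand-in model; substitute the real encoders and prediction head.
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.image_encoder = nn.Linear(8, 8)        # placeholder image encoder
        self.measurement_encoder = nn.Linear(4, 8)  # placeholder measurement encoder
        self.head = nn.Linear(16, 2)                # placeholder prediction head

model = Model()

# Freeze the encoders so only the head receives gradient updates.
for module in (model.image_encoder, model.measurement_encoder):
    for p in module.parameters():
        p.requires_grad = False

# Pass only the trainable parameters to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```

Note that freezing only saves the encoders' backward pass; the forward pass through them still runs every step, so the speedup may be smaller than expected.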
I think it is not normal. In our case, we use 4 V100 GPUs with a batch size of 4 x 32 = 128, and one epoch over roughly 180,000 samples takes only about 6-7 minutes. By the way, our model takes only one image as input, and the recurrent module works on latent features rather than multiple input images. Is that the case for you? No, we do not freeze the encoders; the whole model is trained jointly.
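For context, a quick back-of-envelope comparison of the two throughputs reported in this thread (taking mid-range values for "4-5 hours" and "6-7 minutes"):

```python
# Numbers taken from the thread; this is only a sanity check, not a benchmark.
samples_slow = 183_000
samples_fast = 180_000

slow_sps = samples_slow / (4.5 * 3600)  # ~11 samples/s at 4.5 h per epoch
fast_sps = samples_fast / (6.5 * 60)    # ~460 samples/s at 6.5 min per epoch

print(f"slow: {slow_sps:.1f} samples/s, fast: {fast_sps:.1f} samples/s, "
      f"ratio: {fast_sps / slow_sps:.0f}x")
```

A roughly 40x gap on comparable hardware usually points at the input pipeline or an accidental extra computation rather than model size.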
I see.
> By the way, our model only takes one image as the input and the recurrent module is working on latent features instead of multiple input images. Is that so in your case?
Yeah, it is also the case for me.
> I think it is not normal. In our case, we use 4 V100 GPUs with a batch size of 4*32=128. It only takes about 6-7 minutes for one epoch on around 180,000 samples.

Then, it seems that there is something wrong with the code.
Thank you for your kind response.
As a follow-up question:
You mentioned in your paper that you collected data with Roach. Did you use Roach's data collection code, or did you write your own data collection module and import the Roach RL agent into it?
I ask because I am currently using Roach's own data collection module, which also tempts me to reuse Roach's dataloader/dataset module (not necessarily, of course). However, the data loading stage seems to be a bottleneck in the code, and I am trying to confirm whether that is really the case.
Thanks again :)
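One simple way to check whether data loading is the bottleneck is to measure how long each iteration spends waiting for the next batch versus the total epoch time. This is a generic PyTorch sketch with a toy dataset, not code from either repository:

```python
import time
import torch
from torch.utils.data import DataLoader, Dataset

# Toy dataset standing in for the real one; the timing pattern is what matters.
class ToyDataset(Dataset):
    def __len__(self):
        return 1024

    def __getitem__(self, idx):
        time.sleep(0.001)  # simulate per-sample decode/IO cost
        return torch.zeros(3)

# Try varying num_workers here: if the wait fraction drops sharply as
# workers increase, the input pipeline is the bottleneck.
loader = DataLoader(ToyDataset(), batch_size=32, num_workers=0)

data_wait = 0.0
start = time.perf_counter()
t0 = start
for batch in loader:
    data_wait += time.perf_counter() - t0
    # ... forward/backward would go here ...
    t0 = time.perf_counter()
total = time.perf_counter() - start

print(f"fraction of time spent waiting on data: {data_wait / total:.2f}")
```

If the printed fraction stays near 1.0 even with real model compute in the loop, increasing `num_workers` (and enabling `pin_memory` for GPU training) is usually the first thing to try.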
We use the CARLA Leaderboard to collect data with the Roach RL model as the agent, since the original Roach codebase does not support adding scenarios.
Thank you for the clarification!
Hi,
First, I would like to congratulate you on this great work. I am currently trying to replicate it. Assuming the whole network is trained together (no module freezing, etc.), training takes a long time given my limited GPU capacity. Therefore, one reasonable option seems to be freezing the image and measurement encoders while training the auto-regressive trajectory and multi-step control prediction module. Did you do something like this? What was your approach? Do you have any suggestions?
Also, when will the code be released?
Best.