OpenDriveLab / TCP

[NeurIPS 2022] Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline.
Apache License 2.0

Recurrent Module Training #3

Closed vaydingul closed 1 year ago

vaydingul commented 1 year ago

Hi,

First, I would like to congratulate you on this great work. I am currently trying to replicate it. Assuming the whole network is trained jointly (no module freezing, etc.), training takes a long time due to our limited GPU capacity. Therefore, one reasonable option seems to be freezing the image and measurement encoders while training the auto-regressive trajectory and multi-step control prediction modules. Did you do something like this? What was your approach? Do you have any suggestions?
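
For concreteness, the kind of freezing I have in mind looks roughly like this (a minimal PyTorch sketch; `model.perception` and `model.measurements` are placeholder submodule names, not the actual TCP attributes):

```python
import torch

def freeze_encoders(model):
    # Placeholder submodule names; the real TCP attributes may differ.
    for module in (model.perception, model.measurements):
        for param in module.parameters():
            param.requires_grad = False

# Usage: freeze the encoders, then hand only the still-trainable
# parameters to the optimizer.
# freeze_encoders(model)
# optimizer = torch.optim.Adam(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```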

Also, when will the code be released?

Best.

penghao-wu commented 1 year ago

Thank you for your interest in our work. Could you please tell me how much GPU memory and what batch size it costs in your case? In our case, it takes about 8 GB of GPU memory for a batch size of 32. Our current plan is to release the code and dataset around September.

vaydingul commented 1 year ago

Hi,

Currently, I have access to 2 x Tesla V100 GPUs (2 x 32 GB), and I am able to use a batch size of 2 x 192 = 384. However, with the current setup, it takes ~4-5 hours to complete one epoch. The train set consists of 61 episodes (61 x 3000 = 183,000 data points). Do you think this is normal?

> Therefore, one reasonable option seems to be freezing the image and measurement encoders while training the auto-regressive trajectory and multi-step control prediction modules.
>
> Did you do something like this?

Thanks :)

penghao-wu commented 1 year ago

I don't think that is normal. In our case, we use 4 V100 GPUs with a batch size of 4 x 32 = 128, and it only takes about 6-7 minutes for one epoch on around 180,000 samples. By the way, our model takes only one image as input, and the recurrent module operates on latent features rather than on multiple input images. Is that also the case for you? And no, we do not freeze the encoders; the whole model is trained jointly.
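
In rough pseudocode, the idea is along these lines (a simplified illustrative sketch, not our exact implementation; dimensions, module names, and the control output are placeholder assumptions):

```python
import torch
import torch.nn as nn

class LatentRollout(nn.Module):
    """Illustrative only: a GRU rolls out multi-step predictions in latent
    space from a single image's feature, rather than consuming several
    input frames."""
    def __init__(self, feat_dim=256, hidden_dim=256, horizon=4):
        super().__init__()
        self.gru = nn.GRUCell(feat_dim, hidden_dim)
        self.control_head = nn.Linear(hidden_dim, 3)  # e.g. throttle, steer, brake
        self.horizon = horizon

    def forward(self, image_feat):
        # image_feat: (B, feat_dim), computed once from the single input image.
        h = torch.zeros(image_feat.size(0), self.gru.hidden_size,
                        device=image_feat.device)
        controls = []
        for _ in range(self.horizon):
            h = self.gru(image_feat, h)           # recurrence over latent features only
            controls.append(self.control_head(h))
        return torch.stack(controls, dim=1)        # (B, horizon, 3)
```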

vaydingul commented 1 year ago

I see.

> By the way, our model takes only one image as input, and the recurrent module operates on latent features rather than on multiple input images. Is that also the case for you?

Yeah, it is also the case for me.

> I don't think that is normal. In our case, we use 4 V100 GPUs with a batch size of 4 x 32 = 128, and it only takes about 6-7 minutes for one epoch on around 180,000 samples.

Then it seems that there is something wrong with my code.

Thank you for your kind response.

vaydingul commented 1 year ago

As a follow-up question:

You mentioned in your paper that you've collected data with Roach. To do this, did you use Roach's data collection code? Or did you prepare your own data collection module and incorporate/import the Roach RL agent?

I am asking because I am currently using Roach's data collection module itself, which also tempts me to use Roach's dataloader/dataset module (though not necessarily). However, the data loading stage seems to create a bit of a bottleneck in the code, and I am trying to find out whether that is really the case.
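
For reference, this is roughly the loader configuration I am experimenting with to rule out the input pipeline (the dataset class and values are illustrative, not from the TCP or Roach code):

```python
from torch.utils.data import DataLoader, Dataset

# Placeholder dataset; stands in for whichever dataset class is actually used.
class DrivingDataset(Dataset):
    def __init__(self, samples):
        self.samples = samples
    def __len__(self):
        return len(self.samples)
    def __getitem__(self, idx):
        return self.samples[idx]

loader = DataLoader(
    DrivingDataset(samples=list(range(183000))),  # dummy data for illustration
    batch_size=128,
    shuffle=True,
    num_workers=8,            # parallel CPU-side decoding/augmentation
    pin_memory=True,          # faster host-to-GPU transfer
    persistent_workers=True,  # avoid re-spawning workers every epoch
    prefetch_factor=4,        # batches prefetched per worker
)
```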

Thanks again :)

penghao-wu commented 1 year ago

We use the Carla leaderboard to collect data with the Roach RL model as the agent, since the original Roach codebase does not support adding scenarios.

vaydingul commented 1 year ago

Thank you for the clarification!