MCZhi / GameFormer-Planner

[ICCV & CVPR Workshop] Learning-enabled Interactive Prediction and Planning Framework for Autonomous Vehicles
https://mczhi.github.io/GameFormer/

Amount of data to process for validation and train set. #16

Closed HenryDykhne closed 4 months ago

HenryDykhne commented 4 months ago

Hi,

I'm trying to train a representative version of your model. Do you have a set of recommended training settings, as well as sizes for the validation and training sets? Do you suggest processing all of the training sets in the nuPlan dataset, or is it possible to get away with a smaller subset? Bear in mind that I do not have access to 4 A100s as you did; I only have access to 4 RTX 4090s, which are faster but have much less memory.

Also, does running the command below multiple times with different data paths deposit all of the processed data in the same directory, or does it just overwrite it?

python data_process.py \
--data_path nuplan/dataset/nuplan-v1.1/splits/mini \
--map_path nuplan/dataset/maps \
--save_path nuplan/processed_data

Any advice here would be appreciated.

In addition, do you have any suggestions for running your model in a similar way to how the urban planner model is shown to be run, where the relevant model can be loaded via the OmegaConf config in a Jupyter notebook? Help here would also be appreciated.

MCZhi commented 4 months ago

Hi, @HenryDykhne, thank you for your interest. Since the provided model is a tiny version of GameFormer, I think around 300k data points are enough. You can extract this amount of data from either the validation or testing sets; it's not necessary to process all the training sets. You can limit the total number of scenarios to be processed by setting the total_scenarios parameter. The tiny model would work perfectly well with 4 RTX 4090s, though I suggest increasing the number of encoder and decoder layers, as this will give you better performance.

If you keep the save_path unchanged, the processed data will be saved in the same directory. Each data point has a unique save name, so processed data won't be overwritten.

I recommend running the run_nuplan_test script to test the planner; our file structure is much simpler than the original nuPlan testing protocol. However, you can still use Planner/planner.py, manually set its model and parameters so it acts as a standard planner, and then pass it to the nuPlan testing script from a Jupyter notebook.
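
For the notebook route, here is a minimal sketch of what that could look like. This is only an illustration: the exact constructor arguments of the Planner class (assumed here to be a checkpoint path and a device string) and the point in the nuPlan tutorial notebook where you hand the planner over depend on your local setup, so check Planner/planner.py and the notebook you are following.

# Illustrative only: the constructor arguments below are assumptions,
# check Planner/planner.py for the actual signature.
from Planner.planner import Planner

planner = Planner(
    model_path='training_log/your_experiment/model.pth',  # hypothetical checkpoint path
    device='cuda',
)

# The planner instance can then be substituted wherever the nuPlan tutorial
# notebook expects a planner object (e.g., when it builds the simulation
# runner from its OmegaConf/Hydra config).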

HenryDykhne commented 4 months ago

When you say 300k data points, do you mean 300k scenarios? For example, when processing the validation set, it reports: Total number of scenarios: 12170

I'm not sure 300k scenarios exist in the dataset, even if you add up all the training and validation sets.

Can you please clarify?

MCZhi commented 4 months ago

Try setting --scenarios_per_type None to remove the per-type restriction; then more scenarios will be available to use.
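
For example, building on the command from the first post (and assuming the total_scenarios parameter mentioned above is exposed as a --total_scenarios command-line flag; check the argument parser in data_process.py for the exact names):

python data_process.py \
--data_path nuplan/dataset/nuplan-v1.1/splits/val \
--map_path nuplan/dataset/maps \
--save_path nuplan/processed_data \
--scenarios_per_type None \
--total_scenarios 300000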

HenryDykhne commented 4 months ago

I see. A few more questions: how long did your training process take, and how did you split the data between train and val? 12,000 scenarios took approximately 3 hours to process, so it seems that 300k would take about 80 hours.

MCZhi commented 4 months ago

The nuPlan dataset does not work well with multiprocessing, so data processing may indeed take up to 80 hours for 300k data points. If you just want to get training and validation running initially, consider reducing the amount of data to 100k. I typically split the train and validation sets randomly with a 9:1 ratio.
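
If you prefer to do that random 9:1 split yourself over an already-processed directory, a minimal sketch could look like the following. It assumes each processed scenario is saved as its own .npz file (as mentioned later in this thread) and that your training code reads from train/ and val/ subdirectories; adjust the paths to your setup.

# Sketch: randomly move ~10% of the processed .npz files into a val/ folder.
import glob
import os
import random
import shutil

data_dir = 'nuplan/processed_data'
train_dir = os.path.join(data_dir, 'train')
val_dir = os.path.join(data_dir, 'val')
os.makedirs(train_dir, exist_ok=True)
os.makedirs(val_dir, exist_ok=True)

files = sorted(glob.glob(os.path.join(data_dir, '*.npz')))
random.seed(42)  # fixed seed so the split is reproducible
random.shuffle(files)

split = int(0.9 * len(files))  # 9:1 train/val ratio
for f in files[:split]:
    shutil.move(f, train_dir)
for f in files[split:]:
    shutil.move(f, val_dir)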

HenryDykhne commented 4 months ago

I see. And how long did the training process take for 300k data points? Also, will there be a significant quality dip with only 100k data points?

MCZhi commented 4 months ago

I am not certain about the exact time required for the training process since it depends on your training configuration, as well as the GPU and I/O speed of your computer. I think there won't be a significant quality dip with 100k data points, so it can be a good start.

HenryDykhne commented 4 months ago

Thank you for the information, but I'm just looking for an order of magnitude here. Did it take your machine a day? A couple of days? A week?

MCZhi commented 4 months ago

It will probably take about a day to train the model on 300,000 data points.

HenryDykhne commented 4 months ago

Thank you. Please leave the conversation open for a week or more, as I may have some more questions along the same thread.

VVeiCao commented 4 months ago

Hi, just some follow-up questions. I used

python data_process.py \
--data_path nuplan/dataset/nuplan-v1.1/splits/mini \
--map_path nuplan/dataset/maps \
--save_path nuplan/processed_data

to process the data, and it shows Total number of scenarios: 3084, which is far less than the 300k you mentioned, or the 12000 @HenryDykhne mentioned (@HenryDykhne, do you also use nuplan-mini?). Also, after processing, should we randomly split these files (like us-nv-las-vegasxxxxx.npz) into train and val folders ourselves (e.g., at a 9:1 ratio)? And what about the test set?

MCZhi commented 4 months ago

This is because you are using the "mini" split, which contains only a very small subset of data. The provided code is an example of how to process data from various splits. You should replace "mini" with "train" or "val" to access a larger dataset. Additionally, consider removing the restriction on scenario types to get more scenarios for training.

You can use the original training and validation splits from the nuPlan dataset and save the processed data into the respective "train" and "val" directories. The test set is solely for evaluation purposes, so there is no need to process data in this set.
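
Concretely, that could look like the following (the split directory names just follow the suggestion above to replace "mini" with "train" or "val"; adjust them to match your local nuPlan download):

python data_process.py \
--data_path nuplan/dataset/nuplan-v1.1/splits/train \
--map_path nuplan/dataset/maps \
--save_path nuplan/processed_data/train

python data_process.py \
--data_path nuplan/dataset/nuplan-v1.1/splits/val \
--map_path nuplan/dataset/maps \
--save_path nuplan/processed_data/val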