Closed ynw2021 closed 11 months ago
Hi, @ynw2021. We used all the accessible data for the nuPlan challenge. In this open-source version, we only use the selected testing scenario types for quicker data processing. Feel free to remove this restriction from your code if you want to further enhance the model performance.
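For reference, the restriction being discussed is typically a scenario-type filter applied during preprocessing. A minimal sketch of the idea (the type names are real nuPlan scenario types, but the helper function and data layout here are illustrative, not the repository's actual `data_process.py` code):

```python
# Illustrative sketch: restricting preprocessing to selected scenario types.
# Passing selected_types=None removes the restriction, i.e. the whole
# dataset is processed, which is what the reply above suggests trying.
SELECTED_TYPES = {
    "starting_left_turn",
    "starting_right_turn",
    "stationary",
}

def filter_scenarios(scenarios, selected_types=None):
    """Keep only scenarios whose type is in selected_types (keep all if None)."""
    if selected_types is None:
        return list(scenarios)
    return [s for s in scenarios if s["type"] in selected_types]

scenarios = [
    {"token": "a", "type": "stationary"},
    {"token": "b", "type": "traversing_pickup_dropoff"},
    {"token": "c", "type": "starting_left_turn"},
]
print(len(filter_scenarios(scenarios, SELECTED_TYPES)))  # → 2 (restricted)
print(len(filter_scenarios(scenarios)))                  # → 3 (unrestricted)
```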
Hi huang, I found that it would take thousands of hours to preprocess the whole dataset. Is this normal, or did you limit the number of scenarios per type for the challenge?
Hi, @ynw2021. Processing the entire dataset can take a significant amount of time, which is normal. We initially tried to speed up the process with multiprocessing, but it had the opposite effect and slowed things down, possibly due to problems with querying the SQL dataset. We did the processing on a cluster computing server, so it did not take thousands of hours, but it still took hundreds.
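One common cause of the slowdown pattern described above (multiprocessing making SQL-backed preprocessing slower) is workers contending for a single shared SQLite connection or file lock. A generic sketch of the usual mitigation, giving each worker its own read-only connection; all names here are illustrative and unrelated to the repository's actual code:

```python
import os
import sqlite3
import tempfile
from multiprocessing import Pool

# Hypothetical demo database standing in for a nuPlan log .db file.
DB_PATH = os.path.join(tempfile.gettempdir(), "demo_scenarios.sqlite")

def build_demo_db(path):
    conn = sqlite3.connect(path)
    conn.execute("DROP TABLE IF EXISTS scenario")
    conn.execute("CREATE TABLE scenario (token TEXT, type TEXT)")
    conn.executemany(
        "INSERT INTO scenario VALUES (?, ?)",
        [("a", "stationary"), ("b", "starting_left_turn")],
    )
    conn.commit()
    conn.close()

def process_token(token):
    # Each call opens its own read-only connection, so parallel workers
    # never share a handle across process boundaries.
    conn = sqlite3.connect(f"file:{DB_PATH}?mode=ro", uri=True)
    row = conn.execute(
        "SELECT type FROM scenario WHERE token = ?", (token,)
    ).fetchone()
    conn.close()
    return row[0]

build_demo_db(DB_PATH)
print(process_token("a"))  # → stationary

if __name__ == "__main__":
    # With per-worker connections, a process pool is safe to try:
    with Pool(2) as pool:
        print(pool.map(process_token, ["a", "b"]))
```

Whether this helps in practice depends on where the time actually goes (query latency vs. per-scenario computation), so it is worth profiling a few scenarios first.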
Thanks a lot for the help. Another question: Is this warning ok when processing data?
Yes, it is OK and doesn't affect anything.
Another thing: did you set timestamp_threshold_s?
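For context, as I understand it `timestamp_threshold_s` in nuPlan's scenario filter enforces a minimum time gap between the start timestamps of extracted scenarios, thinning near-duplicate samples. A rough sketch of that idea (an illustration only, not nuPlan's actual implementation):

```python
# Conceptual sketch of a timestamp threshold: keep a scenario only if its
# start time is at least `threshold_s` seconds after the previously kept
# scenario's start time. Purely illustrative, not nuPlan code.
def thin_by_timestamp(start_times_s, threshold_s):
    kept, last = [], None
    for t in sorted(start_times_s):
        if last is None or t - last >= threshold_s:
            kept.append(t)
            last = t
    return kept

print(thin_by_timestamp([0.0, 2.0, 5.0, 5.5, 11.0], 5.0))  # → [0.0, 5.0, 11.0]
```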
Hi huang,
Thank you for sharing this great work! When I checked data_process.py, it seems that you only used the 14 test scenario types. Is that the same setup you used when training your nuPlan challenge submission? Could you give more details about which subset you used for training, and why you did not use the whole dataset?
Best,