autonomousvision / transfuser

[PAMI'23] TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving; [CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
MIT License

Script for rerunning failed routes #189

Closed NewCoder3214 closed 1 year ago

NewCoder3214 commented 1 year ago

Hello, first of all, thank you for your huge code base! It makes reproducing your results much easier. I have a question about data generation: some routes failed due to timeouts or other reasons, but they did not crash. In #187 you provided a script for rerunning crashed routes. Do you have a script for rerunning failed routes, either during data generation or afterwards? With the script provided in this repository, I can only rerun whole towns, not individual routes. If I delete all the routes that worked in the first run from the .xml file and rerun it, the output folders are named incorrectly, always starting with “…route0…”, even though I am rerunning, for example, routes 8 and 25.
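The manual trimming of the .xml file described above could be sketched roughly as follows. This is only an illustration: it assumes the route file has a `<routes>` root with `<route id="…">` children, and the helper name `keep_routes` and the sample data are made up. It keeps the original `id` attributes, but whether the output folder names follow those ids or the position of the route in the file is not confirmed here.

```python
import xml.etree.ElementTree as ET


def keep_routes(xml_text, keep_ids):
    """Return the route file as a string with only the routes in keep_ids.

    Assumes a <routes> root with <route id="..."> children (an assumption
    about the file layout, not taken from the repository).
    """
    root = ET.fromstring(xml_text)
    # Iterate over a copy of the list because we remove children while looping.
    for route in list(root.findall("route")):
        if route.get("id") not in keep_ids:
            root.remove(route)
    return ET.tostring(root, encoding="unicode")


# Made-up example file with three routes.
example = """<routes>
  <route id="7" town="Town01"><waypoint x="0.0" y="0.0" z="0.0"/></route>
  <route id="8" town="Town01"><waypoint x="1.0" y="2.0" z="0.0"/></route>
  <route id="25" town="Town01"><waypoint x="3.0" y="4.0" z="0.0"/></route>
</routes>"""

trimmed = keep_routes(example, {"8", "25"})
kept = [r.get("id") for r in ET.fromstring(trimmed).findall("route")]
print(kept)
```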

Kait0 commented 1 year ago

I don't quite understand the problem. The script you mentioned restarts individual routes if they have crashed, which is detected by a couple of criteria. Timeout infractions of the expert are not considered a crash but rather suboptimal performance. In our follow-up work we only train on routes with 100 DS, but in this repository we train on all data (including the routes where the expert makes a mistake). Route files are rerun with the resume=1 flag, which will start at the last finished route. We don't have any additional scripts. Usually it's not a big deal if some routes are missing, as long as it's only a small percentage.

NewCoder3214 commented 1 year ago

So in this repository all routes that were driven by the expert are used, even those that failed? By failed I mean that the complete route is labeled as failed (timeout etc.) in the log file, not just that, for example, a stop sign was ignored.

What would you consider a small percentage of failed routes? I experienced about 6%.

What exactly do you mean by the resume=1 flag?

Kait0 commented 1 year ago

Yes, we don't filter routes in the dataloader in this repo. 6% seems fine; we experience similar failure rates.
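For anyone who wants to check their own failure rate, a minimal sketch of the arithmetic is below. It assumes each route has a result record with a `route_id` and a `status` field, and that failed routes have a status starting with "Failed"; these field names, the helper `failure_rate`, and the sample data are assumptions for illustration, not the format this repository writes.

```python
def failure_rate(records):
    """Return (ids of failed routes, failure percentage) for a list of records.

    Each record is assumed to be a dict with "route_id" and "status" keys
    (hypothetical field names chosen for this sketch).
    """
    if not records:
        return [], 0.0
    failed = [r["route_id"] for r in records if r["status"].startswith("Failed")]
    return failed, 100.0 * len(failed) / len(records)


# Made-up data: 50 routes, of which routes 8, 25 and 40 failed.
records = [{"route_id": i, "status": "Completed"} for i in range(50)]
for i in (8, 25, 40):
    records[i]["status"] = "Failed - Agent timed out"

failed_ids, percent = failure_rate(records)
print(failed_ids, percent)
```

With 3 failed routes out of 50, this gives a failure rate of 6%, the same order as the figure discussed above.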

Route files usually have multiple routes in them. If the simulation crashes in, say, the 6th route, the script will restart the collection for the route file but start with the 6th or 7th route, depending on the crash type.