dotchen / WorldOnRails

(ICCV 2021, Oral) RL and distillation in CARLA using a factorized world model
https://dotchen.github.io/world_on_rails/
MIT License
167 stars 29 forks

Reproducing Results #4

Closed: HimangiM closed this issue 3 years ago

HimangiM commented 3 years ago

Hi,

Thank you for providing the code. I am trying to reproduce the noCrash benchmark results from the paper using the pre-trained noCrash model. My reproduced results (column 4) are shown below next to the results reported for your implementation (column 3).

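| Town, weather | Traffic | RAILS result (from paper) | Reproduced result |
|---|---|---|---|
| Town 01, Train Weather | Empty | 98 | 100 |
| Town 01, Train Weather | Regular | 100 | 95 |
| Town 01, Train Weather | Dense | 96 | 82 |
| Town 01, Test Weather | Empty | 90 | 80 |
| Town 01, Test Weather | Regular | 90 | 39 |
| Town 01, Test Weather | Dense | 84 | 34 |
| Town 02, Train Weather | Empty | 94 | 78 |
| Town 02, Train Weather | Regular | 89 | 63 |
| Town 02, Train Weather | Dense | 74 | 46 |
| Town 02, Test Weather | Empty | 78 | 36 |
| Town 02, Test Weather | Regular | 82 | 34 |
| Town 02, Test Weather | Dense | 66 | 24 |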

Can you please let me know what the issue might be? I followed the installation instructions, and I am running the evaluate_nocrash.py command on the noCrash routes.

Any help would be greatly appreciated.

Thanks, Himangi

dotchen commented 3 years ago

How are you launching the CARLA server? What do you mean by your reproduced results? Do you have a link to your implementation that I can look at?

dotchen commented 3 years ago

I just used the downloaded weights and reran Town02 test/test empty to double-check, and I got 78:

town,traffic,weather,start,target,route_completion,lights_ran,duration
Town02,0,10,66,19,100.0,2,195.65
Town02,0,14,66,19,100.0,2,196.4
Town02,0,10,6,71,0.0,0,180.05
Town02,0,14,6,71,100.0,0,224.0
Town02,0,10,66,28,100.0,1,217.75
Town02,0,14,66,28,100.0,2,217.8
Town02,0,10,46,32,44.11,1,158.05
Town02,0,14,46,32,100.0,1,71.05
Town02,0,10,25,59,53.37,0,232.5
Town02,0,14,25,59,100.0,0,162.55
Town02,0,10,32,9,85.57,2,277.05
Town02,0,14,32,9,100.0,2,120.9
Town02,0,10,43,72,100.0,1,83.6
Town02,0,14,43,72,100.0,0,129.15
Town02,0,10,54,14,100.0,1,170.1
Town02,0,14,54,14,100.0,0,214.85
Town02,0,10,26,50,100.0,0,91.5
Town02,0,14,26,50,100.0,1,91.4
Town02,0,10,38,69,57.8,0,198.05
Town02,0,14,38,69,100.0,1,51.0
Town02,0,10,75,24,7.95,0,203.5
Town02,0,14,75,24,100.0,0,167.75
Town02,0,10,19,82,100.0,1,174.3
Town02,0,14,19,82,100.0,1,149.9
Town02,0,10,65,6,100.0,0,102.75
Town02,0,14,65,6,100.0,0,101.95
Town02,0,10,71,29,100.0,0,84.5
Town02,0,14,71,29,100.0,0,84.9
Town02,0,10,59,16,3.8,1,188.5
Town02,0,14,59,16,4.27,0,202.25
Town02,0,10,6,66,0.0,0,180.05
Town02,0,14,6,66,100.0,0,90.55
Town02,0,10,83,56,100.0,1,117.3
Town02,0,14,83,56,100.0,0,161.65
Town02,0,10,69,71,100.0,1,134.35
Town02,0,14,69,71,100.0,1,134.85
Town02,0,10,82,28,100.0,1,79.55
Town02,0,14,82,28,100.0,1,82.5
Town02,0,10,8,17,15.9,0,141.05
Town02,0,14,8,17,100.0,1,59.05
Town02,0,10,19,12,63.43,1,240.2
Town02,0,14,19,12,100.0,0,157.1
Town02,0,10,39,18,100.0,0,162.95
Town02,0,14,39,18,100.0,0,163.6
Town02,0,10,51,8,100.0,0,97.85
Town02,0,14,51,8,100.0,0,98.5
Town02,0,10,24,36,100.0,1,138.7
Town02,0,14,24,36,100.0,0,183.2
Town02,0,10,64,73,100.0,0,85.3
Town02,0,14,64,73,100.0,0,84.9

Can you give me some information on how you obtained those numbers? Also, make sure to launch CARLA with the -vulkan flag, as specified in the launch_carla.sh script.

HimangiM commented 3 years ago

Thanks for the reply. Are you using the route_completion column for evaluation to get the final number as 78? How are you evaluating the above data to get 78?

dotchen commented 3 years ago

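By definition, the success rate is the mean of (route_completion == 100) over all episodes, i.e. the percentage of episodes that complete the full route.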

EDIT: I uploaded the script to parse nocrash results under the scripts folder, sorry for the inconvenience.
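
For readers following along, here is a minimal sketch of that metric. It is not the parsing script from the scripts folder; the function name `success_rate` and the inline sample are illustrative assumptions, and only the CSV column names are taken from the log posted earlier in this thread.

```python
import csv
import io


def success_rate(csv_text: str) -> float:
    """Return the percentage of episodes whose route_completion is exactly 100."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no episodes found")
    # An episode counts as a success only if the full route was completed.
    successes = sum(1 for row in rows if float(row["route_completion"]) == 100.0)
    return 100.0 * successes / len(rows)


# First four rows of the Town02 log from this thread, used as a small sample.
sample = """town,traffic,weather,start,target,route_completion,lights_ran,duration
Town02,0,10,66,19,100.0,2,195.65
Town02,0,14,66,19,100.0,2,196.4
Town02,0,10,6,71,0.0,0,180.05
Town02,0,14,6,71,100.0,0,224.0
"""

print(success_rate(sample))  # 3 of 4 episodes succeed -> 75.0
```

Applied to the full 50-row log above, 39 rows have route_completion equal to 100.0, which gives 78.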

HimangiM commented 3 years ago

Thanks for providing the result parsing script. I am trying to reproduce the noCrash results reported in the paper using the given RAILS pre-trained model. Column 3 shows the results I get with the pre-trained model, and column 4 shows the results reported in the paper.

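| Town, weather | Traffic | Reproduced result | RAILS result (from the paper) |
|---|---|---|---|
| Town 01, Train Weather | Empty | 91 | 98 |
| Town 01, Train Weather | Regular | 99 | 100 |
| Town 01, Train Weather | Dense | 91 | 96 |
| Town 01, Test Weather | Empty | 92 | 90 |
| Town 01, Test Weather | Regular | 84 | 90 |
| Town 01, Test Weather | Dense | 84 | 84 |
| Town 02, Train Weather | Empty | 94 | 94 |
| Town 02, Train Weather | Regular | 91 | 89 |
| Town 02, Train Weather | Dense | 65 | 74 |
| Town 02, Test Weather | Empty | 76 | 78 |
| Town 02, Test Weather | Regular | 82 | 82 |
| Town 02, Test Weather | Dense | 58 | 66 |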

For some scenarios, the results from the pre-trained model seem to be slightly lower than the results reported in the paper. Examples are the dense scenarios of the test town with both train and test weather, the empty and dense scenarios of the train town with train weather, and the regular scenario of the train town with test weather. Can you please let me know what could be causing the slight difference in the results? Is it due to stochasticity?

dotchen commented 3 years ago

Stochasticity is one thing, but it usually only affects the regular and dense scenarios. The 91 vs 98 makes me think you are using an older commit of the code. Make sure that the velocity cap is 20 in image_agent.py for nocrash, as 15 is tuned for the leaderboard. Also, make sure you are launching CARLA with -vulkan.

Also, in the first table in this thread, the first entry is 100, but now it is 91. What changed? Did you just re-evaluate?

> seem to be slightly lower than the results reported in the paper,

What I parsed from the table above is that some are lower but some are higher...
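
A quick way to check this is to tabulate the per-cell differences. The sketch below is only illustrative: the dictionary layout and variable names are assumptions, and the numbers are copied from the second table in this thread.

```python
# (town and weather, traffic) -> (reproduced, paper), from the second table in this thread.
results = {
    ("Town01 train", "empty"): (91, 98),
    ("Town01 train", "regular"): (99, 100),
    ("Town01 train", "dense"): (91, 96),
    ("Town01 test", "empty"): (92, 90),
    ("Town01 test", "regular"): (84, 90),
    ("Town01 test", "dense"): (84, 84),
    ("Town02 train", "empty"): (94, 94),
    ("Town02 train", "regular"): (91, 89),
    ("Town02 train", "dense"): (65, 74),
    ("Town02 test", "empty"): (76, 78),
    ("Town02 test", "regular"): (82, 82),
    ("Town02 test", "dense"): (58, 66),
}

# Positive delta: reproduced result is higher than the paper's number.
deltas = {key: reproduced - paper for key, (reproduced, paper) in results.items()}

lower = [key for key, d in deltas.items() if d < 0]
higher = [key for key, d in deltas.items() if d > 0]
equal = [key for key, d in deltas.items() if d == 0]
worst = min(deltas, key=deltas.get)

print(len(lower), len(higher), len(equal))  # 7 2 3
print(worst, deltas[worst])  # ('Town02 train', 'dense') -9
```

So 7 of the 12 cells are lower, 2 are higher, and 3 match, with the largest gap in Town 02, train weather, dense traffic.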

HimangiM commented 3 years ago

Thanks for the reply.

> The 91 vs 98 makes me think you are using the older commit of the code. Make sure that the velocity cap is 20 in image_agent.py for nocrash, as 15 is tuned for the leaderboard. Also, make sure you are launching CARLA with -vulkan.

I launched the CARLA server with -vulkan. The old commit could be the issue. I'll try the new commit with the velocity cap set to 20.

> Also, in the first table in this thread, the first entry is 100, but now it is 91. What changed? Did you just re-evaluate?

I re-evaluated using the route_completion == 100 criterion. For the first table, I was instead counting whether an episode ended with success or failure. The first table can be ignored.