Reproducing Results - Githubissues

HimangiM commented 3 years ago

Hi,

Thank you for providing the code. I am trying to reproduce the noCrash benchmark results from the paper using the pre-trained model. My reproduced results (column 4) are shown below next to the results given by your implementation (column 3).

Town, weather	Traffic	RAILS Result (from paper)	Reproduced result
Town 01, Train Weather	Empty	98	100
	Regular	100	95
	Dense	96	82
Town 01, Test Weather	Empty	90	80
	Regular	90	39
	Dense	84	34
Town 02, Train Weather	Empty	94	78
	Regular	89	63
	Dense	74	46
Town 02, Test Weather	Empty	78	36
	Regular	82	34
	Dense	66	24

Can you please let me know what the issue could be? I followed the installation instructions and I am running the evaluate_nocrash.py command on the noCrash routes.

Any help would be greatly appreciated.

Thanks, Himangi

dotchen commented 3 years ago

How are you launching the CARLA server? What do you mean by your reproduced results? Do you have a link to your implementation that I can look at?

dotchen commented 3 years ago

I just used the downloaded weights and reran Town02 test/test empty to double-check, and I got 78:

town,traffic,weather,start,target,route_completion,lights_ran,duration
Town02,0,10,66,19,100.0,2,195.65
Town02,0,14,66,19,100.0,2,196.4
Town02,0,10,6,71,0.0,0,180.05
Town02,0,14,6,71,100.0,0,224.0
Town02,0,10,66,28,100.0,1,217.75
Town02,0,14,66,28,100.0,2,217.8
Town02,0,10,46,32,44.11,1,158.05
Town02,0,14,46,32,100.0,1,71.05
Town02,0,10,25,59,53.37,0,232.5
Town02,0,14,25,59,100.0,0,162.55
Town02,0,10,32,9,85.57,2,277.05
Town02,0,14,32,9,100.0,2,120.9
Town02,0,10,43,72,100.0,1,83.6
Town02,0,14,43,72,100.0,0,129.15
Town02,0,10,54,14,100.0,1,170.1
Town02,0,14,54,14,100.0,0,214.85
Town02,0,10,26,50,100.0,0,91.5
Town02,0,14,26,50,100.0,1,91.4
Town02,0,10,38,69,57.8,0,198.05
Town02,0,14,38,69,100.0,1,51.0
Town02,0,10,75,24,7.95,0,203.5
Town02,0,14,75,24,100.0,0,167.75
Town02,0,10,19,82,100.0,1,174.3
Town02,0,14,19,82,100.0,1,149.9
Town02,0,10,65,6,100.0,0,102.75
Town02,0,14,65,6,100.0,0,101.95
Town02,0,10,71,29,100.0,0,84.5
Town02,0,14,71,29,100.0,0,84.9
Town02,0,10,59,16,3.8,1,188.5
Town02,0,14,59,16,4.27,0,202.25
Town02,0,10,6,66,0.0,0,180.05
Town02,0,14,6,66,100.0,0,90.55
Town02,0,10,83,56,100.0,1,117.3
Town02,0,14,83,56,100.0,0,161.65
Town02,0,10,69,71,100.0,1,134.35
Town02,0,14,69,71,100.0,1,134.85
Town02,0,10,82,28,100.0,1,79.55
Town02,0,14,82,28,100.0,1,82.5
Town02,0,10,8,17,15.9,0,141.05
Town02,0,14,8,17,100.0,1,59.05
Town02,0,10,19,12,63.43,1,240.2
Town02,0,14,19,12,100.0,0,157.1
Town02,0,10,39,18,100.0,0,162.95
Town02,0,14,39,18,100.0,0,163.6
Town02,0,10,51,8,100.0,0,97.85
Town02,0,14,51,8,100.0,0,98.5
Town02,0,10,24,36,100.0,1,138.7
Town02,0,14,24,36,100.0,0,183.2
Town02,0,10,64,73,100.0,0,85.3
Town02,0,14,64,73,100.0,0,84.9

Can you give me some information on how you obtained those numbers? Also, make sure to launch CARLA with the -vulkan flag, as specified in the launch_carla.sh script.

HimangiM commented 3 years ago

Thanks for the reply. Are you using the route_completion column for evaluation to get the final number as 78? How are you evaluating the above data to get 78?

dotchen commented 3 years ago

By definition, the success rate is the mean of (route_completion == 100) over all episodes...

EDIT: I uploaded the script to parse nocrash results under the scripts folder, sorry for the inconvenience.
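
For readers following along, the criterion above can be sketched in a few lines of Python. This is only an illustration of the "route_completion == 100" rule, not the parsing script from the repository's scripts folder; the names `success_rate` and `SAMPLE` are hypothetical, and the sample holds just the first ten rows of the CSV posted earlier in this thread.

```python
import csv
import io

# First ten rows of the Town02 / empty-traffic results posted in this thread.
SAMPLE = """town,traffic,weather,start,target,route_completion,lights_ran,duration
Town02,0,10,66,19,100.0,2,195.65
Town02,0,14,66,19,100.0,2,196.4
Town02,0,10,6,71,0.0,0,180.05
Town02,0,14,6,71,100.0,0,224.0
Town02,0,10,66,28,100.0,1,217.75
Town02,0,14,66,28,100.0,2,217.8
Town02,0,10,46,32,44.11,1,158.05
Town02,0,14,46,32,100.0,1,71.05
Town02,0,10,25,59,53.37,0,232.5
Town02,0,14,25,59,100.0,0,162.55
"""


def success_rate(csv_text):
    """Return the percentage of episodes whose route_completion is exactly 100."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no episodes found in the CSV text")
    # An episode counts as a success only if the whole route was completed.
    successes = sum(float(row["route_completion"]) == 100.0 for row in rows)
    return 100.0 * successes / len(rows)


print(success_rate(SAMPLE))  # 7 of the 10 sample episodes succeed -> 70.0
```

Applied to all 50 rows posted above, the same rule counts 39 successful episodes, which gives the 78 quoted for Town02, test weather, empty traffic.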

HimangiM commented 3 years ago

Thanks for providing the result-parsing script. I am trying to reproduce the noCrash results reported in the paper using the given RAILS pre-trained model. Column 3 shows the results I get with the pre-trained model, and column 4 shows the results reported in the paper.

Town, weather	Traffic	Reproduced Result	RAILS result (from the paper)
Town 01, Train Weather	Empty	91	98
	Regular	99	100
	Dense	91	96
Town 01, Test Weather	Empty	92	90
	Regular	84	90
	Dense	84	84
Town 02, Train Weather	Empty	94	94
	Regular	91	89
	Dense	65	74
Town 02, Test Weather	Empty	76	78
	Regular	82	82
	Dense	58	66

For some scenarios, the results from the pre-trained model are slightly lower than the results reported in the paper: for example, the dense scenarios of test town & {train, test} weather, the {empty, dense} scenarios of train town & train weather, and the regular scenario of train town & test weather. Can you please let me know what could be causing the slight difference in the results? Is it due to stochasticity?

dotchen commented 3 years ago

Stochasticity is one thing, but it usually only affects the regular and dense scenarios. The 91 vs 98 makes me think you are using an older commit of the code. Make sure that the velocity cap is 20 in image_agent.py for nocrash, as 15 is tuned for the leaderboard. Also, make sure you are launching CARLA with -vulkan.

Also, in the first table in this thread, the first entry is 100, but now it is 91. What changed? Did you just re-evaluate?

> seem to be slightly lower than the results reported in the paper,

What I parsed from the table above is that some are lower but some are higher...

HimangiM commented 3 years ago

Thanks for the reply.

> The 91 vs 98 makes me think you are using the older commit of the code. Make sure that the velocity cap is 20 in image_agent.py for nocrash, as 15 is tuned for the leaderboard. Also, make sure you are launching CARLA with -vulkan.

I launched the CARLA server with -vulkan. The old commit could be the issue. I'll try the new commit with the velocity cap set to 20.

> Also, in the first table in this thread, the first entry is 100, but now it is 91. What changed? Did you just re-evaluate?

I re-evaluated using the route completion == 100 criterion. In the first table, I was using the criterion of whether an episode ended with success or failure. The first table can be ignored.

dotchen / WorldOnRails

Reproducing Results #4