carla-simulator / leaderboard

CARLA Autonomous Driving leaderboard
MIT License

Metric variance in repeated Leaderboard runs #80

Closed aaronh65 closed 2 years ago

aaronh65 commented 3 years ago

I've benchmarked an agent on the Leaderboard testing routes (with all scenarios present) over 10 repetitions and compiled the driving/route completion scores in a plot here:

[plot: per-route driving/route completion scores over 10 repetitions, with error bars and min/max dots]

Error bars are standard deviations around the mean represented by each bar. I've also plotted maximum and minimum scores on each route as dots on the corresponding bars. I've noticed that there's very high variance in many cases (see route 18 and 25 for example). Plotting the average number of infractions per route also shows high variance, which seems to indicate that each route plays out very differently - see the plot for route 25 below (y-axis is average number of infractions expected per run).

[plot: average number of infractions per run on route 25]

What causes this variance? The agent runs a neural net under the hood (RGB images as input) which should be deterministic - although I'd expect some variance to creep in if there's simulated noise.

Is there a way to set a seed somewhere to ensure that routes/scenarios play out the same way?

pjw1 commented 3 years ago

I've also noticed the same issue. I think there are some random seeds in the leaderboard evaluator. Besides, is the driving score even valid if the variance is that high? I mean, can the number itself fully represent the agent's capability?

glopezdiest commented 3 years ago

Hey @aaronh65. We are fully aware of the non-deterministic behaviors of the Leaderboard (and CARLA in general) and we are definitely working on improving them. The two main sources of non-determinism in the Leaderboard are the sensors and the background activity. As of right now, the latter is fully deterministic and the former has had its variance greatly reduced. These features are available on the dev branch and will be part of the 0.9.11 release.

aaronh65 commented 3 years ago

@glopezdiest, thank you for your response - I'm excited to hear that the next version of CARLA/Leaderboard will be more deterministic! When do you think 0.9.11 will be released?

Also if I may ask, how will users be able to control determinism? Would it really be as simple as setting a seed number somewhere?

glopezdiest commented 3 years ago

We just released 0.9.11 pre holidays, and the Leaderboard is definitely compatible with that version.

As for controlling determinism, the leaderboard has a trafficManagerSeed argument that sets the seed of the TrafficManager, which changes how the background activity moves. For the rest (which I think is only the vehicle models, but I may be wrong), you have to go to srunner/carla_data_provider and change the _randomSeed value to any other number.

aaronh65 commented 3 years ago

Didn't realize it had already been released, thanks for the heads up! I changed my Leaderboard scripts to use 0.9.11 but I'm encountering an issue when running a route for multiple repetitions. I can run for one repetition, but when Leaderboard progresses to the next repetition it doesn't seem to wait for the world to be fully reloaded before running the next route.

This results in the following terminal output (occurs after registering statistics for repetition 0). I suspect there might be something up with how the _load_and_wait_for_world method checks that everything is ready since the > Loading the world step occurs nearly instantaneously when preparing for the second repetition.

> Registering the route statistics
========= Preparing RouteScenario_3 (repetition 1) =========
> Setting up the agent
<All keys matched successfully>
> Loading the world
> Running the route
/home/aaron/anaconda3/envs/lblbc/lib/python3.7/site-packages/torch/nn/functional.py:2506: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))

Error during the simulation:                                                                                                                                                                                       
> A sensor took too long to send their data

Traceback (most recent call last):
  File "/home/aaron/workspace/carla/2020_CARLA_challenge/leaderboard/leaderboard/envs/sensor_interface.py", line 236, in get_data
    sensor_data = self._new_data_buffers.get(True, self._queue_timeout)
  File "/home/aaron/anaconda3/envs/lblbc/lib/python3.7/queue.py", line 178, in get
    raise Empty
_queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/aaron/workspace/carla/2020_CARLA_challenge/leaderboard/leaderboard/scenarios/scenario_manager.py", line 152, in _tick_scenario
    ego_action = self._agent()
  File "/home/aaron/workspace/carla/2020_CARLA_challenge/leaderboard/leaderboard/autoagents/agent_wrapper.py", line 77, in __call__
    return self._agent()
  File "/home/aaron/workspace/carla/2020_CARLA_challenge/leaderboard/leaderboard/autoagents/autonomous_agent.py", line 104, in __call__
    input_data = self.sensor_interface.get_data()
  File "/home/aaron/workspace/carla/2020_CARLA_challenge/leaderboard/leaderboard/envs/sensor_interface.py", line 241, in get_data
    raise SensorReceivedNoData("A sensor took too long to send their data")
leaderboard.envs.sensor_interface.SensorReceivedNoData: A sensor took too long to send their data

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "leaderboard/leaderboard/leaderboard_evaluator.py", line 348, in _load_and_run_scenario
    self.manager.run_scenario()
  File "/home/aaron/workspace/carla/2020_CARLA_challenge/leaderboard/leaderboard/scenarios/scenario_manager.py", line 136, in run_scenario
    self._tick_scenario(timestamp)
  File "/home/aaron/workspace/carla/2020_CARLA_challenge/leaderboard/leaderboard/scenarios/scenario_manager.py", line 156, in _tick_scenario
    raise RuntimeError(e)
RuntimeError: A sensor took too long to send their data
> Stopping the route
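What I'd expect _load_and_wait_for_world to do is poll a readiness predicate rather than return almost immediately. A minimal stdlib sketch of that pattern (wait_for_world and is_ready are illustrative names, not the actual leaderboard API):

```python
import time

def wait_for_world(is_ready, timeout=20.0, poll=0.2):
    """Poll a readiness predicate until it returns True or the timeout
    expires, instead of assuming the reload finished instantly."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_ready():
            return True
        time.sleep(poll)
    raise RuntimeError("World did not finish reloading in time")
```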
glopezdiest commented 3 years ago

Hmmm, the error you are seeing is related to the sensors. In the current synchronous mode, the sensors are rendered in parallel and the server might not wait for them to be passed to the client. To ensure determinism, we added some code to the leaderboard that waits for that information.

A sensor took too long to send their data

This error basically means that some sensor you used didn't correctly send its data. Which version of CARLA are you using? What sensors have you set up for the ego vehicle?
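That wait is essentially a blocking queue read with a timeout, as the traceback above shows. A self-contained sketch of the pattern (heavily simplified; the class layout and method names here are illustrative, not the actual sensor_interface.py code):

```python
import queue

class SensorReceivedNoData(Exception):
    pass

class SensorInterface:
    """Simplified sketch of the leaderboard's sensor wait: block on a
    shared queue until every registered sensor has delivered its data
    for the tick, or raise if one of them times out."""

    def __init__(self, sensor_tags, queue_timeout=2.0):
        self._tags = set(sensor_tags)
        self._buffer = queue.Queue()
        self._queue_timeout = queue_timeout

    def push(self, tag, frame, payload):
        # Called from the sensor callbacks as data arrives.
        self._buffer.put((tag, frame, payload))

    def get_data(self):
        data = {}
        while set(data) != self._tags:
            try:
                tag, frame, payload = self._buffer.get(True, self._queue_timeout)
            except queue.Empty:
                raise SensorReceivedNoData(
                    "A sensor took too long to send their data")
            data[tag] = (frame, payload)
        return data
```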

aaronh65 commented 3 years ago

I'm using CARLA 0.9.11, and I think I might have figured out which sensor is causing the problem. The speedometer doesn't appear to be getting reset between repetitions. In leaderboard/envs/sensor_interface.py I uncommented this line and got the following terminal output for the last two frames of the first repetition (the first line and last two lines of each output block are my own debug print statements):

getting sensor data
Getting gps - 165
Getting imu - 165
Getting rgb_right - 165
Getting rgb_left - 165
Getting rgb - 165
Getting speed - 165
got sensor data at 6.55
applied control at 6.55

getting sensor data
Getting imu - 166
Getting gps - 166
Getting rgb_right - 166
Getting rgb_left - 166
Getting rgb - 166
Getting speed - 166
got sensor data at 6.60
applied control at 6.60

Then, after the world reloads and the next repetition is started, this is what I see for the first two frames

getting sensor data
Getting speed - 166
Getting gps - 37
Getting imu - 37
Getting rgb_right - 37
Getting rgb_left - 37
Getting rgb - 37
got sensor data at 0.00
applied control at 0.00

getting sensor data
Getting imu - 38
Getting gps - 38
Getting rgb_right - 38
Getting rgb_left - 38
Getting rgb - 38

Error during the simulation:
> A sensor took too long to send their data

So it looks to me like the simulation hangs trying to get the speedometer reading, and the speedometer still carries the frame number (I think that's what the second number is) from the last repetition.
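If the cause really is a stale item left in the buffer, then the fix would be draining the queue when a repetition ends, so the first tick of the new route can never pull a reading tagged with the old frame. A minimal sketch of that reset (SensorBuffer is an illustrative name, not the actual leaderboard class):

```python
import queue

class SensorBuffer:
    """Illustrative sketch: drain leftover sensor readings between
    repetitions so the new route never sees a frame from the old one."""

    def __init__(self):
        self._q = queue.Queue()

    def push(self, tag, frame):
        self._q.put((tag, frame))

    def pending(self):
        return self._q.qsize()

    def reset(self):
        # Discard anything the previous route left behind (e.g. a
        # speedometer reading still tagged with the old frame number).
        while True:
            try:
                self._q.get_nowait()
            except queue.Empty:
                return
```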

glopezdiest commented 3 years ago

Ah, have you updated ScenarioRunner to the latest master too? We added that reset there

aaronh65 commented 3 years ago

I have not updated ScenarioRunner yet - I'll be sure to do that soon and report back. Thanks for the suggestion!

aremanap commented 2 years ago

Hello @glopezdiest! I currently encounter a similar problem as described in this issue. I am evaluating my agent using the current leaderboard evaluation script. When I run the exact same model several times on the same route with the same scenario, I notice non-deterministic traffic and some variance in the results (DS, RC).

I am using CARLA 0.9.12, the current Leaderboard master branch and the Scenario_Runner in version 0.9.12.

My first approach is to run the TM in deterministic mode to achieve better reproducibility. Therefore, I call leaderboard_evaluator.py with --traffic-manager-seed=42. However, I still find that there is a lot of non-determinism in the traffic when I run the same route multiple times (--repetitions=9). Is this behavior still to be expected with the deterministic TM in CARLA 0.9.12?

Attached are some pictures of consecutive repetitions of the same route. Do I need to make any other changes besides setting the TM seed to get deterministic traffic? Thanks in advance for your help!

[images: side-by-side comparison of traffic across consecutive repetitions of the same route]
glopezdiest commented 2 years ago

@aremanap Yeah, we actually just saw that too. For 0.9.12 we changed the TM seed again, as it was causing weird behavior in some configurations. That, however, caused the TM to be non-deterministic when a batch of vehicles is created (the list sent to the TM arrives in a different order, so the seeds are different), which our smoke tests didn't detect. We are updating those, plus the TM and Leaderboard, to make sure it is back to being deterministic.

This should be fixed today, and when it's ready, I'll reply back with all the information needed

varunjammula commented 2 years ago

Hi @glopezdiest, is the fix released?

glopezdiest commented 2 years ago

Not yet, as this week is a long holiday in Spain :) The problem we are having is related to spawning vehicles in batch, which sends an unordered list to the TM, generating different seeds for different vehicles. For now, these two branches patch the problem:

These branches will be merged by this week
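A toy illustration of why the registration order matters, assuming each vehicle's effective seed is derived from the base seed plus its position in the list the TM receives (that derivation is my guess at the mechanism for illustration, not the actual TM code):

```python
def effective_seeds(actor_ids, base_seed):
    # Assumed mechanism: each actor's behavior seed depends on its index
    # in the list handed to the TM, so an unordered batch response
    # reshuffles every per-actor seed even though base_seed is fixed.
    return {actor: base_seed + index for index, actor in enumerate(actor_ids)}

run_a = effective_seeds([101, 102, 103], base_seed=42)
run_b = effective_seeds([103, 101, 102], base_seed=42)  # same actors, new order
# Sorting the batch before registering restores a stable mapping:
run_c = effective_seeds(sorted([103, 101, 102]), base_seed=42)
```

Under this toy model, run_a and run_b disagree on every actor's seed despite the identical base seed, which is exactly the kind of divergence the smoke tests missed; sorting (run_c) makes the mapping order-independent.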

glopezdiest commented 2 years ago

As an update to this topic: it took us longer than expected because, in the end, there was another problem with the physics that had to be solved. As of right now, it is fixed, and the changes you need are in this CARLA branch, which changes the TM.

glopezdiest commented 2 years ago

We just merged the PR (sorry for the delay, Spain just has too many holidays in December :slightly_smiling_face: ). In the end, as I previously mentioned, it was a TM problem, and the fix is now in master. I'm closing this issue, but feel free to reopen it / create another one if it is still happening.