Not able to train STAGE_1

petrovicu commented 3 years ago

Hi,

I checked out the master branch and ran the provided pre-trained models, which worked as described. I am using CARLA version 0.9.10.1.

Then I downloaded the provided dataset to run training myself and got much worse results. I used batch size 32 for both stages, tried both values of the command coefficient (0.1 and 0.01), lr=0.0001, temp=10, sample_by=even, hack=True, and 50 (stage1) + 90 (stage2) epochs. Since I am using the latest CARLA version and the master branch has been updated for it, I assumed the problem was that the provided dataset was collected with an older version (for example, with different classes in the semantic map). Is this correct?

With this in mind, I used the provided autopilot to collect the same amount of data with the latest version. Nevertheless, the training results were still bad. Since the whole process is time-consuming, I wrote an evaluation script for stage1 to check whether it works properly. It worked great with the checkpoints you provided (epoch=34.ckpt for both cc values), but not with mine. It looks like stage1 already introduces a problem, which then causes stage2 to perform poorly. BTW, I also ran the stage1 evaluation script on my stage1 checkpoints trained with your provided dataset, and they performed poorly as well.

Do you have any idea why this is happening?

Regards

bradyz commented 3 years ago

can you link me to some of your wandb runs from training? i want to see some of the visualizations of the model's predictions

petrovicu commented 3 years ago

Thanks for the quick response!

Stage_1 training with the original data you provided: URL link 1

Stage_1 training with data collected from CARLA 0.9.10.1: URL link 2

petrovicu commented 3 years ago

Hi @bradyz,

You gave me a good hint to check the visualizations during training. It looks like the target point is wrong; take a look at the position of the white dot (this is my train_image from wandb):

I also noticed that the GPS sensor values from CARLA 0.9.10.1 differ from those in previous versions. For the same map and the same route (route_08.xml), the two CARLA versions give:

# CARLA 0.9.9
gps = [48.99706601, 8.0028032]
...
mean = np.array([49.0, 8.0])
scale = np.array([111324.60662786, 73032.1570362])
gps_after_normalization_and_scaling = (gps - mean) * scale
...
gps_after_normalization_and_scaling = [-326.62542881, 204.72399902]

# CARLA 0.9.10.1
gps = [-0.0029339, 0.00183903]
...
mean = np.array([49.0, 8.0])
scale = np.array([111324.60662786, 73032.1570362])
gps_after_normalization_and_scaling = (gps - mean) * scale
...
gps_after_normalization_and_scaling = [-5455232.34057393, -584122.94774424]

It looks like the new GPS data are already normalized (but not quite as expected), so after removing the mean subtraction I get:

# CARLA 0.9.10.1
gps = [-0.0029339, 0.00183903]
scale = np.array([111324.60662786, 73032.1570362])
gps_after_scaling = gps * scale
# which gives:
gps_after_scaling = [-326.61526339, 134.30832775]

And btw, how did you get these exact values for mean and scale?

self.mean = np.array([49.0, 8.0])
self.scale = np.array([111324.60662786, 73032.1570362])
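One possible reading of these constants, which is not confirmed anywhere in this thread: the first scale value looks like metres per degree of latitude, and the second looks like the same value shrunk by the cosine of the reference latitude (49.0). The sketch below only checks that arithmetic on the numbers quoted above; the variable names are illustrative.

```python
import math

# Scale values quoted in this thread.
lat_scale = 111324.60662786
lon_scale = 73032.1570362

# Assumption being tested: lon_scale is roughly lat_scale * cos(reference latitude).
ratio = lon_scale / lat_scale
cos_ref_lat = math.cos(math.radians(49.0))

print("lon_scale / lat_scale:", ratio)
print("cos(49 degrees):      ", cos_ref_lat)
```

The two printed values agree to about four decimal places, which is consistent with this reading but does not prove how the constants were actually derived.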

petrovicu commented 3 years ago

Hi @bradyz ,

The problem was that the GNSS values from CARLA 0.9.10.1 are already normalized using the OpenDRIVE geo-reference values (49.0, 8.0), so there is no need to subtract the mean again on your side. As a result, the scale factor should be: scale = np.array([111324.60662786, 111324.60662786]).
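The fix can be sketched as follows, using only the numbers quoted in this thread. The helper name `gps_to_local` and the `pre_normalized` flag are hypothetical and are not part of the repository; this is a minimal illustration, not the project's actual code.

```python
import numpy as np

# Constants quoted in this thread.
MEAN = np.array([49.0, 8.0])
SCALE_OLD = np.array([111324.60662786, 73032.1570362])    # used with CARLA 0.9.9 readings
SCALE_NEW = np.array([111324.60662786, 111324.60662786])  # proposed for CARLA 0.9.10.1


def gps_to_local(gps, pre_normalized):
    """Hypothetical helper: map a raw GNSS reading to the scaled local frame.

    pre_normalized=True means the reading is already an offset from the
    geo-reference (as observed with CARLA 0.9.10.1), so the mean is not
    subtracted again.
    """
    gps = np.asarray(gps, dtype=float)
    if pre_normalized:
        return gps * SCALE_NEW
    return (gps - MEAN) * SCALE_OLD


# Same map and route (route_08.xml), readings quoted earlier in the thread.
old = gps_to_local([48.99706601, 8.0028032], pre_normalized=False)
new = gps_to_local([-0.0029339, 0.00183903], pre_normalized=True)
print("CARLA 0.9.9:   ", old)
print("CARLA 0.9.10.1:", new)
```

With the corrected scale, both readings land at nearly the same local position, whereas the original longitude scale gave about 134.3 instead of about 204.7 for the second coordinate.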

You can close this one.

bradyz commented 3 years ago

sorry for the slow response! thanks for figuring this one out - I'll need to make sure this bit isn't as hacky

bradyz / 2020_CARLA_challenge

Not able to train STAGE_1 #40