Training on interaction dataset

MarcelBruckner commented 4 years ago

Since training now works on the unnormalized observations of the SAC agent, the next step is to fix how the interaction dataset is used.

MarcelBruckner commented 4 years ago

Instead of using a fixed wheelbase of 2.7, we should pass it to the generate script via a command line flag, so that we can generate trajectories for different cars.
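
A minimal sketch of what such a flag could look like, using plain `argparse`. The flag name `--wheelbase` and the helper names are assumptions for illustration; the actual generate script may use a different flag library and naming.

```python
import argparse

# 2.7 is the value that is currently hard-coded, per the comment above.
DEFAULT_WHEELBASE = 2.7


def build_parser():
    """Build a parser with a hypothetical --wheelbase flag."""
    parser = argparse.ArgumentParser(
        description="Generate expert trajectories (illustrative sketch).")
    parser.add_argument(
        "--wheelbase",
        type=float,
        default=DEFAULT_WHEELBASE,
        help="Wheelbase of the car the trajectories are generated for.")
    return parser


def parse_wheelbase(argv=None):
    """Parse the wheelbase from argv and reject non-positive values."""
    args = build_parser().parse_args(argv)
    if args.wheelbase <= 0:
        raise ValueError("wheelbase must be positive")
    return args.wheelbase


if __name__ == "__main__":
    print("Using wheelbase:", parse_wheelbase())
```

Keeping 2.7 as the default means existing invocations without the flag would behave as before.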

MarcelBruckner commented 4 years ago

I think it would also make sense to write a test that loads a SAC trajectory and an Interaction Dataset trajectory and compares the ranges of their actions and observations. I expect the observations to be quite similar, but the actions could be very different.
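
A rough sketch of the comparison such a test could perform. It assumes each trajectory has already been loaded into a list of equal-length vectors (one per time step); the loading code and the function names here are hypothetical and not part of bark-ml.

```python
def value_ranges(rows):
    """Return one (min, max) pair per dimension for a list of vectors."""
    columns = list(zip(*rows))
    return [(min(c), max(c)) for c in columns]


def range_overlap(a, b):
    """Intersection over union of two intervals (0.0 disjoint, 1.0 identical)."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    union = max(a_hi, b_hi) - min(a_lo, b_lo)
    if union == 0:
        return 1.0  # both intervals are the same single point
    intersection = max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))
    return intersection / union


def compare_ranges(sac_rows, dataset_rows):
    """Per-dimension overlap between the value ranges of two trajectories."""
    return [
        range_overlap(a, b)
        for a, b in zip(value_ranges(sac_rows), value_ranges(dataset_rows))
    ]


if __name__ == "__main__":
    # Made-up numbers, only to show the call shape.
    sac = [[0.0, 1.0], [1.0, 3.0]]
    dataset = [[0.0, 10.0], [1.0, 12.0]]
    print(compare_ranges(sac, dataset))
```

A test could then assert a minimum overlap per observation dimension and merely report the overlap for actions, since those are expected to differ.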

MarcelBruckner commented 4 years ago

I think I can show that training cannot succeed with the current setup. To verify, run `bazel run //examples:calculate_mean_observations` and carefully read the command line output. This script calculates the mean observations and mean actions of the SAC agent on the merging blueprint and of all trajectories in the interaction dataset. Comparing the two, the means of several observations are very far apart. This matches what I have seen during training: the discriminator distinguishes the two sources with 100% accuracy after the first few scenarios.
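
To illustrate why distant means make the discriminator's job trivial, here is a small sketch that measures the per-dimension gap between two sets of vectors in units of their pooled standard deviation. This is not the code of `calculate_mean_observations`; the function names and the choice of metric are assumptions for illustration.

```python
from statistics import mean, pstdev


def column_means(rows):
    """Mean of each dimension over a list of equal-length vectors."""
    return [mean(c) for c in zip(*rows)]


def standardized_mean_gap(rows_a, rows_b):
    """Absolute mean difference per dimension, divided by the pooled std.

    A large value means a simple threshold on that dimension already
    separates the two sources.
    """
    gaps = []
    for col_a, col_b in zip(zip(*rows_a), zip(*rows_b)):
        pooled = pstdev(col_a + col_b)  # std over both sources combined
        diff = abs(mean(col_a) - mean(col_b))
        gaps.append(diff / pooled if pooled > 0 else 0.0)
    return gaps


if __name__ == "__main__":
    # Made-up numbers, only to show the call shape.
    sac_obs = [[0.0, 5.0], [0.2, 5.1]]
    dataset_obs = [[0.1, 50.0], [0.1, 52.0]]
    print(column_means(sac_obs), column_means(dataset_obs))
    print(standardized_mean_gap(sac_obs, dataset_obs))
```

Dimensions with a large gap are the first candidates to check for mismatched units, coordinate frames, or normalization between the two sources.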

GAIL-4-BARK / bark-ml

Training on interaction dataset #71