alberthli closed this 9 months ago
Agreed on the first two requested tests; those should be trivial to spin up.
Sanity check of choice: randomly sample a batch of initial policies and shoot them forward, then pass those guesses to vanilla predictive sampling, which should return new trajectories that are no worse than the guesses. A test should verify this property.
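A minimal sketch of that sanity check, written in plain Python so it runs without dependencies. It assumes a "policy" is an open-loop control sequence, and it swaps in a toy double integrator and a quadratic cost for the real dynamics and cost. `rollout`, `cost`, `predictive_sampling`, and the test function are hypothetical names, not the PR's API.

```python
import random


def rollout(x0, us, dt=0.1):
    """Shoot a control sequence forward through a toy double integrator (hypothetical stand-in dynamics)."""
    xs = [x0]
    for u in us:
        p, v = xs[-1]
        xs.append((p + dt * v, v + dt * u))
    return xs


def cost(xs, us):
    """Quadratic cost driving the state to the origin, plus a small control penalty."""
    return sum(p * p + 0.1 * v * v for p, v in xs) + 0.01 * sum(u * u for u in us)


def predictive_sampling(x0, guesses, num_samples, noise_scale, rng):
    """For each guess, sample Gaussian perturbations around it and keep the
    cheapest candidate; the unperturbed guess itself is always a candidate."""
    results = []
    for guess in guesses:
        candidates = [list(guess)] + [
            [u + rng.gauss(0.0, noise_scale) for u in guess] for _ in range(num_samples)
        ]
        best = min(candidates, key=lambda us: cost(rollout(x0, us), us))
        results.append(best)
    return results


def test_predictive_sampling_no_worse_than_guesses(batch_size=8, horizon=20, seed=0):
    rng = random.Random(seed)
    x0 = (1.0, 0.0)
    # Randomly sample a batch of initial policies (open-loop control sequences) ...
    guesses = [[rng.uniform(-1.0, 1.0) for _ in range(horizon)] for _ in range(batch_size)]
    # ... and shoot them forward to record their costs.
    guess_costs = [cost(rollout(x0, us), us) for us in guesses]
    improved = predictive_sampling(x0, guesses, num_samples=16, noise_scale=0.3, rng=rng)
    improved_costs = [cost(rollout(x0, us), us) for us in improved]
    # Each returned trajectory must be no worse than the guess it started from.
    for c_new, c_guess in zip(improved_costs, guess_costs):
        assert c_new <= c_guess
    return guess_costs, improved_costs
```

The property holds by construction only if the sampler keeps each unperturbed guess in its candidate set; an implementation that drops it can return a worse trajectory, which is exactly the regression this test would catch.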
TODO list:
- `q` and `v` to `x` everywhere
- `VanillaPredictiveSampler` + `optimize` + `jit`
This PR adds a generic, flexible API for trajectory optimization. As a concrete instantiation, it also implements the (extremely simple) predictive sampling algorithm.
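One way such a generic API could be shaped, sketched under assumptions: the `TrajectoryOptimizer` base class, its `rollout` and `optimize` methods, and `ToyPredictiveSampler` are hypothetical names, not the interface this PR actually defines, and the double-integrator dynamics and cost are toy stand-ins. The idea shown is that dynamics and cost are plugged in once and each algorithm only implements `optimize`, so predictive sampling reduces to a few lines.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]  # (position, velocity) of a toy 1-D system


@dataclass
class TrajectoryOptimizer(ABC):
    """Hypothetical generic interface: dynamics and cost are plugged in,
    and each algorithm only has to implement `optimize`."""
    dynamics: Callable[[State, float], State]
    cost: Callable[[Sequence[State], Sequence[float]], float]

    def rollout(self, x0: State, us: Sequence[float]) -> List[State]:
        """Shoot an open-loop control sequence forward from x0."""
        xs = [x0]
        for u in us:
            xs.append(self.dynamics(xs[-1], u))
        return xs

    @abstractmethod
    def optimize(self, x0: State, us_guess: Sequence[float]) -> Tuple[List[State], List[float]]:
        """Return an (ideally improved) state trajectory and control sequence."""


@dataclass
class ToyPredictiveSampler(TrajectoryOptimizer):
    """Vanilla predictive sampling: perturb the guess with Gaussian noise,
    roll every candidate out, and keep the cheapest one."""
    num_samples: int = 64
    noise_scale: float = 0.5
    seed: int = 0

    def optimize(self, x0, us_guess):
        rng = random.Random(self.seed)
        # The unperturbed guess is always a candidate, so the result is never worse.
        candidates = [list(us_guess)]
        candidates += [
            [u + rng.gauss(0.0, self.noise_scale) for u in us_guess]
            for _ in range(self.num_samples)
        ]
        best = min(candidates, key=lambda us: self.cost(self.rollout(x0, us), us))
        return self.rollout(x0, best), best


# Usage on a double integrator (dt = 0.1) driven toward the origin.
def dynamics(x: State, u: float, dt: float = 0.1) -> State:
    p, v = x
    return (p + dt * v, v + dt * u)


def cost(xs: Sequence[State], us: Sequence[float]) -> float:
    return sum(p * p + 0.1 * v * v for p, v in xs) + 0.01 * sum(u * u for u in us)


opt = ToyPredictiveSampler(dynamics=dynamics, cost=cost, num_samples=64, noise_scale=0.5)
x0 = (1.0, 0.0)
guess = [0.0] * 20
xs, us = opt.optimize(x0, guess)
```

Because the unperturbed guess is always among the candidates, `optimize` cannot return a higher-cost trajectory than its input, which is the property the sanity check above is meant to exercise. A JAX version would typically replace the Python loops with `jax.vmap` over samples and wrap `optimize` in `jax.jit`.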