FarmDragon / AppleCart

Pick dem apples

Get from any start state to goal state with value iteration #9

Open cypressf opened 2 years ago

cypressf commented 2 years ago

Goal

Use function approximation and value iteration to find an approximate continuous cost-to-go function. Use this cost-to-go function's corresponding optimal policy to control the cart-pole to get to a goal state from any given starting state.
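As a rough illustration of the goal above, here is a minimal fitted value iteration sketch. It is not the homework or textbook code: a toy double integrator stands in for the cart-pole, and a linear least-squares fit over quadratic features stands in for the neural net. The names (`features`, `step`, `stage_cost`, `q_values`, `fitted_value_iteration`, `greedy_action`) and the constants `DT`, `GAMMA`, and `ACTIONS` are all assumptions made for this example.

```python
import numpy as np

DT = 0.1                               # time step (assumed)
GAMMA = 0.99                           # discount factor (assumed)
ACTIONS = np.array([-1.0, 0.0, 1.0])   # coarse grid of candidate inputs (assumed)


def features(x):
    """Quadratic features [q^2, q*v, v^2] for states x of shape (N, 2)."""
    q, v = x[:, 0], x[:, 1]
    return np.stack([q * q, q * v, v * v], axis=1)


def step(x, u):
    """One Euler step of a double integrator (toy stand-in for the cart-pole)."""
    q, v = x[:, 0], x[:, 1]
    return np.stack([q + DT * v, v + DT * u], axis=1)


def stage_cost(x, u):
    """Quadratic cost that is zero only at the goal state (origin) with u = 0."""
    return DT * (x[:, 0] ** 2 + x[:, 1] ** 2 + u ** 2)


def q_values(w, x):
    """g(x, u) + gamma * J_w(f(x, u)) for every action; shape (num_actions, N)."""
    return np.stack(
        [stage_cost(x, u) + GAMMA * features(step(x, u)) @ w for u in ACTIONS],
        axis=0,
    )


def fitted_value_iteration(x, num_iters):
    """Alternate Bellman backups on sample states x with a least-squares fit."""
    w = np.zeros(3)  # J_w(x) = features(x) @ w, initialized to zero
    for _ in range(num_iters):
        targets = q_values(w, x).min(axis=0)  # one-step lookahead cost-to-go
        w, *_ = np.linalg.lstsq(features(x), targets, rcond=None)
    return w


def greedy_action(w, x):
    """Policy induced by the fitted cost-to-go: pick the minimizing action."""
    return ACTIONS[q_values(w, x).argmin(axis=0)]


if __name__ == "__main__":
    samples = np.random.default_rng(0).uniform(-1.0, 1.0, size=(500, 2))
    weights = fitted_value_iteration(samples, num_iters=50)
    print("fitted weights:", weights)
    print("action at q=1, v=0:", greedy_action(weights, np.array([[1.0, 0.0]])))
```

The same loop structure should carry over to the cart-pole by swapping in its dynamics for `step` and a neural net for the least-squares fit, though convergence is not guaranteed in either case.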

Process

cypressf commented 2 years ago

For now I added the code from the value iteration homework and the textbook example from chapter 7: https://github.com/FarmDragon/AppleCart/commit/14eee4f6220f17a5629db85faeb77811a8333d38

I'm working on the value-iteration branch: https://github.com/FarmDragon/AppleCart/tree/value-iteration

cypressf commented 2 years ago

The homework has a single-pendulum cart-pole optional exercise. I'm trying to get that to work first since I imagine it's much more feasible.

My first few runs, while tweaking the hyperparameters, didn't produce a working controller: the cart just slowly drifted to the right. I'm going to try visualizing the loss and the value function over time to see if I can debug what's going wrong.

cypressf commented 2 years ago

In our meeting with Alex, he said it's difficult to debug neural nets, but he recommends first constraining the input space to just the region around the fixed point, and testing to see if we can get the NN to stabilize, as opposed to swing up. If it can't even stabilize, maybe there's a bug in the code.
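A minimal sketch of that stabilization test, under stated assumptions: the state layout `[x, theta, x_dot, theta_dot]` with `theta = pi` as the upright fixed point is assumed, and `sample_near_fixed_point`, `stays_near_fixed_point`, `step_fn`, and `policy` are hypothetical names. `step_fn(x, u)` would be one simulation step of the cart-pole and `policy(x)` the controller derived from the learned cost-to-go.

```python
import numpy as np

# Assumed state layout [x, theta, x_dot, theta_dot]; theta = pi is upright.
UPRIGHT = np.array([0.0, np.pi, 0.0, 0.0])


def sample_near_fixed_point(num_samples, half_widths, fixed_point=UPRIGHT, seed=0):
    """Draw training states uniformly from a small box around the fixed point."""
    rng = np.random.default_rng(seed)
    half_widths = np.asarray(half_widths, dtype=float)
    offsets = rng.uniform(
        -half_widths, half_widths, size=(num_samples, fixed_point.size)
    )
    return fixed_point + offsets


def stays_near_fixed_point(step_fn, policy, x0, num_steps, tol, fixed_point=UPRIGHT):
    """Roll out the closed loop; fail as soon as any coordinate leaves the box."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        x = step_fn(x, policy(x))
        if np.max(np.abs(x - fixed_point)) > tol:
            return False
    return True
```

Training only on `sample_near_fixed_point(...)` and then checking `stays_near_fixed_point(...)` from a few of those states gives a pass/fail signal that is much cheaper to interpret than a full swing-up run.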

He also suggested checking out the PyTorch cart-pole system, which he said might "cheat" in some ways that Drake doesn't.

We could also try adding more training data and more layers.