I've been implementing a lot of reinforcement learning and imitation learning algorithms, since I'm sick of reading about them without really understanding them.
It turns out that the G-learning paper doesn't use the episodic setting (at least for the cliff-world domain, which is my main concern). Let's write a new cliff-world environment which isn't episodic and see whether my results then match theirs.
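Here is a minimal sketch of what a non-episodic cliff-world could look like. Everything specific in it is my own assumption rather than something taken from the G-learning paper: the class name `ContinuingCliffWorld`, the 4x12 grid, the reward values, and the choice to teleport the agent back to the start after it falls off the cliff or reaches the goal. The one property that matters is that `step` never returns a `done` flag, so the interaction is a single continuing stream.

```python
# Action index -> (row delta, col delta): up, right, down, left.
ACTIONS = ((-1, 0), (0, 1), (1, 0), (0, -1))


class ContinuingCliffWorld:
    """Cliff-world with no terminal states.

    Assumed layout: start in the bottom-left corner, goal in the
    bottom-right corner, cliff cells along the bottom row between them.
    Grid size and reward values are placeholders, not the paper's.
    """

    def __init__(self, height=4, width=12, step_reward=-1.0,
                 cliff_reward=-100.0, goal_reward=0.0):
        self.height, self.width = height, width
        self.step_reward = step_reward
        self.cliff_reward = cliff_reward
        self.goal_reward = goal_reward
        self.start = (height - 1, 0)
        self.goal = (height - 1, width - 1)
        self.pos = self.start

    @property
    def n_states(self):
        return self.height * self.width

    def _state(self):
        # Flatten (row, col) into a single integer state index.
        r, c = self.pos
        return r * self.width + c

    def _is_cliff(self, r, c):
        return r == self.height - 1 and 0 < c < self.width - 1

    def reset(self):
        self.pos = self.start
        return self._state()

    def step(self, action):
        """Apply an action and return (next_state, reward); there is no done flag."""
        dr, dc = ACTIONS[action]
        # Moves that would leave the grid are clipped to the border.
        r = min(max(self.pos[0] + dr, 0), self.height - 1)
        c = min(max(self.pos[1] + dc, 0), self.width - 1)
        if self._is_cliff(r, c):
            # Falling off the cliff: large penalty, teleport to start, keep going.
            reward, self.pos = self.cliff_reward, self.start
        elif (r, c) == self.goal:
            # Reaching the goal: teleport to start instead of terminating.
            reward, self.pos = self.goal_reward, self.start
        else:
            reward, self.pos = self.step_reward, (r, c)
        return self._state(), reward
```

A learner then just calls `reset()` once and keeps calling `step(action)` for as many steps as it likes, since there are no episode boundaries to handle.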