Shmuma / ptan

PyTorch Agent Net: reinforcement learning toolkit for pytorch
MIT License

ExperienceSourceFirstLast #17

Closed: raymondchua closed this issue 5 years ago

raymondchua commented 5 years ago

Can someone explain the main difference between ExperienceSourceFirstLast and ExperienceSource? Are we still storing every incoming state?

Shmuma commented 5 years ago

Sorry for the delay :).

The main difference is that ExperienceSource produces all traces of the given length, whereas ExperienceSourceFirstLast returns only the first and last states of each trace, together with the discounted reward accumulated between them. An example illustrates this.

Suppose we have a single episode with states 0 -> 1 -> 2 -> 3 -> 4, and the episode terminates at the last state.

With ExperienceSource(steps_count=3), iterating over the source produces the following data:
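The sliding-window idea can be sketched in plain Python. This is not ptan's code: `sliding_traces` is a hypothetical helper, states stand in for full experience entries, and the way shorter tails are flushed at the end of the episode is an assumption.

```python
from collections import deque


def sliding_traces(states, steps_count):
    """Yield every window of up to `steps_count` consecutive states."""
    window = deque(maxlen=steps_count)
    for state in states:
        window.append(state)  # the oldest state drops out once the window is full
        if len(window) == steps_count:
            yield tuple(window)
    # Episode shorter than steps_count: emit what we have once.
    if 0 < len(window) < steps_count:
        yield tuple(window)
    # Episode ended: flush progressively shorter tails (assumed behaviour).
    while len(window) > 1:
        window.popleft()
        yield tuple(window)


traces = list(sliding_traces([0, 1, 2, 3, 4], steps_count=3))
for trace in traces:
    print(trace)
```

Every state appears as the start of exactly one trace, so no incoming state is skipped.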

ExperienceSourceFirstLast(steps_count=3), by contrast, returns the following:

The reward returned by ExperienceSourceFirstLast is aggregated using the gamma passed to the constructor.
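As a rough illustration of that aggregation, here is a minimal sketch that collapses one trace into its first state, last state, and discounted reward. The helper `first_last` and the (state, reward) tuple layout are hypothetical; exactly which steps' rewards ptan folds in, and how it represents the last state after termination, are not shown here.

```python
def first_last(trace, gamma):
    """Collapse a trace of (state, reward) steps into
    (first_state, discounted_reward, last_state)."""
    total = 0.0
    # Walk backwards so each earlier reward is discounted one step less:
    # total = r0 + gamma * r1 + gamma**2 * r2 + ...
    for _, reward in reversed(trace):
        total = reward + gamma * total
    return trace[0][0], total, trace[-1][0]


# Hypothetical three-step trace with a reward of 1.0 at every step.
trace = [(0, 1.0), (1, 1.0), (2, 1.0)]
first, reward, last = first_last(trace, gamma=0.5)
print(first, reward, last)  # 1.0 + 0.5 * 1.0 + 0.25 * 1.0 = 1.75
```

The intermediate states are discarded; only the endpoints and the single aggregated reward survive.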

Most of the time, ExperienceSourceFirstLast is more convenient, as we don't normally need the intermediate states. But sometimes we need more control, and then ExperienceSource can be handy. In terms of implementation, ExperienceSourceFirstLast is a wrapper around ExperienceSource.