ExperienceSourceFirstLast

Sorry for the delay :).

The main difference is that ExperienceSource produces every trace of the given length, whereas ExperienceSourceFirstLast returns only the first and last states of each trace, together with the discounted reward accumulated between them. An example makes this clearer.

Suppose we have a single episode with states 0 -> 1 -> 2 -> 3 -> 4, and the episode terminates at the last state.

With ExperienceSource(steps_count=3), iterating produces the following data:

[Experience(state=0), Experience(state=1), Experience(state=2)]
[Experience(state=1), Experience(state=2), Experience(state=3)]
[Experience(state=2), Experience(state=3), Experience(state=4)]
[Experience(state=3), Experience(state=4)]
[Experience(state=4)]
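
The windowing above can be mimicked with a small pure-Python sketch. It does not use ptan: `sliding_traces` is a hypothetical helper, and it assumes the source keeps emitting the shorter tail traces once the episode has terminated, as in the listing above.

```python
def sliding_traces(states, steps_count):
    """Yield every window of up to `steps_count` consecutive states.

    Hypothetical helper for illustration only; it is not part of ptan.
    """
    for start in range(len(states)):
        # Near the end of the episode the slice is shorter than steps_count.
        yield states[start:start + steps_count]


traces = list(sliding_traces([0, 1, 2, 3, 4], steps_count=3))
for trace in traces:
    print(trace)
```

This prints five lists of states, shrinking from three elements to one, in the same order as the Experience traces listed above.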

But ExperienceSourceFirstLast(steps_count=3) will return the following:

ExperienceFirstLast(state=0, last_state=2)
ExperienceFirstLast(state=1, last_state=3)
ExperienceFirstLast(state=2, last_state=None)
ExperienceFirstLast(state=3, last_state=None)

The reward returned by ExperienceSourceFirstLast is the discounted sum of the rewards within the trace, computed with the gamma passed to the constructor.
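
As a rough sketch of what this aggregation means, the snippet below folds the rewards of one trace into a single discounted sum. The name `discounted_reward` and the sample rewards are made up for illustration; this is not ptan's actual code.

```python
def discounted_reward(rewards, gamma):
    """Return r0 + gamma*r1 + gamma^2*r2 + ... for one trace.

    Hypothetical helper for illustration only; it is not part of ptan.
    """
    total = 0.0
    # Walk backwards so each earlier step discounts everything after it.
    for reward in reversed(rewards):
        total = total * gamma + reward
    return total


result = discounted_reward([1.0, 1.0, 1.0], gamma=0.5)
print(result)  # 1.75 = 1 + 0.5 + 0.25
```

With gamma=1.0 the same function simply sums the rewards, which is a quick way to sanity-check it.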

Most of the time, ExperienceSourceFirstLast is more convenient, as we don't normally need the intermediate states. But sometimes we need more control, and then ExperienceSource can be handy. In terms of implementation, ExperienceSourceFirstLast is a wrapper around ExperienceSource.
