Closed raymondchua closed 5 years ago
Sorry for the delay :).
The main difference is that ExperienceSource produces all traces of the given length, while ExperienceSourceFirstLast returns only the first and last states of each trace, together with the discounted reward accumulated between them. An example illustrates this.
Suppose we have a single episode with states 0 -> 1 -> 2 -> 3 -> 4, and the episode terminates at the last state.
With ExperienceSource(steps_count=3), iterating over the source yields traces of three consecutive steps, sliding along the episode.
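To make the sliding traces concrete, here is a minimal pure-Python sketch. It is not the library's code: `sliding_traces` is a hypothetical helper, and emitting shorter traces once the episode runs out is an assumption about how the tail is handled.

```python
def sliding_traces(states, steps_count):
    """Yield runs of up to `steps_count` consecutive states.

    Hypothetical helper for illustration only. Shorter traces at the
    end of the episode are an assumption, not confirmed library behavior.
    """
    for start in range(len(states)):
        yield tuple(states[start:start + steps_count])


episode = [0, 1, 2, 3, 4]
traces = list(sliding_traces(episode, steps_count=3))
for trace in traces:
    print(trace)
```

Every intermediate state appears in the output, which is what gives you full control over the trace.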
ExperienceSourceFirstLast(steps_count=3), in contrast, returns for each trace only its first state and its last state, plus the reward accumulated in between.
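The first/last view can be sketched in plain Python as well. This is an illustration under stated assumptions, not the library's implementation: `first_last_pairs` is a hypothetical helper, it assumes `steps_count` counts transitions, and it uses `None` as the last state when the episode ends before that many transitions are available.

```python
def first_last_pairs(states, steps_count):
    """Yield (first_state, last_state) for every starting state.

    Hypothetical helper for illustration only. Assumes the last state
    is the one reached `steps_count` transitions after the first, and
    that it is None when the episode terminates earlier.
    """
    for start in range(len(states)):
        end = start + steps_count
        last = states[end] if end < len(states) else None
        yield states[start], last


episode = [0, 1, 2, 3, 4]
pairs = list(first_last_pairs(episode, steps_count=3))
for pair in pairs:
    print(pair)
```

The intermediate states are dropped from each item, so the consumer only sees where a trace started and where it ended.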
The reward returned by ExperienceSourceFirstLast is aggregated using the gamma passed to the constructor.
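As a rough illustration of that aggregation, the sketch below folds per-step rewards into one discounted sum, r0 + gamma * r1 + gamma^2 * r2 and so on. The function name `discounted_reward` and the reward values are made up for the example; this is the standard discounted sum, not code taken from the library.

```python
def discounted_reward(rewards, gamma):
    """Fold per-step rewards into a single discounted sum.

    Hypothetical helper for illustration only. Walks the rewards
    backwards so each earlier step discounts everything after it.
    """
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Made-up rewards for three steps: 1 + 0.5 * 1 + 0.25 * 1
total = discounted_reward([1.0, 1.0, 1.0], gamma=0.5)
print(total)  # 1.75
```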
Most of the time, ExperienceSourceFirstLast is more convenient, as we don't normally need the intermediate states. But sometimes we need more control, and then ExperienceSource can be handy.
In terms of implementation, ExperienceSourceFirstLast is a wrapper around ExperienceSource.
Can someone explain the main difference between ExperienceSourceFirstLast and ExperienceSource? Are we still storing every incoming state?