Closed SeanNaren closed 7 years ago
Sorry am trying to finish off a paper due mid-NIPS before I head off to NIPS, so will have to put this on hold for a week. Have started taking a look but definitely need to have a thorough look through the whole thing before merging to master
.
@Kaixhin sounds good, enjoy NIPS :)
Remind what the conclusion on timeStepLimit
and maxSteps
was? Currently the env will run to the minimum of these, rather than maxSteps
overriding timeStepLimit
.
Also. in gym
, what's reward_threshold
? Looking at MountainCar, it seems like a scaled version of rlenvs
' rewardSpace
, but only limited one way.
Currently the default maximum time step per episode is 1000. This can be changed by the environment via timeStepLimit
in the Env.lua
class, and by maxSteps
set by the user in the options. However if maxSteps
is greater than timeStepLimit
(and not null) it is set to timeStepLimit
Does that make sense? Any changes you would make?
Seems a little counterintuitive to have timeStepLimit
overwrite maxSteps
, but if that's what gym
does then let's stick to that for consistency. If so then go ahead and merge this pull request!
I see that reward_threshold
is used to determine if an environment is "solved", but that's not of concern to this library immediately.
I think as kory said this was so that each episode is a maximum fixed size for their leaderboards! And sounds good, will just do one quick pass through and merge :)
Thanks for your help @Kaixhin, was great stuff!
Thanks a lot @SeanNaren! I'll check if anything needs to be backported to v1
myself, so good luck with twrl
integration.
Based on #8, merged
v1
as well into branch. Let me know if there are any remaining issues!