Kaixhin / rlenvs

Reinforcement learning environments for Torch7
MIT License
93 stars 23 forks source link

Twrl-rebased #14

Closed SeanNaren closed 7 years ago

SeanNaren commented 7 years ago

Based on #8, merged v1 as well into branch. Let me know if there are any remaining issues!

Kaixhin commented 7 years ago

Sorry am trying to finish off a paper due mid-NIPS before I head off to NIPS, so will have to put this on hold for a week. Have started taking a look but definitely need to have a thorough look through the whole thing before merging to master.

SeanNaren commented 7 years ago

@Kaixhin sounds good, enjoy NIPS :)

Kaixhin commented 7 years ago

Remind what the conclusion on timeStepLimit and maxSteps was? Currently the env will run to the minimum of these, rather than maxSteps overriding timeStepLimit.

Also. in gym, what's reward_threshold? Looking at MountainCar, it seems like a scaled version of rlenvs' rewardSpace, but only limited one way.

SeanNaren commented 7 years ago

Currently the default maximum time step per episode is 1000. This can be changed by the environment via timeStepLimit in the Env.lua class, and by maxSteps set by the user in the options. However if maxSteps is greater than timeStepLimit (and not null) it is set to timeStepLimit

Does that make sense? Any changes you would make?

Kaixhin commented 7 years ago

Seems a little counterintuitive to have timeStepLimit overwrite maxSteps, but if that's what gym does then let's stick to that for consistency. If so then go ahead and merge this pull request!

I see that reward_threshold is used to determine if an environment is "solved", but that's not of concern to this library immediately.

SeanNaren commented 7 years ago

I think as kory said this was so that each episode is a maximum fixed size for their leaderboards! And sounds good, will just do one quick pass through and merge :)

SeanNaren commented 7 years ago

Thanks for your help @Kaixhin, was great stuff!

Kaixhin commented 7 years ago

Thanks a lot @SeanNaren! I'll check if anything needs to be backported to v1 myself, so good luck with twrl integration.