Kaixhin / rlenvs

Reinforcement learning environments for Torch7
MIT License

Modify envs to be compatible with twrl #8

Closed: SeanNaren closed this 7 years ago

SeanNaren commented 8 years ago

Just opening this PR to track progress based on this PR. I need to test each environment with twrl to make sure they behave correctly; if you spot any issues or have any feedback, let me know :)

It may also be nice to set the rewards to match gym as well, for comparison's sake!

Kaixhin commented 8 years ago

Looks good! Might as well copy as much of the API as we can. Could we rename actionSpec to actionSpace? I think returning it from a function call is a bit cleaner than having it as a property, but we could change that if you feel a property is better. actionTaken from env:step(action) should definitely be replaced with the optional info table, and we can leave it up to people how they want to add information to the info table. A render method would be nice, but it's beyond the scope of this PR.

Otherwise agreed with making the reward structure the same as gym, provided the environment dynamics are the same. Anyone who was using rlenvs seriously for benchmarking can simply use v1.
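
For reference, a rough sketch of the gym-style call being discussed; the info table and its contents here are illustrative, not the current rlenvs API:

local reward, state, terminal, info = env:step(action)
-- info is an optional table; each environment decides what extra data to put
-- in it, e.g. info = {actionTaken = action} if it wants to expose that detail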

SeanNaren commented 8 years ago

Could you point me to where actionTaken is? I can't seem to find it. I've updated the names of the functions; hopefully that's what you had in mind!

I'll get to work on the reward schemes for the environments that cross over with gym!

Kaixhin commented 8 years ago

Ah sorry, that is only mentioned in the docs but isn't implemented in any of the envs provided; the context can be found in https://github.com/Kaixhin/Atari/pull/56. I've notified the author of that PR that we're moving towards a more gym-compatible API, so it will be put on hold for now.

Sorry, but do you mind changing stateSpace to getStateSpace, etc.? Since these are implemented as functions in rlenvs, I think the getter/setter nomenclature makes it a little clearer.
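
Purely to illustrate the naming, the getters might look like this (the stateSpec and actionSpec fields are hypothetical places to store the specs, not the actual rlenvs layout):

function CartPole:getStateSpace()
  return self.stateSpec -- hypothetical field, for illustration only
end

function CartPole:getActionSpace()
  return self.actionSpec -- hypothetical field, for illustration only
end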

SeanNaren commented 7 years ago

I tried to change the rewards for the CartPole environment to match gym's reward scheme but ran into issues.

For gym there is +1 for every step the pole is up and 0 once the pole is down. Just as a sanity check, should this then be reward = 1 and this be reward = 0? I'm getting strange behaviour when replacing this environment with the gym one in torch-twrl!
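
For reference, the gym CartPole scheme being matched is roughly the following (a sketch, not the rlenvs code):

-- +1 for every step the pole stays up, 0 on the terminal step (pole fallen)
local reward = terminal and 0 or 1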

Once I've resolved why this is the case, I want to add an additional function to the init that lists all available environments, if that sounds good!

EDIT: I think I found the reason for the inconsistencies: gym has a maximum number of steps for each of its environments (they terminate after that many steps to prevent episodes running forever). Should we implement this in rlenvs?

Kaixhin commented 7 years ago

Yep, the sanity check looks correct to me; let's go ahead with changing the reward to match.

I see you've already implemented the list; looks good. Remember to document it in the README as well.

Is the max steps setting there for all environments (including Atari)? If it's only there for the simple envs, then it would be good to add it to opts for those environments, with the gym defaults (if any).

FYI I'm going to add the environment from Towards Deep Symbolic Reinforcement Learning into the v1 branch in about 3 weeks. If you could merge and update it into twrl that would be appreciated :)

SeanNaren commented 7 years ago

Sounds good, I'll work towards that. For the max steps thing, here is how it's done for a few environments in openai-gym. How about I expose the step method in the Env class and reference a variable called maxSteps that is set per environment? Using this variable, the Env class returns termination. Something like this:

local classic = require 'classic'

local Env = classic.class('Env')

-- Denote interfaces
Env:mustHave('start')
Env:mustHave('act') -- change name here to prevent overload
Env:mustHave('getStateSpace')
Env:mustHave('getActionSpace')
Env:mustHave('getRewardSpace')

function Env:step(action)
    local reward, state, terminal = self:act(action)
    self.currentStep = self.currentStep == nil and 0 or self.currentStep
    if self.maxSteps and self.currentStep % self.maxSteps == 0 then -- env defines maxSteps
        terminal = true
        self.currentStep = 0
    end
    self.currentStep = self.currentStep + 1
    return reward, state, terminal
end

return Env

Kaixhin commented 7 years ago

Overall the solution is good, but I have a few suggestions. I'd go with Env:mustHave('_step') to make it clear that the internal method is hidden, and to allow people to use act for other purposes. Secondly, if you haven't already, it's worth checking exactly how gym defines the maximum number of timesteps: does it include the state from starting the environment? Finally, I'm not sure why you went for self.currentStep % self.maxSteps == 0 instead of just self.currentStep == self.maxSteps?
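
Combining those suggestions, the wrapper might look something like this (a sketch only, not the final implementation):

Env:mustHave('_step') -- envs implement the hidden _step; Env:step wraps it

function Env:step(action)
  local reward, state, terminal = self:_step(action)
  self.currentStep = (self.currentStep or 0) + 1
  if self.maxSteps and self.currentStep == self.maxSteps then
    terminal = true -- cut the episode off once the step limit is reached
  end
  return reward, state, terminal
end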

SeanNaren commented 7 years ago

Not entirely sure tbh, had a brain fart :P I've made the changes and will now go through and add default max steps as done in openai-gym (I've set the default to 1000, which is the openai-gym default, but if that's an issue let me know!).

Kaixhin commented 7 years ago

I think the cleanest way to do this is to have Env.lua actually implement _init with self.maxSteps = opts.maxSteps or 1000, and have every env first call super._init(self, opts). This is where it should go, and it'll stop maxSteps being recalculated on every step. Also, if I'm correct, this is going to run into problems unless you reset self.currentStep in start as well, which probably means handling the logic in _init, start and step and using super calls.

Once super._init(self, opts) is in place, I think it's worth matching maxSteps to gym where it actually differs from 1000, according to this.
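
A sketch of that structure, assuming envs implement _start and _step and each env's own _init first calls super._init(self, opts) (details illustrative, not the merged code):

function Env:_init(opts)
  opts = opts or {}
  self.maxSteps = opts.maxSteps or 1000 -- gym-style default, overridable per env
end

function Env:start()
  self.currentStep = 0 -- reset the counter at the start of every episode
  return self:_start()
end

function Env:step(action)
  local reward, state, terminal = self:_step(action)
  self.currentStep = self.currentStep + 1
  if self.currentStep >= self.maxSteps then
    terminal = true -- enforce the per-environment step limit
  end
  return reward, state, terminal
end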

SeanNaren commented 7 years ago

I've made a few changes: added an _init function to the Env.lua class and added the super._init call to all the envs. I've also added default max steps and some logic in the Env.lua class to handle the different step cases; let me know of any feedback!

Also for rendering, shall we make it possible to render via the Env.lua class by passing in a render boolean in the opts? I can get to work on that if you think it would be good!

Kaixhin commented 7 years ago

@korymath Rendering would be nice - any suggestions for how best to integrate it with gym/twrl?

korymath commented 7 years ago

A boolean in the opts would be a good means by which to pass the option, but is perhaps not the full solution.

My inclination is that this would allow for a strictly Lua RL infrastructure, free of any connection to gym, and thus the rendering could be completely separate from gym's rendering. Having not done much graphics work in Lua, I am not sure how best to make this happen.

One potential approach would be to save the transitions and format them so that they can be loaded and replayed in the gym infrastructure (that only works if the environments are exactly the same as those in gym, which may be limiting, and it would require offline processing between learning and display).

I would argue that, while rendering is nice, it is not critical to learning (unless there is a learning signal derived from pixels, a human observer of the visualization, or maybe a third case?). Thus, even a basic render of learning performance could be generated every N steps, much like how OpenAI gym allows a video to be rendered.

Nice work on the max steps discussion; that is an interesting quirk of the OpenAI gym code which I think is based on the online leaderboard and the desire for shorter episode lengths for direct comparison.

SeanNaren commented 7 years ago

I added a :render() method which checks that a :getDisplay() method exists to retrieve a displayable screen, and renders it using qt (check the Env.lua class). I've also updated the experiment.lua class; check it out!
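
Roughly, the render call described might look like this (a sketch assuming qlua and the torch image package; getDisplay is expected to return a displayable tensor):

local image = require 'image'

function Env:render()
  assert(self.getDisplay, 'render requires the environment to implement getDisplay')
  -- reuse the same qt window across calls so the display updates in place
  self.window = image.display({image = self:getDisplay(), win = self.window})
end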

korymath commented 7 years ago

This looks great!

Thoughts on how to handle execution on a server? Maybe similar to gym and allow creation of a 'fake' display for output?

SeanNaren commented 7 years ago

Thanks @korymath! That's an interesting idea, maybe similar to how the gym API is currently interfaced with? If this is deemed necessary, I could start thinking about how to implement it (probably out of the scope of this PR).

korymath commented 7 years ago

Sounds great!

Agree it is probably another large feature build.

Kaixhin commented 7 years ago

Sorry, a small piece of extra work! I've just managed to add "XOWorld", the little domain featured in "Towards Deep Symbolic Reinforcement Learning", to the v1 branch as well as master; I haven't added it to twrl though.

SeanNaren commented 7 years ago

Seems like the merge of the branch replayed the commits on top... once I've finished with XOWorld I think I'll rebase the branch and open a new PR, is that alright?

Also, there is a small issue here: a comma is missing at the end :)

EDIT: XOWorld is an awesome environment!! It should be fully implemented now, which should wrap everything up. I'm now just thinking about unit tests to try to make sure everything works correctly!

Kaixhin commented 7 years ago

Thanks for spotting that - now fixed on v1 and merged into master.

Yes, a rebase would clean this up; good call suggesting it :) There shouldn't be any more changes due now.

SeanNaren commented 7 years ago

Thanks. In terms of unit tests, any suggestions on what functionality to verify and check?

Kaixhin commented 7 years ago

Good question. mustHave already helps a little bit. Seeing what I just did, maybe static syntax checking would be useful? Going further, argcheck might be worth adding in.

I suppose you could test an environment by loading it, starting it, and doing at least one step. I'm not sure if anything needs to be done if someone tries to step before start. Checking that maxSteps works really depends on the environment (e.g. CartPole will fail quickly with a random agent), unless you manually set it to a very low number.
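
A minimal sketch of that kind of smoke test (the module path, opts and chosen action are illustrative, not the actual test code):

local function smokeTest(name, opts, action)
  local EnvClass = require('rlenvs.' .. name) -- module path illustrative
  local env = EnvClass(opts)
  local state = env:start()
  assert(state ~= nil, name .. ': start() should return an initial state')
  local reward, nextState, terminal = env:step(action)
  assert(type(reward) == 'number', name .. ': reward should be a number')
  assert(type(terminal) == 'boolean', name .. ': terminal should be a boolean')
end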

I started thinking about testing Atari, but there are licensing issues with ROMs. I thought there might be a homebrew ROM that we could include in the repo, but then it would need to be coded into ALE...

korymath commented 7 years ago

Feel free to use torch-twrl as a testing base if you want. It has some basic testing functionality built up here: https://github.com/twitter/torch-twrl/blob/master/test/test-gym.lua

It runs a few tests for different environments; the runTest code is from the gym-http-api, and you can see the full extent of it here: https://github.com/openai/gym-http-api/blob/master/binding-lua/test_api.lua It is a pretty verbose test that runs through the RL environment functionality.

It is quite easy to find the ROMs if you are particularly motivated.

SeanNaren commented 7 years ago

Thanks for that, I can definitely try to imitate those tests for all the rlenvs environments!

EDIT: I also saw Malmo was added (which is awesomesauce!). I'll pull this in and get it implemented as well. EDIT2: I will wait!

Kaixhin commented 7 years ago

@SeanNaren I'm trying to get Malmo working myself, so bear with me a while. New plan is to target v1, then merge into master (they're now the same, even though the v2 rockspec shouldn't be in v1, but oh well). I'll aim to get it done by the end of today, but will report back once I've got it working successfully.

Kaixhin commented 7 years ago

OK, that was a bit of a mess, but I seem to have knocked down the requirements nicely and condensed the code. I had a quick look but couldn't get qlua working; the screen does, however, display in the running client. I also couldn't get it working with Atari because of some issue with luaposix, which is a shame, but I won't have time to investigate further.

SeanNaren commented 7 years ago

Right, I think that's Minecraft handled (I still need to test that it works; I'll do so once I'm home on my own machine), and now it's just the tests.

Currently I just run through every environment, but I need to add some assertions here; however, I'm not sure what I could assert on. In the torch-twrl tests we assert on a successful run-through (return true as long as everything runs correctly).

korymath commented 7 years ago

Great work!

You can do simple assertions on the types of the returned values, or assert that values fall within an expected range.

For example, runTest could be wrapped in a pcall to return an error code; if the call succeeds, assert on the result.
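
A sketch of that pattern (runTest and envName are illustrative names):

local ok, err = pcall(runTest, envName)
assert(ok, 'runTest failed for ' .. envName .. ': ' .. tostring(err))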

SeanNaren commented 7 years ago

I've added assertions, check them out! I also rebased the branch into one nice commit, check here. I think the changes are done, so it would be great if you could have a quick look and let me know if anything needs changing!

What do we plan to merge into? I can then open the PR on the rebased branch.

Kaixhin commented 7 years ago

Unfortunately I'll be away this weekend so can't look at it till next week, but the plan is to merge this into master and update the readme to show installing the v2 rockspec. It's a bit messy, but then we can look at the v1 branch and make sure that is fine (e.g. by deleting the v2 rockspec inside that branch, but hopefully that's it).

This way anyone who's still using the old API (e.g. the Atari repo) can still do so by requiring the v1 rockspec. :) No need to backport new environments unless we're requested to. twrl will of course use v2.

SeanNaren commented 7 years ago

Sounds like a plan and no worries, whenever you get time!

Kaixhin commented 7 years ago

Looks good! Firstly, can you merge the latest Minecraft stuff (both in the env and the readme)? Secondly, if you could use 2-space indentation for consistency that'd be awesome. I'll have another look after and then we should be able to merge twrl-rebase into master.

SeanNaren commented 7 years ago

Having issues merging master into the rebased branch. Can you see any commits from master not in the rebased branch?

Kaixhin commented 7 years ago

Ah sorry the latest work should be in v1, which should be merged into twrl-rebase, which should then eventually be merged into master.

SeanNaren commented 7 years ago

This is continued in #14