JuliaML / OpenAIGym.jl

OpenAI's Gym binding for Julia

Update for 1.0, drop Universe, add some tests, and env.total_reward #10

Closed · JobJob closed 5 years ago

JobJob commented 6 years ago

Start of the changes to this and Reinforce that @tejank10 and I discussed at JuliaCon.

Edit: ~~this requires some changes to Reinforce - changed to WIP for now~~ https://github.com/JuliaML/Reinforce.jl/pull/23

Coming soon: speedups - Gym environments in Julia as fast as from Python.

iblislin commented 6 years ago

(cc myself) I will find free time to review this.

JobJob commented 6 years ago

Ok thanks - I am just doing a minor cleanup of the speed PRs that follow this.

iblislin commented 5 years ago

I want to drop 0.6 support, just go toward v0.7/1.0. Update REQUIRE as well?

iblislin commented 5 years ago

Could you enable this feature, so I can push some minor changes directly? https://help.github.com/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork/

JobJob commented 5 years ago

> Could you enable this feature, so I can push some minor changes directly? https://help.github.com/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork/

Yeah "Allow edits from maintainers." has always been checked

iblislin commented 5 years ago

TODO

iblislin commented 5 years ago

:zzz:

codecov-io commented 5 years ago

Codecov Report

:exclamation: No coverage uploaded for pull request base (master@09e1e31). The diff coverage is 55.55%.


@@            Coverage Diff            @@
##             master      #10   +/-   ##
=========================================
  Coverage          ?   51.28%           
=========================================
  Files             ?        1           
  Lines             ?       39           
  Branches          ?        0           
=========================================
  Hits              ?       20           
  Misses            ?       19           
  Partials          ?        0
Impacted Files Coverage Δ
src/OpenAIGym.jl 51.28% <55.55%> (ø)

Last update 09e1e31...af6d63c.

iblislin commented 5 years ago

I read the code in Reinforce.jl again and found that there is a total_reward field available in Reinforce.Episode, so why do we add similar state to an environment?

JobJob commented 5 years ago

I don't really like the Episode abstraction; it was slow. That may be fixable (there were some type instabilities, AFAIR), but I'm not sure it's really providing that much benefit. I'm open to being convinced, though.

JobJob commented 5 years ago

Actually, in general I'm of the view that this package should drop the dependency on Reinforce, and we create a separate package that wraps GymEnv as a Reinforce environment by defining the relevant Reinforce.xxxx methods and forwarding them, something like:

using Reinforce
using OpenAIGym: OpenAIGym, GymEnv

struct ReinforceGymEnv <: AbstractEnvironment
    gymenv::GymEnv
end
ReinforceGymEnv(args...; kwargs...) = ReinforceGymEnv(GymEnv(args...; kwargs...))
Reinforce.step!(env::ReinforceGymEnv, a) = OpenAIGym.step!(env.gymenv, a)
Reinforce.reset!(env::ReinforceGymEnv)   = OpenAIGym.reset!(env.gymenv)
...

JobJob commented 5 years ago

The other thing about Episode, and about putting things in iterators in general, is that they sometimes make error messages slightly more obscure. Things are more transparent, and less magical, if you just have a loop.

Anyway, I'm not wedded to the idea of total_reward being in the env, but it's convenient IMO, and low cost.
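
A plain loop along those lines might look like the following sketch. This is hedged and illustrative only: `GymEnv`, `reset!`, `step!`, `actions`, and `finished` are assumed to follow the Reinforce-style API this package used at the time, and the exact signatures and return values may differ.

using OpenAIGym

env = GymEnv("CartPole-v0")

# One episode as an explicit loop - no iterator wrapper, so stack traces
# point straight at user code, and the running total is just a local.
s = reset!(env)
R = 0.0
done = false
while !done
    a = rand(actions(env, s))   # random action as a stand-in for a real policy
    r, s′ = step!(env, a)
    R += r                      # plays the role of env.total_reward
    done = finished(env, s′)
    s = s′
end

Everything the Episode iterator tracks (step count, accumulated reward) is just a local variable here, visible in the loop body.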

iblislin commented 5 years ago

> I don't really like the Episode abstraction; it was slow. That may be fixable (there were some type instabilities, AFAIR), but I'm not sure it's really providing that much benefit. I'm open to being convinced, though.

> The other thing about Episode, and about putting things in iterators in general, is that they sometimes make error messages slightly more obscure. Things are more transparent, and less magical, if you just have a loop.

Well, I want an abstraction. Actually, I want an iteration protocol for episodic tasks. I know Reinforce.Episode is slow and type-unstable, but that's old, unmaintained code.

Maybe we can draw up a new blueprint for an episodic iteration protocol and a concrete implementation. I want this implementation to be generic and versatile: everyone just adds features to it, and other packages gain new features.

I read the code of Episode again; the current feature set is barren:

  1. it supports maxn to set a step limit.
  2. nothing :(
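
One possible shape for such a protocol, using Julia 1.0's `iterate`. This is a sketch only: `reset!`, `action`, `step!`, and `finished` are assumed Reinforce-style verbs, and the struct below is illustrative, not the existing Reinforce.Episode.

# Generic episode iterator: yields (s, a, r, s′) tuples until the
# environment is finished or `maxn` steps have elapsed.
struct Episode{E,P}
    env::E
    policy::P
    maxn::Int
end
Episode(env, policy; maxn = typemax(Int)) = Episode(env, policy, maxn)

function Base.iterate(ep::Episode, state = (reset!(ep.env), 0))
    s, n = state
    n >= ep.maxn && return nothing
    a = action(ep.policy, s)
    r, s′ = step!(ep.env, a)
    x = (s, a, r, s′)
    # Signal termination by handing `nothing` to the next iterate call.
    finished(ep.env, s′) ? (x, nothing) : (x, (s′, n + 1))
end
Base.iterate(::Episode, ::Nothing) = nothing

# Episode length isn't known up front, so tell collect/map not to ask.
Base.IteratorSize(::Type{<:Episode}) = Base.SizeUnknown()

A `for (s, a, r, s′) in Episode(env, π)` loop then works for any environment implementing those four verbs, and shared features (callbacks, reward tracking, rendering) can hang off this one type rather than each environment.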

> Actually in general I'm of the view that this package should drop the dependency on Reinforce, and we create a separate package that wraps this package as a Reinforce environment, by just forwarding the relevant methods.

About the package design, I think we can make this trade-off:

Let's put effort into the primary goal first.

iblislin commented 5 years ago

Given that we've clarified the primary goal, I can accept dropping the Reinforce stuff at first, even creating a GymEnv that doesn't subtype AbstractEnvironment.

JobJob commented 5 years ago

Agreed re primary goal, this package should provide a fast way to access Gym environments from Julia via Python.

There is some interesting work going on for all the other stuff at https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl I haven't taken it all in yet, but the design looks quite nice.

iblislin commented 5 years ago

So... do you want to do some rework on this PR, or can I merge it now?

JobJob commented 5 years ago

Merge it! We can do other stuff later. 🙏

iblislin commented 5 years ago

What :interrobang: I didn't know about this repo before... https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl

JobJob commented 5 years ago

Haha, yeah I had the same reaction when I first stumbled upon it a month or so ago