JuliaML / Reinforce.jl

Abstractions, algorithms, and utilities for reinforcement learning in Julia

Inconsistency in State Type when developing custom Actions #14

Closed dmitrijsc closed 6 years ago

dmitrijsc commented 6 years ago

Hello guys,

I am experiencing the following issue when trying to run a gym environment. The first state passed to my action is of type Array{Int64,0}, while all subsequent states are Int64. Because of this I need to implement a workaround to keep the type consistent.

Simple example:

using OpenAIGym
import Reinforce.action

env = GymEnv("Taxi-v2")

struct NewRandomPolicy <: AbstractPolicy end

function action(policy::NewRandomPolicy, r, s, A′)
    println("Current state: $s, Type: $(typeof(s))")
    rand(A′)
end

reset!(env)
ep = Episode(env, NewRandomPolicy())

println(state(ep.env))

i = 0
for (s, a, r, sp) in ep
    i+=1
    if i > 3 break end
end

I get the following output:

72
Current state: 183, Type: Array{Int64,0}
Current state: 83, Type: Int64
Current state: 63, Type: Int64
Current state: 83, Type: Int64

I was also wondering why state(ep.env) is showing a different state from the one I get in my action (72 vs 183).

Thanks!

Evizero commented 6 years ago

I am sorry to say it, but I don't think this package is currently actively maintained.

dmitrijsc commented 6 years ago

Sad news :/ Thanks for the input!

JobJob commented 6 years ago

Hi, I've been using this package and OpenAIGym.jl in my research. I actually have some PRs for this package and for OpenAIGym.jl that I hope to clean up and submit before the end of the year (minor fixes and additions).

Your issue is happening because of the environment itself. I looked at the Python code: Taxi is a DiscreteEnv, and its reset function returns a NumPy integer for the state (it comes from argmax), while the _step function returns a plain Python int. PyCall then converts the NumPy scalar into a 0-dimensional Julia array.

You can see the different types with this Python code:

import gym

env = gym.make("Taxi-v2")

s = env.reset()
print(type(s))
s,r,done,info = env.step(1)
print(type(s))

Outputs

<class 'numpy.int64'>
<class 'int'>

I'm not 100% sure if the conversion from numpy.int64 to a 0-dimensional array is intended/optimal, you could ask over at PyCall.jl if you like.
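As an aside, a 0-dimensional NumPy array behaves like a boxed scalar, which is roughly what the Julia side ends up seeing. A minimal sketch with plain NumPy (no gym needed, value made up for illustration):

```python
import numpy as np

x = np.int64(183)      # a NumPy scalar, like the one reset() returns
a = np.asarray(x)      # the same value boxed in a 0-dimensional array

print(a.ndim)          # 0
print(a.shape)         # ()
print(a.item())        # 183 -- unwraps back to a plain Python int
print(type(a.item()))  # <class 'int'>
```

The 0-dimensional array and the plain int compare equal, but they are distinct types, which is exactly the inconsistency showing up on the Julia side.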

For now I think a workaround is your best option. I would just define:

action(policy::NewRandomPolicy, r, s::Array{Int64, 0}, A′) = 
    action(policy, r, s[], A′)
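To see what that extra method is doing: a 0-dimensional array holds exactly one element, and s[] (indexing with no indices) unwraps it. A quick standalone check, using a made-up state value:

```julia
s = fill(183)        # a 0-dimensional Array{Int64,0}, like the first state
@assert ndims(s) == 0
@assert s[] === 183  # s[] extracts the underlying Int64
```

So the forwarding method simply unwraps the scalar and dispatches to your existing action method that handles Int64 states.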

> I was also wondering why state(ep.env) is showing a different state from the one I get in my action (72 vs 183).

This is because the episode iterator (the thing that calls your action function) itself calls reset!(env) when iteration begins, effectively at the start of the for loop in your code, so the state you printed beforehand is discarded.

dmitrijsc commented 6 years ago

@JobJob Thanks. That solves the issue.