sglyon opened 8 years ago
We should also have a reset!(::AbstractEnvironment) method.
Also we should discuss "mandatory" fields for an environment.
Do you guys know about https://github.com/JuliaPOMDP/POMDPs.jl ? POMDPs are a pretty related problem to RL. Could be some synergy there.
Thanks... I didn't know about that org. Certainly related... I'll have to review it.
After reviewing what's in JuliaPOMDP, I'd like to move forward with what we've already discussed, and possibly write some (conditionally included) code to link our abstractions to the JuliaPOMDP world. They have a large web of interdependencies, most of which are not registered packages, and their own package manager to handle it. I'm wary of depending on that, and I'd prefer to make our own path.
In addition, they have much more of a focus (seemingly) on finite, table-based, offline solvers, and I don't know if there's really that much overlap beyond some of the verbs and test environments.
tl;dr Someday we should revisit whether we can merge, but for now I think it's easier to stay at arm's length.
@spencerlyon2 I'm thinking about how/where we could convey that there are no more steps to take (i.e. the episode is done). I wonder if we can fit environments into the iteration pattern, or maybe an "episode" is an iterator/wrapper around an environment and policy:
immutable Episode{E,P}
    env::E
    policy::P
    total_reward::Float64
    niter::Int
end
# todo: start, next, done definitions
for (s,a,r) in Episode(env, policy)
    # do something, or not?
end
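As a rough sketch of how this episode-as-iterator idea could work (everything here is hypothetical: the Countdown environment, RandomPolicy, and all the method names are invented for illustration; the protocol shown is current Julia's iterate rather than the start/next/done of the time):

```julia
# Hypothetical sketch: Episode as an iterator over (s, a, r, s′) transitions.
# The verbs (state, reward, actions, step!, finished) follow the thread's
# proposals; every concrete type here is invented.

mutable struct Countdown         # toy environment: a counter that goes to 0
    s::Int
    r::Float64
end
state(env::Countdown)    = env.s
reward(env::Countdown)   = env.r
actions(env::Countdown)  = [-1]          # only one valid action: decrement
finished(env::Countdown) = env.s <= 0
function step!(env::Countdown, s, a)
    env.s = s + a
    env.r = 1.0                          # reward of 1 per step taken
    env.r, env.s                         # (r, s′)
end

struct RandomPolicy end                  # samples from the valid action set
action(::RandomPolicy, r, s, A) = rand(A)

mutable struct Episode{E,P}
    env::E
    policy::P
    total_reward::Float64
    niter::Int
end
Episode(env, policy) = Episode(env, policy, 0.0, 0)

function Base.iterate(ep::Episode, i = 1)
    finished(ep.env) && return nothing   # no more steps: episode is done
    s = state(ep.env)
    a = action(ep.policy, reward(ep.env), s, actions(ep.env))
    r, s′ = step!(ep.env, s, a)
    ep.total_reward += r
    ep.niter = i
    (s, a, r, s′), i + 1
end

ep = Episode(Countdown(3, 0.0), RandomPolicy())
for (s, a, r, s′) in ep
    # each iteration yields one transition; ep.total_reward accumulates
end
```

The nice property is that "the episode is done" falls out of the iteration protocol itself: returning nothing ends the for loop, so callers never poll a done flag.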
Also thinking we need the state as input to step, and that it should be mutating:
r, s′ = step!(env, s, a)
I'd like for a "sarsa" update to make sense:
# assume we have (s,a) for this iteration already (it's last iteration's a′)
r, s′ = step!(env, s, a)
done(env) && break # ??
a′ = action(policy, r, s′, actions(env, s′))
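Spelled out as a self-contained toy example (ChainEnv, the value table q, and the one-action policy are all invented for this sketch; only the verbs state, actions, step!, and done follow the snippet above):

```julia
# Hypothetical 5-state chain: move right each step, reward 1 on reaching
# state 5. Everything here is invented for illustration.
mutable struct ChainEnv
    s::Int                               # states 1..5; episode ends at 5
end
state(env::ChainEnv)      = env.s
done(env::ChainEnv)       = env.s == 5
actions(env::ChainEnv, s) = [1]          # single valid action: move right
function step!(env::ChainEnv, s, a)
    s′ = s + a
    env.s = s′
    r = s′ == 5 ? 1.0 : 0.0              # reward only at the goal
    r, s′
end

# One sarsa-style episode; q plays the role of q(s, a) with one action.
function run_episode!(env, q; α = 0.5, γ = 0.9)
    s = state(env)
    a = first(actions(env, s))
    while true
        r, s′ = step!(env, s, a)
        if done(env)
            q[s] += α * (r - q[s])       # terminal update: no bootstrapping
            break
        end
        a′ = first(actions(env, s′))     # a′ is chosen before the update
        q[s] += α * (r + γ * q[s′] - q[s])   # q[s′] stands in for q(s′, a′)
        s, a = s′, a′
    end
    q
end

run_episode!(ChainEnv(1), zeros(5))
```

After one episode only the state adjacent to the goal has been credited; further episodes would propagate the value backward along the chain.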
step(env, s, a)
sounds good to me.
What would mutate in a call to step!?
I just wanted to say that I am hugely excited about the prospect of a machine learning API with different backends, similar to the absolutely amazing Plots.jl. I am not good enough to contribute substantially to this except from a user's point of view, but hugs to you all for this :-)
Presumably the call to step! is expected to mutate the environment in some way. There may be some cases where that's not true, but it's probably the exception.
I just wanted to say that I am hugely excited about the prospect of a machine learning API with different backends, similar to the absolutely amazing Plots.jl.
Thanks... yeah I hope all works out well.
I am not good enough to contribute substantially to this except from a user's point of view
If you have time to contribute, we could use help with QA... either writing tests, or just using and reporting back on problems/suggestions!
I made many of the changes we discussed, and wrote out a bare minimum readme. Please review when you have time @spencerlyon2. No hurry.
FWIW I think the current API is very easy to understand, and I think it will be quite easy for users to use it for their own extensions. One thing I am not quite sure about is check_constraints at https://github.com/tbreloff/Reinforce.jl/blob/a9218995afb699846f48dcf06695d0156278f47e/src/episodes.jl#L31. Maybe I'm just not used to this, but what if the constraints are not satisfied? Shouldn't a new action be chosen? Or am I misunderstanding the function names here?
@pkofod I agree. Maybe this line could be a = check_constraints(...) instead?
Yes, something like that. Would check_constraints then potentially be recursive? I mean, a new a could violate the constraints as well, and then check_constraints would have to be called within check_constraints, or?
Edit: alternatively something like
function Base.next(ep::Episode, i)
    env = ep.env
    s = state(env)
    local a  # declared outside the loop so `a` is still in scope after it
    passed = false
    while !passed
        a = action(ep.policy, reward(env), s, actions(env))
        passed = check_constraints(env, s, a)
    end
    r, s′ = step!(env, s, a)
    ep.total_reward += r
    ep.niter = i + 1
    (s, a, r, s′), i + 1
end
I'd be ok throwing an error too. In theory the agent should be choosing something from the valid set of actions.
When I wrote that bit I was thinking that we should throw an error.
Really I want a way to communicate that the action space may be current-state dependent.
The agent is given a valid action set in the line before, so I think it's fine to throw an error.
Of course. My question is then, is it really necessary? Is it just "to be sure"? I mean, it will only error if the actions method for the subtype of AbstractPolicy is incorrectly implemented.
@tbreloff I'm probably missing something obvious here, but in that routine I don't see where we get an action set based on the current state.
What I have in mind is that each period s \in \mathcal{S} and the agent must pick a \in A(s) \subseteq \mathcal{A} (a la Barto & Sutton).
I think that a better way to do this would be to define actions(::Env, ::State) so that the action set passed to the actor is always valid given the current state. Then there would be no need for the check_constraints routine.
Thoughts?
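A tiny hypothetical example of what such state-dependent action sets could look like, where actions(env, s) does the constraining itself (the Edge type and its rules are invented for illustration):

```julia
# Hypothetical: positions 1..width on a ledge; the valid action set A(s)
# depends on the state, so actions(env, s) never offers an illegal move.
struct Edge
    width::Int
end
actions(env::Edge, s) = filter(a -> 1 <= s + a <= env.width, [-1, 1])

actions(Edge(3), 1)   # only +1 is valid at the left boundary
actions(Edge(3), 2)   # both -1 and +1 are valid in the interior
```

With this arrangement the policy can only ever be handed valid actions, and check_constraints reduces to a sanity assertion.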
Makes sense to have the state in actions so the environment can constrain the actions before one is picked.
In my mind that's already true because I assume the environment knows the state, so actions(env) can be based on the state if appropriate. But I agree it would be more general (and allow for dispatch) if the call was 'A = actions(env, state)' instead.
The check_constraints could just '@assert a in A'
Also, I can't remember if I switched it yet, but I added "math sets" to LearnBase and we should be returning those from 'actions'.
Ok, I didn't understand that. We should make a note somewhere so that this is very clear to users (and devs).
I can't recall, do we already have a way to describe the action space for the env (the \mathcal{A} I talked about earlier)?
Note that the action space is different from the set of appropriate actions, given the state. Sorry for the double message, I just didn't think I made that clear
I don't remember if we distinguish actions from action_space, but that certainly makes sense.
I'm going to make the following API changes:
actions(env) --> A
done(env) --> bool
# becomes:
actions(env, s) --> A
finished(env, s′) --> bool
and I'll change check_constraints into an assert.
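A runnable sketch of how the renamed verbs could fit together (ToyEnv, run!, and the trivial policy are all invented for this sketch; the @assert marks where check_constraints used to be):

```julia
# Hypothetical environment whose state-dependent action set becomes empty
# at the terminal state, so finished(env, s′) can key off it.
mutable struct ToyEnv
    s::Int
end
actions(env::ToyEnv, s) = s < 3 ? [1] : Int[]     # A(s), empty at the end
finished(env::ToyEnv, s′) = isempty(actions(env, s′))
function step!(env::ToyEnv, s, a)
    env.s = s + a
    1.0, env.s                                    # (r, s′): reward 1 per step
end

# Run one episode, asserting validity where check_constraints used to be.
function run!(env)
    s = env.s
    total = 0.0
    while true
        A = actions(env, s)
        a = first(A)              # stand-in for action(policy, r, s, A)
        @assert a in A            # replaces check_constraints(env, s, a)
        r, s′ = step!(env, s, a)
        total += r
        finished(env, s′) && break
        s = s′
    end
    total
end

run!(ToyEnv(1))
```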
ref: f21b5cf
I want to link the solvers to the work I'm doing in StochasticOptimization, so there are still a lot of changes to be made here, but hopefully environments can be developed without too much churn.
After a chat with @tbreloff, we roughly decided on the following core API methods:

actions(::AbstractEnvironment, s::State) -> A(s): returns the set A(s) that contains all valid actions from the state s.

step(::AbstractEnvironment, a::Action) -> s′, r: returns the next state s′ and the reward r associated with taking action a. Note that the current state s is a field of the AbstractEnvironment; this method is also responsible for setting that state field to s′.

action(::AbstractPolicy, s′::State, r::Reward, A(s′)::Actions) -> a′: should be implemented by each subtype of AbstractPolicy; it observes a state transition (the result of calling the step method with the previous action a) and outputs the next action a′.

@tbreloff also mentioned an episode(::AbstractEnvironment) method, but I wasn't clear on its purpose, so I'll let him fill in the blanks there.
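A hypothetical toy illustration of these three methods (Walk1D and GreedyPolicy are invented; note that step mutates the environment's state field, as described):

```julia
# Toy environment: a position on a line with the goal at 0.
# The current state lives in the env, as the API above specifies.
mutable struct Walk1D
    s::Int
end
actions(env::Walk1D, s) = s > 0 ? [-1] : [1]   # A(s): always move toward 0
function step(env::Walk1D, a)
    s′ = env.s + a
    r = -1.0                  # cost of 1 per step
    env.s = s′                # step is responsible for updating the state
    s′, r
end

struct GreedyPolicy end       # observes (s′, r, A(s′)) and picks the next a′
action(::GreedyPolicy, s′, r, A) = first(A)

env = Walk1D(2)
s′, r = step(env, first(actions(env, env.s)))
a′ = action(GreedyPolicy(), s′, r, actions(env, s′))
```

The policy never sees the environment directly; it only sees the transition and the valid action set for the new state, which is the separation of concerns the thread converged on.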