jbrea closed this issue 4 years ago.
I think it would be reasonable to add `observe`. We may also want to add `interact!(env, a)`, or maybe just `act!(env, a)`, that does not return anything. However, I think we should keep `step!` returning the standard four things that people usually expect.
I didn't fully understand why `interact!`/`observe` is better for the asynchronous case. Can't you just do the following?

```julia
result = @async step!(env, a)
o, r, done, info = fetch(result)
```
The problem is in multi-agent environments. Let's take the tic-tac-toe environment for example:

```julia
# given policy_x, policy_o, env
obs_0, reward_0, done_0, info_0 = reset!(env)
action_x = policy_x(obs_0)
obs_1, reward_1, done_1, info_1 = step!(env, action_x)
```
OK, now the question is, what's `obs_1`? Is it the observation from player X's perspective? Then how does player O get its observation (obviously it's not `obs_0`)? So one solution I've seen is that `step!(env, action)` returns the info of all players, and in each ply every player extracts the parts it needs. Like I said in the original answer, the problem with `step!` is that it combines two steps in one. And I personally think it's more intuitive to do `env(action)` and then `observe(env)`. And we really don't need to care whether the `env` is sync or async.
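To make the decoupled style concrete, here is a minimal sketch (the types and names are hypothetical, not from any existing package): acting and observing are separate calls, so each player can fetch its own view after all plies of a round.

```julia
# Hypothetical sketch of the act-then-observe style for tic-tac-toe.
mutable struct TicTacToeEnv
    board::Vector{Int}   # 0 = empty, 1 = X, 2 = O
end

# acting mutates the environment and returns nothing
act!(env::TicTacToeEnv, pos::Int, player::Int) = (env.board[pos] = player; nothing)

# observing is a pure read; here every player sees the same board
observe(env::TicTacToeEnv) = copy(env.board)

env = TicTacToeEnv(zeros(Int, 9))
act!(env, 5, 1)      # X plays the center
act!(env, 1, 2)      # O plays a corner
obs = observe(env)   # both players can now read the post-round board
```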
Anyway, my point is, we don't have any technical debt now and it's a good chance to review some existing conventions. 😃
In the tic-tac-toe example, the observation of the board at the end of each play is independent of player perspective. Every agent sees the observation in the same way, but the issue is that an observation needs to be the board after all agents have interacted with the environment. That's a blocking operation, and it doesn't matter if `step!` blocks or if `interact!` doesn't block but `observe` does.

But all those details are beyond the scope of an interface. I think the right way to handle it is with an environment that handles blocking etc. correctly for multi-agent use, but those are implementation details.
That being said, I am in favor of `interact!` + `observe`, because

1. `step!` is easily defined by calling those two in sequence
2. `step!` forces a restriction on environment writers, whereas decoupling the two parts of `step!` does not

Ultimately, (2) is more flexible, and it is better for the interface to be flexible.

Okay, but another example is a 3D environment with multiple agents. I don't think `step!` vs `interact!`/`observe` is the key issue in this case. Here, each agent does see a different perspective (e.g. a visual stimulus). Is `observe(env)` a sufficient interface? The crux is what the returned observation means. Intuitively, it should be the perspective of the agent that called `observe`, but how does the environment know that?
One solution is for the environment to be "distributed": a single 3D environment chopped up into subperspectives. The `env` that each agent is using in code is a subperspective. But again, these are implementation details. I don't feel that the interface can strongly influence the correct asynchronous design.
> The env that each agent is using in code is a subperspective.

I agree. I think the best way to create a multiplayer environment is to create multiple connected environments: each one looks like a single-player environment to that agent.
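The "multiple connected environments" idea can be sketched as one shared world wrapped so that each agent's handle looks like a single-agent environment. Everything below is hypothetical and only illustrative:

```julia
# Hypothetical sketch: a shared world plus per-agent wrapper environments.
mutable struct SharedWorld
    state::Vector{Float64}
end

struct AgentView
    world::SharedWorld
    id::Int
end

# acting goes through the shared world ...
act!(v::AgentView, a::Float64) = (v.world.state[v.id] += a; nothing)

# ... but observing returns only this agent's perspective
observe(v::AgentView) = v.world.state[v.id]

world = SharedWorld(zeros(2))
agent1 = AgentView(world, 1)
agent2 = AgentView(world, 2)
act!(agent1, 0.5)
observe(agent1)   # this agent sees its own updated slice
observe(agent2)   # the other agent's view is unaffected
```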
@findmyway are you advocating replacing `step!` with `interact!`/`observe`? Or just adding them as optional interface functions?

What is the behavior of `observe` if it is called multiple times between calls to `interact!`?
> What is the behavior of `observe` if it is called multiple times between calls to `interact!`?

It depends. In most common cases, the results are exactly the same. But in client-server style environments, the observations might differ as time goes by. In multi-agent environments, it depends on whether other agents have already called `interact!` on the environment.
> @findmyway are you advocating replacing `step!` with `interact!`/`observe`? Or just adding them as optional interface functions?

I'd prefer to replace `step!` with `observe`.
> Is observe(env) a sufficient interface? The crux is what does the returned observation mean? Intuitively, it should be the perspective of the agent that called observe, but how does the environment know that?

For multi-agent environments, we need to expand `observe(env)` a little to allow `observe(env, player)`. Then `observe(env)` in a multi-agent environment means the observation from a bird's-eye view (which may be useful in imperfect-information environments?).
If `observe` gets called twice between calls to `interact!` and both times the reward returned by the `observe` call is 1.0, has the agent accrued 1.0 reward or 2.0?
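For concreteness, one convention that avoids the double-counting ambiguity (a hypothetical sketch, not anything decided in this thread) is to accrue reward inside `interact!` and keep `observe` a pure read:

```julia
# Hypothetical sketch: reward accrues once per interaction, observing is free.
mutable struct BanditEnv
    cumulative_reward::Float64
    last_obs::Int
end

function interact!(env::BanditEnv, a::Int)
    env.cumulative_reward += 1.0   # reward accrues exactly once per interaction
    env.last_obs = a
    nothing
end

# a pure read: calling it twice never double counts the reward
observe(env::BanditEnv) = (obs = env.last_obs, reward = env.cumulative_reward)

env = BanditEnv(0.0, 0)
interact!(env, 3)
r1 = observe(env).reward
r2 = observe(env).reward   # same value: the agent accrued 1.0, not 2.0
```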
I would prefer to stick with `step!` for the required interface since it is the de facto standard, and I think it is best to have zero barriers to immediately understanding the package. In the basic RL case, where the world is abstracted as a (PO)MDP, I don't think `step!` is really conflating two things. For every step, you take an action and get an observation and reward; you cannot choose whether or when to observe.
Side note: regarding "it's a good chance to review some existing conventions". Yeah, I think there is a lot of room for improvement if we don't stick to conventions. (E.g. personally, I think a really Julian thing would be to not require the environment to be mutable, so you would use `step` instead of `step!`; this might help with things like differentiability, putting it on specialized hardware, etc.) However, I think the required interface of this package is not the best place for that. The required interface should be the conservative base that we all build on, since we don't even have that yet :smile:
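The non-mutating idea in the side note might look like this (a purely illustrative sketch; none of these names exist anywhere): `step` returns a fresh environment instead of mutating one.

```julia
# Hypothetical sketch of a non-mutating step on an immutable environment.
struct CounterEnv
    count::Int
end

# returns (new_env, observation, reward, done) without touching the old env
function step(env::CounterEnv, a::Int)
    newenv = CounterEnv(env.count + a)
    return newenv, newenv.count, Float64(a), newenv.count >= 10
end

env = CounterEnv(0)
env, obs, reward, done = step(env, 3)   # the binding is rebound, nothing mutates
```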
If we want to consider replacing `step!` in the required interface, I suggest we label this as a decision thread. I think discussing whether and how we want to explicitly support multi-agent environments should also be its own thread.
@rejuvyesh, what are your thoughts? You have up/down-voted a few comments, but it's hard to tell which parts of the comments you are reacting to :smile:

Also @maximebouton, @mkschleg, if you have any comments on `step!` vs `interact!`/`observe`, they would be much appreciated.
An advantage of `observe`, whether it is optional or required, is that it might allow for observation configuration in the future.
If `step!` is just `interact!` with returns, i.e. `step!(env, a) = begin interact!(env, a); observe(env) end`, I would go with `step!` mandatory and `observe` optional. This way we get the de facto standard for many cases (`step!`) and the flexibility for multi-agent settings (`step!` without using the returns, and `observe` whenever needed). Wouldn't this work, @findmyway?
Yes, I think it will work.
@zsunberg I was disagreeing with this:

> I think the best way to create a multiplayer environment is to create multiple connected environments - each one looks like a single-player environment to that agent.

There are many factors in a multi-agent environment, but it's not as simple as connecting multiple single-agent environments. This idea of having a separate `observe` is also pretty useful if we are interested in modeling more real-time systems, where the environment has its own clock and the agents can observe and interact however they see fit.
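The real-time case can be sketched as an environment that keeps evolving on its own clock, so what `observe` returns depends on when you call it (everything here is hypothetical and only illustrative):

```julia
# Hypothetical sketch: an environment with its own wall clock.
mutable struct ClockedEnv
    start::Float64
    setpoint::Float64
end
ClockedEnv() = ClockedEnv(time(), 0.0)

# interacting sets a control input; the world keeps evolving regardless
interact!(env::ClockedEnv, a::Float64) = (env.setpoint = a; nothing)

# the observation includes elapsed wall-clock time, so two observes
# between interacts can legitimately differ
observe(env::ClockedEnv) = (t = time() - env.start, setpoint = env.setpoint)

env = ClockedEnv()
o1 = observe(env)
sleep(0.01)
o2 = observe(env)   # o2.t > o1.t even though no interact! happened
```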
I think `step!` should be the "standard" way of implementing an environment, and it should be part of the required interface. It seems like the `observe`/`interact!` workflow serves more specific use cases, and most tinkerers will not use it. However, I think `observe` is an important concept and allows the following things:

- observing without calling `step!` (might be cheaper)
- `observe(env, player)`; as mentioned before, I think we need the two methods (together with `observe(env)`)

The difference between `observe` and `interact!` is a bit unclear to me, though; `observe` seems like a specific case of interaction with the environment, and I am not sure why we need both.
> (step! without using the returns and observe whenever needed).

@jbrea, @findmyway I'm not sure this will work that well. `step!` will block until it returns. I think we would want `interact!` (possibly named `act!`) as well.
> The difference between observe and interact! is a bit unclear to me though.

`interact!` is just `step!` without the return values.
What will `observe` return? In particular, will it return the reward?

I believe `observe` only returns the current observations for the agent; the reward is returned only when the agent interacts.
> I believe `observe` only returns the current observations for the agent; the reward is returned only when the agent interacts.

This makes more sense to me.
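That split can be sketched as follows (hypothetical names, just to pin down the semantics): `interact!` returns the reward, and `observe` returns only the observation.

```julia
# Hypothetical sketch: reward from interact!, observation from observe.
mutable struct LineEnv
    pos::Int
end

function interact!(env::LineEnv, a::Int)
    env.pos += a
    return Float64(-abs(env.pos))   # the reward is only available here
end

observe(env::LineEnv) = env.pos     # observation only, no reward

env = LineEnv(0)
r = interact!(env, 2)   # r == -2.0
observe(env)            # 2
```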
I think some concrete examples might really help this discussion. We have touched on the multi-agent case and the real-time case, but it would probably be easier to reason about them if we wrote out some code snippets or pointed to existing code (I made the usage-example tag for this purpose). We also may want to wait to design features like this until someone is actively working on a problem that requires it (perhaps @findmyway already is :smile:).
I want to cc @pmm09c into this discussion. Peter and his colleague approached me about working on scalable multi-agent RL, and they have already written some papers on the topic. Perhaps they have some input/usage examples that are relevant?
@darsnack thanks, @rejuvyesh is actually who we're working with along with a few other folks so happy to see him here.
@rejuvyesh, for context: we're super focused on supporting distributed compute for handling complex/slow multi-agent envs on the grid. It seemed like it made a lot of sense to use Julia for this, so I started working on a Reverb-like buffering system to help facilitate that.
Anyway, our use case might be a bit niche, but I'm glad to track everything going on here so I can try my best to keep things compatible.
> Should we add `observe(env)` as an optional interface function? See this explanation by @findmyway for why this may be useful in the multi-agent setting. Or should we even make this mandatory and have `reset!` and `step!` return `nothing`?