crowdAI / marLo

Multi Agent Reinforcement Learning using MalmÖ
MIT License
244 stars 46 forks

what does the obs that is returned from step() function mean? #41

Closed ProQianXiao closed 6 years ago

ProQianXiao commented 6 years ago

I am trying the Single Agent Example from https://marlo.readthedocs.io/en/latest/usage/singleagent_example.html. I ran the example and printed the obs returned from the step function like this:

while not done:
    _action = env.action_space.sample()
    obs, reward, done, info = env.step(_action)
    print(obs)

The printed obs looks like an ndarray.

But what does that mean? Is it the pixel information of the next state, or something else? Could you help me clear that up?

spMohanty commented 6 years ago

@ProQianXiao: This is the RGB frame representing what you would see on the screen if you were playing the game yourself. Does this answer your question?
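To check this yourself, a minimal sketch along the following lines inspects the array. The `frame` here is a synthetic stand-in for `obs`, and its layout (height x width x 3, dtype uint8) is an assumption based on the usual convention for RGB frames. `summarize_frame` is a hypothetical helper, not part of marlo. With a real environment you would pass the `obs` returned by `env.step()` instead.

```python
import numpy as np


def summarize_frame(frame):
    """Return basic facts about an RGB frame stored as a NumPy array."""
    height, width, channels = frame.shape
    return {
        "height": height,
        "width": width,
        "channels": channels,
        "dtype": str(frame.dtype),
        # R, G, B values of the pixel in the top-left corner
        "top_left_pixel": tuple(int(v) for v in frame[0, 0]),
    }


# Synthetic stand-in for obs (assumed layout): one solid colour everywhere.
frame = np.zeros((600, 800, 3), dtype=np.uint8)
frame[:, :] = (255, 128, 0)

summary = summarize_frame(frame)
print(summary)
```

If the real `obs` reports three channels and a small integer dtype, that is consistent with it being a screen frame rather than structured game state.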

ProQianXiao commented 6 years ago

@spMohanty Thank you for your reply. Do I understand this correctly? The obs returned from the step function represents the state information, and I can pass it to my algorithm as input. Is that right?

In addition, I would like to know how to get detailed information about the FindTheGoal environment, such as its description, the available actions, the rewards, and whether the actions are discrete or continuous. I searched the envs, but there is no FindTheGoal environment (see the attached screenshot). Looking forward to your reply.

douglasrizzo commented 6 years ago

obs is an image, a screen capture of the game. Each line in your terminal output holds three values: the R, G and B values of one pixel from the screen. If you do something like:

import PIL.Image  # provided by the Pillow package

obs, reward, done, info = env.step(_action)
im = PIL.Image.fromarray(obs)  # wrap the RGB array as an image
im.save('image.png')

you'll see that a screen capture of the game will be saved as a PNG file. I hope that helps.

ProQianXiao commented 6 years ago

@douglasrizzo Thanks very much, that helps me a lot.

douglasrizzo commented 6 years ago

@ProQianXiao The FindTheGoal environment is named Basic in the screenshot you provided. However, I don't think you'll find any relevant info in the repository. You just have to interface with the game through the marlo package, send actions and receive rewards and new observations. I suggest you take a look at the usage examples. Personally, I don't think you'll have much structured data to work with (e.g. a discretized grid world). You'll just have to work with the image inputs.
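On the discrete-versus-continuous question, the example already calls `env.action_space.sample()`, so the environment exposes an action space object that can be inspected at runtime. The sketch below assumes that object follows the Gym conventions, where discrete spaces carry an `n` attribute and box (continuous) spaces carry `shape`, `low` and `high`. `describe_space` is a hypothetical helper, and the two stubs stand in for real spaces so the snippet runs without the game.

```python
from types import SimpleNamespace


def describe_space(space):
    """Summarize a Gym-style space by checking which attributes it has."""
    # Check for `n` first: discrete spaces may also define an empty `shape`.
    if hasattr(space, "n"):
        return f"discrete with {space.n} actions"
    if hasattr(space, "shape"):
        return f"continuous with shape {tuple(space.shape)}"
    return "unknown space type"


# Stubs standing in for env.action_space (assumed attribute names).
discrete_stub = SimpleNamespace(n=5)
box_stub = SimpleNamespace(shape=(2,), low=-1.0, high=1.0)

discrete_summary = describe_space(discrete_stub)
box_summary = describe_space(box_stub)
print(discrete_summary)
print(box_summary)
```

With a real environment, `describe_space(env.action_space)` would be the call to try.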

spMohanty commented 6 years ago

Thanks @douglasrizzo for the clarification. And that's correct: in this competition, we are focusing on just image frames as observations.

At a later point, we do plan to make the grid world available as an observation (and make it configurable), but that will not be allowed in this competition anyway.
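Since the observations are image frames, a common first step before passing them to a learning algorithm is to shrink and normalize them. The following is only a sketch of that idea, not something the competition prescribes. It assumes `obs` is a height x width x 3 uint8 array, and `preprocess` and its `stride` parameter are hypothetical names chosen for illustration.

```python
import numpy as np


def preprocess(frame, stride=4):
    """Turn an RGB uint8 frame into a small grayscale float32 array in [0, 1]."""
    # Standard luma weights for converting RGB to grayscale.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = frame.astype(np.float32) @ weights  # shape (H, W)
    # Crude downsampling: keep every `stride`-th row and column.
    small = gray[::stride, ::stride]
    return small / 255.0


# Synthetic all-white frame standing in for obs (assumed layout).
frame = np.full((600, 800, 3), 255, dtype=np.uint8)
state = preprocess(frame)
print(state.shape, state.dtype)
```

Striding is the simplest way to downsample. Proper resizing with an image library would usually give smoother results at the cost of an extra dependency.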

ProQianXiao commented 6 years ago

@douglasrizzo @spMohanty That is very kind of you. I now understand what the obs means and how to use the state information. Thanks very much again.