google-deepmind / meltingpot

A suite of test scenarios for multi-agent reinforcement learning.
Apache License 2.0

Increase efficiency by better encoding of observation #67

Closed kinalmehta closed 2 years ago

kinalmehta commented 2 years ago

Hi,

As mentioned in #11, I tried "LAYER" based observations.

Following are the observations for the prisoners_dilemma_in_the_matrix substrate for each agent:

But there is no documentation on how to interpret this LAYER observation. Could you please direct me to where I could find help interpreting LAYER?

I also tried changing the spriteSize parameter from 8 to 1. This reduces the observation space from (88, 88, 3) to (11, 11, 3), which I believe could lead to a speed-up and make feature learning easier. Are there any side effects I should be careful about when changing spriteSize?
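For reference, the shape arithmetic is just the agent's 11x11 window of grid cells times spriteSize pixels per cell. A minimal sketch (the helper name here is illustrative, not Melting Pot API):

```python
def rgb_observation_shape(window_cells=11, sprite_size=8):
    """Per-agent RGB observation shape: (cells * pixels, cells * pixels, 3)."""
    side = window_cells * sprite_size
    return (side, side, 3)

print(rgb_observation_shape(sprite_size=8))  # (88, 88, 3)
print(rgb_observation_shape(sprite_size=1))  # (11, 11, 3)
```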

Do you have any other suggestions for speeding up environment interaction and the training loop to enable faster experimentation?

Thank you, Kinal Mehta

jagapiou commented 2 years ago

@jzleibo will understand this better than me, but here's my understanding.

  1. The LAYER observation is per grid cell (an 11x11 grid with 17 layers). Sorry, I don't know what each layer or each individual entry represents. We usually use the raw RGB.
  2. Reducing the sprite size of the substrate like that will make each sprite a single pixel: I strongly doubt that's desirable as I'm not sure things in our substrates are discriminable from each other by a single RGB value alone.

Note that the scenarios only provide RGB observations (no LAYER), so you will need to train on RGB if you want to evaluate on the test scenarios. Also note that the first conv layer we use (VALID padding, 8x8 kernel, stride 8, 16 channels) reduces each frame to 11x11x16 activations, i.e. 16 kernels each centered on an 8x8-pixel grid square. So I expect using LAYERS directly won't be much of a speed/memory saving.
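As a quick sketch of that shape reduction (JAX used here purely for illustration; this isn't our training code):

```python
import jax
import jax.numpy as jnp

# Sketch: the first conv layer described above (VALID padding, 8x8 kernel,
# stride 8, 16 channels) maps an 88x88x3 frame to 11x11x16 activations.
frame = jnp.zeros((1, 88, 88, 3))   # NHWC: one RGB frame
kernel = jnp.zeros((8, 8, 3, 16))   # HWIO: 8x8 window, 3 in, 16 out channels
out = jax.lax.conv_general_dilated(
    frame, kernel,
    window_strides=(8, 8),
    padding="VALID",
    dimension_numbers=("NHWC", "HWIO", "NHWC"))
print(out.shape)  # (1, 11, 11, 16): one 16-channel code per 8x8 sprite cell
```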

kinalmehta commented 2 years ago

Thanks for your response @jagapiou.

> Reducing the sprite size of the substrate like that will make each sprite a single pixel: I strongly doubt that's desirable as I'm not sure things in our substrates are discriminable from each other by a single RGB value alone.

Since all the substrates are grid-world environments, is there any other form of encoding I could access? That would help the algorithm focus on learning the RL task itself. With RGB observations, the NN model first has to learn a feature representation from the pixels and only then use it to learn the RL task.

jzleibo commented 2 years ago

I think there's a reasonable chance that RGB with a sprite size of 1 could work reasonably well for many of the substrates. You would, of course, also have to change your conv nets analogously. We have never tried this ourselves, so it's terra incognita. I'm curious to know the results.

kinalmehta commented 2 years ago

Thanks for the feedback, @jzleibo. Could you give more insights on the LAYER observation?

jzleibo commented 2 years ago

We've mostly regarded the layers as an implementation detail, not something that people would really be using as an observation. We might add or subtract layers in the future, or they might have their order permuted. They also sometimes contain privileged information that focal agents are not supposed to be able to access.

Some of the substrates have invisible objects that could be seen via the LAYERS observation. For example, observing the LAYERS would let you see the location where an apple will later spawn before it actually spawns. So an agent trained with the LAYERS observation would have access to privileged information that RGB agents would not have.

kinalmehta commented 2 years ago

Okay. Thank you for the info. I understand that it's best to work with RGB only.

jagapiou commented 2 years ago

I think this is resolved?

kinalmehta commented 2 years ago

Just posting results with spriteSize=1. I replaced the first conv layer ((8, 8) kernel, stride 8, VALID padding) with a (3, 3) kernel, stride 1, SAME padding.
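A sketch of the swap (JAX purely for illustration; layer sizes as described above, everything else unchanged):

```python
import jax
import jax.numpy as jnp

# With spriteSize=1 the frame is 11x11x3 instead of 88x88x3, so the
# 8x8/stride-8 "sprite" conv becomes a 3x3/stride-1 conv with SAME padding.
frame = jnp.zeros((1, 11, 11, 3))   # NHWC: one RGB frame at spriteSize=1
kernel = jnp.zeros((3, 3, 3, 16))   # HWIO: 3x3 window, 3 in, 16 out channels
out = jax.lax.conv_general_dilated(
    frame, kernel,
    window_strides=(1, 1),
    padding="SAME",
    dimension_numbers=("NHWC", "HWIO", "NHWC"))
print(out.shape)  # (1, 11, 11, 16): same activation shape as the original stack
```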

Learning and convergence look almost identical, with a speed-up of about 20%. [image: learning curves]

jzleibo commented 2 years ago

Nice! It's possible that the *_in_the_matrix substrates work better than the others at a sprite size of 1x1. So to be really sure this works in general, we'd want to look at some of the more visually complicated substrates. But I agree that this result is very promising, and there's a good chance it'll continue to work well in the more complicated substrates too.

It will certainly affect learning dynamics in hard-to-predict ways. But I think that's ok.

kinalmehta commented 2 years ago

Looking at the documentation and observations, I believe collaborative_cooking_* and chemistry_* would pose problems at a 1x1 sprite size.

Any specific substrate you have in mind with more complex visuals?

jzleibo commented 2 years ago

I'm mainly thinking of the new substrates that will soon be released. But even with the most visually complex of those, it's not clear how much aliasing would really happen when you average the 8x8 sprites down to 1x1. It's likely that the numbers would often work out to different values on the red, green, and blue channels (assuming you are still using RGB at 1x1).

Neural nets can pick up on small differences in channel values. And there's nothing in meltingpot that would require invariance to visual changes like illumination. So it should be possible for agents to become sensitive to very small differences in pixel values if they are reward-relevant.

This would, of course, affect learning dynamics, potentially slowing it down if the averaging brings sprites closer together in representation space that would otherwise have been farther apart. Though it could also potentially speed up learning if the averaging brings the representation closer to one that can get reward quickly. It's an empirical question and we'd have to answer it separately for each substrate.

There is another way that downsampling could destroy information. Sometimes we have "compound sprites", e.g. when an avatar carries an object. In that case, the "real" representation is 'avatar + object'. This is preserved in the high-resolution image but lost in the low-resolution one, which effectively just maps 'avatar' to one code and 'avatar + object' to a different code, maintaining no relationship between them. Note, though, that in meltingpot we typically use a convolutional stride equal to the sprite size, so we lose this sort of compositional information in any event. So destruction of compositional structure is not a way that 1x1 sprites could be worse than (our) baseline algorithms. I don't think we have enough compound sprites for this to matter, though that will start to change a bit in the new suite we will soon release.
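A toy numpy illustration of that collapse (the sprites here are made up, not actual Melting Pot assets):

```python
import numpy as np

# Averaging an 8x8 sprite down to a single pixel maps each distinct sprite
# to one RGB code, with no part-whole relationship preserved between them.
avatar = np.zeros((8, 8, 3))
avatar[..., 0] = 1.0                  # a plain red avatar sprite
carrying = avatar.copy()
carrying[2:6, 2:6, 2] = 1.0           # the same avatar carrying a blue object

print(avatar.mean(axis=(0, 1)))       # [1.   0.   0.  ]
print(carrying.mean(axis=(0, 1)))     # [1.   0.   0.25] -- just a different code
```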

As for collaborative_cooking_* and chemistry_*: my guess is that they would actually be fine at 1x1.

kinalmehta commented 2 years ago

Thank you so much for the insights.