Handle for 2D/3D observations that are images

belerico commented 8 months ago

I was also about to open an issue regarding Dreamer on feature-vector-based (partially observable) environments where no CNN is needed, and also on gridworlds where the observations are matrices but not RGB images (for example, with 1s where the agents are and 0s elsewhere).

What would be necessary to adapt from the code? I would be willing to help you guys!

Originally posted by @jmribeiro in https://github.com/Eclectic-Sheep/sheeprl/issues/229#issuecomment-1991783930

belerico commented 8 months ago

From @jmribeiro in #229:

Hi @belerico

Here it goes:

Which kind of observations do we want to support?

The observations are custom made for an environment called "Level-Based Foraging"

An agent has a field-of-view window of size 5x5 centered on itself. Each channel contains specific information about the objects in its surroundings.

Shape: 5 channels x 5 width x 5 height

Example for agent #0

How can we specify that some 2D/3D observations have to be treated not as images but as vector-based observations?

I believe the only issues I could find in the code are hard-coded assumptions, such as treating all observations as RGB arrays (and sometimes dividing by 255), and fixed padding/strides in some classes.
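To illustrate the divide-by-255 point, here is a minimal sketch of normalization that is applied only to true image observations. The function name `normalize_obs` and the `is_image` flag are hypothetical and are not part of sheeprl; how the flag would be set (for example, from the config) is left open.

```python
import numpy as np


def normalize_obs(obs: np.ndarray, is_image: bool) -> np.ndarray:
    """Cast to float32 and rescale only observations that are real images.

    `is_image` is a hypothetical flag: it is an assumption of this sketch,
    not an existing sheeprl option.
    """
    obs = obs.astype(np.float32)
    if is_image:
        # Pixel values in [0, 255] are mapped to [0, 1].
        return obs / 255.0
    # Grid-like features (e.g. 0/1 occupancy channels) are left untouched.
    return obs


# A 5-channel 5x5 grid with a single marked cell keeps its value of 1.
grid = np.zeros((5, 5, 5), dtype=np.uint8)
grid[0, 2, 2] = 1
grid_out = normalize_obs(grid, is_image=False)
```

With a blanket division by 255, the marked cell above would shrink to roughly 0.004, which is the behavior this sketch avoids.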

Right now I'm learning this environment with a DQN as follows (with an extra channel due to an extra teammate in the environment).

What happens to observations larger than 3D?

This should not happen, at least in my use case.

Which kind of models should be employed to handle those observations?

Conv2D layers are enough.
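As a rough sketch of what "Conv2D is enough" could look like for a 5 x 5 x 5 grid, the PyTorch module below exposes kernel size, stride, and padding as arguments instead of fixing them. The class name `GridEncoder` and all default values are assumptions made for illustration; this is not the DQN network mentioned above nor any sheeprl model.

```python
import torch
from torch import nn


class GridEncoder(nn.Module):
    """Small Conv2d encoder for channel-first grid observations (hypothetical)."""

    def __init__(
        self,
        in_channels: int = 5,
        grid_size: int = 5,
        hidden_channels: int = 32,
        features_dim: int = 64,
        kernel_size: int = 3,
        stride: int = 1,
        padding: int = 1,
    ) -> None:
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, hidden_channels, kernel_size, stride, padding),
            nn.ReLU(),
            nn.Conv2d(hidden_channels, hidden_channels, kernel_size, stride, padding),
            nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened size with a dummy pass, so that other
        # kernel/stride/padding choices do not require manual arithmetic.
        with torch.no_grad():
            dummy = torch.zeros(1, in_channels, grid_size, grid_size)
            flat_dim = self.conv(dummy).shape[1]
        self.fc = nn.Linear(flat_dim, features_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # No division by 255: the grid channels are not pixel intensities.
        return self.fc(self.conv(obs.float()))


encoder = GridEncoder()
features = encoder(torch.zeros(4, 5, 5, 5))  # batch of 4 observations
```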

jmribeiro commented 7 months ago

Hi! Were you able to look into the issue @belerico ?

belerico commented 7 months ago

Hi @jmribeiro, sorry, but we have been focusing our attention on fixing bugs and improving the overall usability of both the Dreamer-V3 agent and the library itself. We will have a look at your issue asap. Thank you for your patience.

belerico commented 7 months ago

Hi @jmribeiro, one solution I thought of is the following:

- Every agent accepts the encoder (and the decoder when needed) from "outside", with the user selecting and customizing it from the command line
- We should add new configs, one for every encoder that we use
- We should generalize the models that we are using in a way that lets us customize them easily. Every model should have a standard API:
  - It should take a Dict[str, torch.Tensor] as input and output a torch.Tensor representing the extracted features
  - It should define how to combine multiple inputs
  - It should define how to pre-process the input (this eliminates the need to pre-process in the agent-interaction code)
  - It should define useful properties, like output_dim, to be used and fetched by other models working with them
- We should modify make_env to handle vector observations that have more than 1 dimension
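A minimal sketch of an encoder following the API proposed above might look like the code below. Everything except `Dict[str, torch.Tensor]`, `torch.Tensor`, and `output_dim` is an assumption: the class name `FlattenEncoder`, the `preprocess` and `combine` method names, and the constructor arguments are hypothetical and only show one way the four requirements could fit together.

```python
import math
from typing import Dict, Sequence

import torch
from torch import nn


class FlattenEncoder(nn.Module):
    """Hypothetical encoder: flattens every input key and feeds an MLP."""

    def __init__(self, input_shapes: Dict[str, Sequence[int]], features_dim: int = 64) -> None:
        super().__init__()
        # Fixed key order, so concatenation is deterministic.
        self.obs_keys = sorted(input_shapes)
        in_dim = sum(math.prod(input_shapes[k]) for k in self.obs_keys)
        self.net = nn.Sequential(nn.Linear(in_dim, features_dim), nn.ReLU())
        self._output_dim = features_dim

    @property
    def output_dim(self) -> int:
        # Exposed so that downstream models can size their input layers.
        return self._output_dim

    def preprocess(self, obs: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
        # Pre-processing lives in the model, not in the agent-interaction code.
        return {k: obs[k].float().flatten(start_dim=1) for k in self.obs_keys}

    def combine(self, obs: Dict[str, torch.Tensor]) -> torch.Tensor:
        # Multiple inputs are combined by concatenation along the feature axis.
        return torch.cat([obs[k] for k in self.obs_keys], dim=-1)

    def forward(self, obs: Dict[str, torch.Tensor]) -> torch.Tensor:
        return self.net(self.combine(self.preprocess(obs)))


enc = FlattenEncoder({"grid": (5, 5, 5), "state": (3,)})
out = enc({"grid": torch.zeros(2, 5, 5, 5), "state": torch.zeros(2, 3)})
```

Keeping `preprocess` and `combine` as separate methods is a design choice of this sketch: a convolutional encoder could override only `preprocess`, while the agent code keeps calling `forward` and reading `output_dim`.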

Tell me if I'm missing something.

cc @michele-milesi @DavideTr8

jmribeiro commented 7 months ago

@belerico, that seems perfect. Thank you.

DavideTr8 commented 6 months ago

Hi, I'm also dealing with this issue in the branch feature/tsp_env.

If you want to check it out: for now I have added an other_keys field to the config. Observations with those names are neither filtered out nor normalized, and you will have to build your own agent to deal with them. It's just a temporary change, but you can use it as a reference.
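The idea can be sketched roughly as follows. This is not the code from the feature/tsp_env branch: the function `route_observations` and the `encoder_keys` argument are hypothetical, and only the name `other_keys` comes from the description above.

```python
from typing import Any, Dict, Sequence, Tuple


def route_observations(
    obs: Dict[str, Any],
    encoder_keys: Sequence[str],
    other_keys: Sequence[str],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a raw observation dict into encoder inputs and pass-through entries.

    Entries named in `other_keys` are returned as-is (no filtering, no
    normalization); keys listed in neither sequence are dropped.
    """
    encoder_obs = {k: obs[k] for k in encoder_keys if k in obs}
    passthrough = {k: obs[k] for k in other_keys if k in obs}
    return encoder_obs, passthrough


raw = {"rgb": [[0, 255]], "tour": [0, 2, 1], "debug": "unused"}
encoder_obs, passthrough = route_observations(raw, ["rgb"], ["tour"])
```

A custom agent would then consume `passthrough` directly, while `encoder_obs` goes through the usual pre-processing.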

I will start working on this issue in a more organized way soon.

Eclectic-Sheep / sheeprl

Handle for 2D/3D observations that are images #237