awjuliani / neuro-nav

A library for neuroscience-inspired navigation and decision making research.

POMDP support/hacking #33

Closed MHRosenberg closed 1 year ago

MHRosenberg commented 1 year ago

Hi there, great repo! What's the easiest way to adapt your code to support a fully partially observable (POMDP) setting, e.g. a mouse in the dark with only whisker information? The closest agent observation types seem to be the "window" and the "boundary" observations. For "window" I would like to shrink the window to 3x3, i.e. only the cells adjacent to the agent. For "boundary", I would like the cardinal rays to extend only one square/cell away from the agent. Presumably/hopefully, these two implementations would give identical results. Would it be possible for you to implement this, or to provide some instructions? Both approaches appear to require hacking the OpenAI Gym spaces.Box code at a minimum, which seems painful.

awjuliani commented 1 year ago

Hi @MHRosenberg,

Thanks for your interest in neuro-nav, and for making this feature request. I spent a little time this morning adding two additional observation types to GridEnv, each of which uses the 3x3 window you are asking for: symbolic_window_tight and window_tight. You can see the changes in the PR here: https://github.com/awjuliani/neuro-nav/pull/34.
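
For anyone landing here later, a minimal sketch of selecting one of the new observation types once the PR is merged (the `GridObsType` enum name and `obs_type` keyword argument are assumptions based on how the existing observation types are selected; the branch may spell these differently):

```python
from neuronav.envs.grid_env import GridEnv, GridObsType  # names assumed

# Request the new 3x3 egocentric window observation
env = GridEnv(obs_type=GridObsType.window_tight)
obs = env.reset()
print(obs.shape)  # inspect the shape the new observation actually has
```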

I will wait before merging the PR into main to give you a chance to take a look and make sure this is what you had in mind.

best, Arthur

MHRosenberg commented 1 year ago

Thanks, Arthur!

That looks great, from what I understand. I don't fully understand how all of the underlying "boxes" machinery works under the hood, though, and I also don't fully understand the difference between the symbolic and non-symbolic options.

I attempted to clone the dev-tight branch and retry your supplied colab notebook but hit the following error:

```
Cloning into 'dev-tight'...
fatal: repository 'https://github.com/awjuliani/neuro-nav/tree/dev-tight/' not found
ERROR: Invalid requirement: './neuro-nav[experiments_remote]'
Hint: It looks like a path. File './neuro-nav[experiments_remote]' does not exist.
```

awjuliani commented 1 year ago

To clone the branch, you will want to use the following command:

```
!git clone --branch dev-tight https://github.com/awjuliani/neuro-nav
```
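
Once that succeeds, the repository lands in a directory named neuro-nav (not dev-tight), so the notebook's original install line from the error above should then resolve. A sketch of the two Colab cells together:

```
!git clone --branch dev-tight https://github.com/awjuliani/neuro-nav
!pip install "./neuro-nav[experiments_remote]"
```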

Here is what the window_tight observation type looks like.

[screenshot: the window_tight observation]

The symbolic_window_tight observation is a 3D tensor of shape [3 x 3 x 5], where each channel corresponds to one object type in the environment. So the agent channel would look like:

```
0, 0, 0
0, 1, 0
0, 0, 0
```
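
To make the channel layout concrete, here is a small illustrative numpy sketch of such a [3 x 3 x 5] tensor (the channel ordering is an assumption; only the agent channel is filled in):

```python
import numpy as np

# One binary 3x3 plane per object type; channel 0 is taken to be the agent
obs = np.zeros((3, 3, 5))
obs[1, 1, 0] = 1.0  # the agent always sits at the center of its own window

print(obs[:, :, 0])
# [[0. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 0.]]
```
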
MHRosenberg commented 1 year ago

Thanks, Arthur!

Apologies for the naive GitHub clone question. I can clone in the Colab notebook fine now, thanks!

Are you able to run your Colab notebook when you swap in the new observation types? I tried both the provided TDSR and SARSA agents and got the following error:

```
IndexError                                Traceback (most recent call last)
<ipython-input> in <module>
     24 total_steps = []
     25 for i in range(num_episodes):
---> 26     agent, steps, returns = run_episode(env, agent, max_steps=num_steps)
     27     total_steps.append(steps)
     28 plt.plot(total_steps)

1 frames
/usr/local/lib/python3.8/dist-packages/neuronav/agents/td_agents.py in sample_action(self, state)
    385
    386     def sample_action(self, state):
--> 387         Qs = self.Q[:, state]
    388         return self.base_sample_action(Qs)
    389

IndexError: index 210 is out of bounds for axis 1 with size 121
```

Apologies for the naive questions, but I still don't understand what the symbolic representation entails. Does the non-symbolic type only consider walls, while the symbolic type has a channel for each of the 5 available object types?

I'm also fuzzy on how the state is represented when you have a limited window. Is that a tabular or a function-approximation approach? How are observations converted into states? Does the window version replace the (n environment cells) x (n actions) matrix with an (8 or 9) x (n actions) matrix?

I would be happy to discuss details over Zoom or similar if you'd prefer; that might be more efficient.

awjuliani commented 1 year ago

This is expected behavior (though perhaps a better error message could be provided). All of the algorithms included in neuro-nav currently work only on fully observable, tabular state spaces: the index and onehot state spaces.

The additional state spaces are currently provided for users who would like to use their own function-approximation algorithms as part of their research. I am planning to add some of these algorithms to the toolkit at some point in the future, but currently they are not included.
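
To make the distinction concrete, here is a hedged sketch (plain numpy, not neuro-nav API) of the kind of function approximation a user could bring: a linear Q-function over the flattened window observation, in place of a tabular Q indexed by a discrete state id:

```python
import numpy as np

n_actions = 4
obs_dim = 3 * 3  # flattened window_tight observation

# One weight vector per action, instead of one Q entry per (action, state)
W = np.zeros((n_actions, obs_dim))

def q_values(obs):
    phi = np.asarray(obs).reshape(-1)  # the observation is the feature vector
    return W @ phi                     # one Q estimate per action

def td_update(obs, action, reward, next_obs, alpha=0.1, gamma=0.99):
    phi = np.asarray(obs).reshape(-1)
    target = reward + gamma * q_values(next_obs).max()
    td_error = target - q_values(obs)[action]
    W[action] += alpha * td_error * phi  # gradient step on the linear Q
```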

awjuliani commented 1 year ago

Hello. Since it has been a month since my last comment, I am going to assume that this issue is addressed, and will be closing it for now. If you would still like to discuss it, please feel free to reopen.

awjuliani commented 1 year ago

Hi @MHRosenberg,

Just wanted to let you know that the latest version of neuro-nav (2.0) now includes two deep reinforcement learning algorithms (PPO and SAC). These both support all of the observation spaces, including the window and window-tight spaces. You can take a look at the notebook here to see how to use them: https://github.com/awjuliani/neuro-nav/blob/main/notebooks/deeprl_tutorial.ipynb.
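
For reference, a hedged sketch of the surrounding Gym-style loop such an agent is trained in (random actions stand in for the PPO/SAC policy; see the linked notebook for the actual algorithm usage, and note the `GridObsType.window_tight` name is assumed):

```python
from neuronav.envs.grid_env import GridEnv, GridObsType  # names assumed

env = GridEnv(obs_type=GridObsType.window_tight)
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # a trained PPO/SAC policy goes here
    obs, reward, done, info = env.step(action)  # classic 4-tuple Gym API
```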

best, Arthur