corl-team / xland-minigrid

JAX-accelerated Meta-Reinforcement Learning Environments Inspired by XLand and MiniGrid 🏎️
Apache License 2.0

[Feature Request] Add Maze-like env #2

Open · carlosluis opened this issue 7 months ago

carlosluis commented 7 months ago

Hi!

Awesome job on the repo!

Feel free to ignore this request if it's not part of your roadmap. It's more of a suggestion to have other types of exploration tasks.

There's partial code in https://github.com/Farama-Foundation/Minigrid/pull/317 to generate feasible mazes (with a unique direct path to the goal, I believe) based on MiniGrid envs. Taking that code, I was able to generate envs such as these:

I thought it might make for an interesting meta-RL exploration benchmark, i.e., can your algorithm learn to exhaustively explore the maze until it finds the goal? In principle it might not be that different from exploring an open-space grid, but who knows! Maybe the more constrained state space might even accelerate (or slow down) training progress.

Cheers!

Howuhh commented 7 months ago

Hi @carlosluis! This is actually a very important suggestion, and we plan to add procedural generation in some form sooner or later anyway. However, in our experience (and this is actually one of the reasons why it is still not there), procedural map generation is quite difficult to represent in an efficient and jit-compatible way (e.g., recursive maze generation algos). There are some successful examples, though, for example in Jumanji or in minimax.
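To illustrate that jit compatibility mostly rules out recursion rather than maze generation itself, here is a minimal sketch (my own code, not part of xland-minigrid) of the classic binary-tree maze algorithm: every cell independently carves a passage either north or west, so the whole thing is a few fixed-shape vectorized ops:

```python
import jax
import jax.numpy as jnp

def binary_tree_maze(key, n: int) -> jnp.ndarray:
    """Perfect maze over an n x n cell grid as a (2n+1, 2n+1) array of
    {0: floor, 1: wall}. No recursion and no data-dependent control
    flow, so it works under jit and vmap (with static n)."""
    grid = jnp.ones((2 * n + 1, 2 * n + 1), dtype=jnp.uint8)
    ii, jj = jnp.meshgrid(jnp.arange(n), jnp.arange(n), indexing="ij")
    grid = grid.at[2 * ii + 1, 2 * jj + 1].set(0)  # carve every cell
    # each cell independently carves a passage north or west ...
    coin = jax.random.bernoulli(key, 0.5, (n, n))
    # ... except the top row (no north neighbour) and left column (no west)
    carve_west = jnp.where(ii == 0, True, jnp.where(jj == 0, False, coin))
    wr = jnp.where(carve_west, 2 * ii + 1, 2 * ii)  # row of wall to remove
    wc = jnp.where(carve_west, 2 * jj, 2 * jj + 1)  # col of wall to remove
    # cell (0, 0) carves nothing: redirect its write onto its own floor tile
    wr, wc = wr.at[0, 0].set(1), wc.at[0, 0].set(1)
    return grid.at[wr, wc].set(0)

# one vmapped call generates a whole batch of mazes on device
keys = jax.random.split(jax.random.PRNGKey(0), 1024)
mazes = jax.jit(jax.vmap(binary_tree_maze, in_axes=(0, None)),
                static_argnums=1)(keys, 8)
```

Binary-tree mazes are admittedly biased (long corridors along two borders), but they show the general trick: replace recursion with independent per-cell choices and fixed-shape scatters.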

We're unfortunately unlikely to do this anytime soon (it's in the plans for post v1.0, ~2-3 months), as we're currently busy getting XLand-MiniGrid to a full paper and focused on the meta-RL part (benchmarks). But we welcome any contributions: grid randomization will definitely add new challenges to meta-learning, and it would also allow porting the procedural multi-room envs from the original MiniGrid. Thus, it's a highly valuable addition.

P.S. Maze exploration alone is not a meta-RL problem, I think, since a new maze can be solved zero-shot without the need for adaptation, only generalization (like ProcGen).

alexunderch commented 7 months ago

Maybe it is worth adding some simple procedural generation algorithm to test the concept; it might not be that hard. JAX can be paired with recursive algorithms (for tree search, for example), and a simple example could be a way to start.

Sounds promising 🤗

Howuhh commented 7 months ago

There's another problem at the moment: the agent can see through walls 🥲! Unfortunately, a naive port of the FOV algorithm from MiniGrid slows things down too much (it is available in the current version, but disabled). We haven't come up with a replacement for it yet, although we've tried different things (like simple ray casting). Without it, I think a maze will be easy enough to solve. We are open to any suggestions/help on this! For now we just reduce the FOV size in most cases to make it a bit harder.
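For context, here is roughly what the "simple ray casting" variant looks like as a fixed-shape sketch (hypothetical code, not the repo's API): for every tile in the agent's view window, sample a few points along the ray from the agent and require each crossed tile to be transparent. It is approximate near corners and costs view_size² × samples map lookups per step, which hints at why a fast exact FOV is hard:

```python
import jax.numpy as jnp

def visibility_mask(transparent, pos, view_size=5, samples=8):
    """Approximate line-of-sight over a square window around the agent.
    transparent: (H, W) bool map, True where sight passes through.
    pos: (row, col) ints. Fixed shapes only, so it jits and vmaps."""
    r0, c0 = pos
    h = view_size // 2
    dr, dc = jnp.meshgrid(jnp.arange(-h, h + 1),
                          jnp.arange(-h, h + 1), indexing="ij")
    # sample points strictly between the agent and each target tile
    t = (jnp.arange(1, samples + 1) / (samples + 1))[:, None, None]
    rr = jnp.clip(jnp.round(r0 + t * dr), 0,
                  transparent.shape[0] - 1).astype(jnp.int32)
    cc = jnp.clip(jnp.round(c0 + t * dc), 0,
                  transparent.shape[1] - 1).astype(jnp.int32)
    # samples that round into the target tile itself never block it,
    # so wall tiles are visible while the tiles behind them are not
    on_target = (rr == r0 + dr) & (cc == c0 + dc)
    return jnp.all(transparent[rr, cc] | on_target, axis=0)
```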

carlosluis commented 7 months ago

Thank you all for having a look at this so quickly!

I understand the challenges of jitting procedural generation algos, but why not start simple and take maze generation outside of jit? Basically, pre-generate a bunch of mazes on initialization and then sample from this list whenever the meta-RL algorithm asks for a new task (see the sketch below). Maybe I'm being naive here and missing a key detail of why this wouldn't work.
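Concretely, I mean something like this; `generate_maze_np` is a stand-in for any host-side generator (e.g. the code in the Minigrid PR above), not an existing function:

```python
import numpy as np
import jax
import jax.numpy as jnp

def generate_maze_np(seed, size=13):
    # placeholder: a real maze generator would go here; since this runs
    # once on the host in plain numpy, jit constraints never apply to it
    rng = np.random.default_rng(seed)
    return (rng.random((size, size)) < 0.3).astype(np.uint8)

# pre-generate a fixed pool of maps once, at initialization
pool = jnp.asarray(np.stack([generate_maze_np(s) for s in range(100_000)]))

@jax.jit
def reset(key):
    # sampling an index into the pool is trivially jit-compatible,
    # even though the generator itself is not
    idx = jax.random.randint(key, (), 0, pool.shape[0])
    return pool[idx]
```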

@Howuhh re: why maze exploration may or may not be a good benchmark for meta-RL: I agree that normally this is a test of generalization rather than adaptation, but at the same time the line between generalization and adaptation can be quite fuzzy. You are right that a new maze requires no adaptation, but it does require following a "good" exploration strategy, i.e., exhaustively searching for the goal location. Once the goal is found, you have completely identified the task, and from there on the agent should follow the shortest path to the now-known goal location. This benchmark would then test the capability of meta-RL algorithms to learn this exploration behavior during meta-training under very sparse rewards, i.e., you meta-learn the generalization. From this perspective, I do see value in testing meta-RL on such benchmarks!

Happy to hear your thoughts and arguments here though, I think it's an interesting discussion without a clear right/wrong answer.

Howuhh commented 7 months ago

> Maybe I'm being naive here and missing a key detail of why this wouldn't work.

There are actually two reasons why I haven't already done this. The first is the inconvenience of having to store and download the maps separately, in addition to the benchmarks. The second is that a million maps in uint8 can start to take up a lot of memory on the GPU (height x width x 2 x 1 byte x 1M, i.e. at least ~0.5 GB). This is actually quite a lot, as GPU memory is highly valuable. We could store them on the CPU, though, but that case needs additional FPS benchmarks; maybe the overhead is low.
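To make the numbers concrete, here is a rough sketch of the CPU-storage variant (just an illustration, not code we have):

```python
import numpy as np
import jax

cpu = jax.devices("cpu")[0]
# e.g. 1M maps of 16x16 tiles with 2 uint8 channels:
# 16 * 16 * 2 * 1 byte * 1_000_000 = 512 MB, the ~0.5 GB figure above
pool = jax.device_put(np.zeros((1_000_000, 16, 16, 2), np.uint8), cpu)

def sample_map(key):
    # pick an index, then ship one 512-byte map to the accelerator;
    # this per-reset host-to-device copy is the overhead to benchmark
    idx = int(jax.random.randint(key, (), 0, pool.shape[0]))
    return jax.device_put(pool[idx])
```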

But it's probably the only way. I'll see if I can get it done in time alongside the main roadmap.

carlosluis commented 7 months ago


I see, that makes it inconvenient, I agree! Also, an appropriate sample size would depend on the size of the maze: maybe 1M maps is overkill for 10x10 mazes but insufficient for 100x100 mazes. Hard to tell a priori what a good value would be. Although I believe you can get a lot of signal regarding the effectiveness of meta-RL exploration with relatively small mazes.