amidos2006 / gym-pcgrl

A package for "Procedural Content Generation via Reinforcement Learning" OpenAI Gym interface.
MIT License
113 stars 27 forks source link

Pathfinding bug (binary domain / calc_longest_path) #12

Open smearle opened 2 years ago

smearle commented 2 years ago

The following code returns a sub-optimal path:

import gym
import numpy as np

import gym_pcgrl
from gym_pcgrl.envs.helper import get_string_map

maze = np.array([[0, 0],
                 [0, 0],
                 [0, 1],])

env_name = "binary-narrow-v0"
env = gym.make(env_name)
stats = env.unwrapped._prob.get_stats(
    get_string_map(maze, env.unwrapped._prob.get_tile_types()),
)
print(stats)

This prints {'regions': 1, 'path-length': 2}, but the maximum path length is 3 (from bottom left to top right).

In calc_longest_path, the empty tile (0, 0) is chosen as the source of a call to run_dijkstra. This returns the following dijkstra map: 0 1
1 2
2 -1

. The extremities relative to this tile are tiles (1, 1) and (2, 0), which are both 2 away from (0, 0). But (1, 1) is chosen as the source of the next call to dijkstra, which results in:

2 1
1 0
2 -1

, giving a path-length of 2.

If instead (2, 0) was chosen as the source of the second call to dijkstra, we would have:

2 3
1 2
0 -1

, resulting in a path-length of 3.

One obvious fix would be to call dijkstra from each of the extremities that results from the first search, then record the maximum path-length over these. Is there any faster way?

amidos2006 commented 2 years ago

Yup, that issue was intentional since the beginning of the PCGRL and it is well known. Getting the longest shortest path is an np problem and to solve it accurately you need to do dikjstra n^2 times which is extremely slow leading to O(n^4).

The way we calculate is by running dijkstra twice leading to a good approximation with O(n^2). The approximation is consistent and returns good relative values which were what is important for the framework not to get accurate results but to get consistent results that are relatively good (when you compare two maps, I can approximately tell you which one is better).

A higher accuracy solution is what you have suggested. The second run of dijkstra is to be repeated k times where k maximum will be probably n making the algorithm O(n^3) in the worst case. The k times equal to all the cells with the largest value after running the first dijkstra.

In the end, the function was not designed to return the perfect result but to return fast results for RL so O(n^2) is better than O(n^3) in the case of RL and the difference won't be as noticeable from RL perspective to learn because it usually differs in a small amount like the 3 and 2 example that you have shown.

smearle commented 2 years ago

Thanks so much for the detailed explanation! I missed the memo, clearly, but nonetheless an interesting problem to think about. The current approximation sounds like a reasonable tradeoff.

Perhaps we should say "Calculate the approximate longest path on the map" for the sake of future naive coders like myself 🥲 .

amidos2006 commented 2 years ago

Sorry about all of that, It is my fault not to comment that :) Let's add that comment :)