Elucidation / mapf-multiagent-robot-planning

Multi-Agent PathFinding (MAPF) for 2D Robots moving inventory on a grid - Practice building environment + robots + planning + inventory management etc.
MIT License
11 stars 3 forks source link

Store true heuristic dict locally/redis/etc. as warehouse increases in size #94

Closed Elucidation closed 1 year ago

Elucidation commented 1 year ago

true heuristic is static based on warehouse yaml, and currently gets generated every time robot allocator is spun up. This isn't necessary, could store on disk or on redis.

As the warehouse size increases so does the true heuristic dict (24x26 current warehouse ~390 Kb) -> (200x200 would be 1.6 Gb) we can't have each worker node store the true heuristic locally. Consider storing on disk or pulling from redis since it is static.

Elucidation commented 1 year ago

Also want to consider on redis the true heuristic is essentially a 4D array of (NxM)x(NxM) for N rows and M columns in the grid, resulting in a O((NxM)**2) sized array. A single generate_path has a fixed goal so only needs a 2D slice NxM for that goal position. One option is a redis dict of (goal pos) -> NxM slice true heuristic grid. This way a generate_path only has one redis query and the amount of data pulled is quite small relatively, say for a 200x200 grid just 40kb.

Elucidation commented 1 year ago

To be determined if loading from disk or from redis is faster, as worker nodes could all use a shared read-only volume

Elucidation commented 1 year ago

Scaling up warehouse to World Shape: (94, 51), 536 robots, 504 item zones, 326 stations On dev machine the system does run, node js web server can keep up (disable robot tables and truncate stations). The robot allocator can handle only a 1/4 to half of jobs in a cycle before being threshold limited, despite this the system still functions as it just means robots have to wait an extra couple cycles before the robot allocator can process them.

2023-06-19 15:02:10 2023-06-19 22:02:10,409 - robot_allocator - INFO - update end, took 156.048 ms, processed 126/536 jobs, assigned 1/1 available robots to 2 available tasks

image

Note, the heuristic takes about a minute to generate

2023-06-19 14:57:48 2023-06-19 21:57:48,430 - robot_allocator - INFO - Built true heuristic grid in 59969.09 ms

Considering the 94x51 grid, that does result in a 4794**2 ~ 23mb sized heuristic

Even with this much larger grid and more robots ie more dynamic obstacles, the generate path while slightly longer is still in the 10's of ms with a mean of 4.35ms!

image

image

Elucidation commented 1 year ago

For a 300kb dict (as is the current warehouse), file IO to an SSD is on the order of milliseconds for reads and writes using the python pickle module.

World Shape: (26, 24), 44 robots, 48 item zones, 24 stations
Building true heuristic
Built true heuristic grid in 173.29 ms
Saved true heuristic 289.1 kb  to true_heuristic_dict.pkl in 2.67 ms
Loaded dict 289.1 kb  from true_heuristic_dict.pkl in 5.90 ms

But if the program is run without writing first (assuming it's already read), it tends to get cached or similar on IO

Loaded dict 289.1 kb  from true_heuristic_dict.pkl in 7.34 ms
Loaded dict 289.1 kb  from true_heuristic_dict.pkl in 1.09 ms
Loaded dict 289.1 kb  from true_heuristic_dict.pkl in 0.93 ms
Loaded dict 289.1 kb  from true_heuristic_dict.pkl in 0.59 ms
Loaded dict 289.1 kb  from true_heuristic_dict.pkl in 0.77 ms

This is almost definitely going to be a better choice than redis with network latency. Considering it'll be write once, read many.

Elucidation commented 1 year ago

This was fixed a bit ago with pickle save/load