Retera / WarsmashModEngine

An emulation engine to improve Warcraft III modding
GNU Affero General Public License v3.0
192 stars 37 forks source link

Save memory usage by unifying node arrays of all CPathfindingProcessors. #31

Closed bearU369 closed 3 months ago

bearU369 commented 1 year ago

Problem: Memory usage in-game has been bothering me for months as I have a PC with only 8 GB RAM in it. Warsmash is a memory-demanding game as it would crash from OutOfMemoryError. Because of it, I usually play 1-2 map before I restart the whole application to play another map. Sometimes I consider myself lucky when I manage to play 3 and more maps before restarting the whole application.

The inner-workings of the pathfinding nodes can be problematic as it scales based on the map's size and the number of players adjusted in the INI file as it creates multiple node arrays per player (including Neutrals). Some maps were completely unplayable in my PC as I have to tune down the MaxPlayers to play these maps.

Solution: Because each CPathfindingProcessors initialize their own node arrays, it causes the program to be a memory-hog. My solution is that instead of CPathfindingProcessors initializing the arrays, it would take in the reference of the "master" node arrays from CSimulation. CSimulation's node arrays would be passed down to the rest of the CPathfindingProcessors as reference variables to minimize the memory usage. In addition, Warsmash would able to scale up number of players without increasing the amount of memory used by CPathfindingProcessors.

From my benchmark, I managed to reduce the memory usage to an estimated range between 36.4% to 41.3% of the original memory usage before this commit. Here's the benchmark I done by playing each map for 5 minutes:

Map 12+Neutrals BEFORE 24+Neutrals BEFORE 12+Neutrals AFTER 24+Neutrals AFTER
(4)Lost Temple 2200.0-2300.0 MB 2400.0-2500.0 MB 800.0-950.0 MB 800.0-950.0 MB
(12)Divide and Conquer 2400.0-2500.0 MB N/A (OutOfMemoryError) 1000.0-1100.0 MB 1000.0-1100.0 MB

Numbers provided are the estimated minimum and maximum memory usage gathered from Task Manager during my gameplays.

Retera commented 1 year ago

Maybe this is where it would be nice if we had unit tests to catch possible contrived edge case bugs, but I would be concerned that this change might introduce a bug (that might not be immediately evident) in large games with multiple players. The pathfinding nodes are stateful -- that is, to finish a particular pathfinding job where Unit A asks to find how to reach Point B, over the course of determining this information the pathfinding processor stores data such as "cameFromDirection" in each node it visits, so that at the end of its search it can backtrack back to the Unit A from Point B and construct the pathway necessary to reach the Point B that was a result of its A* search evaluation.

Although you may find in the git history that there was an earlier version of this solution that computed the search all in one step, regardless of how large the map was, as time went on during the development of this code I came to understand that even on a fairly modern desktop PC the time to process the search all in one step with the current algorithm was too long. This led to CPathfindingProcessor.update(...) which is callback based and might only complete 50% of a search at a time, and leave the grid in an incomplete state waiting for it to get back to work next time the system calls CPathfindingProcessor.update(...) again.

By sharing the search graph between all players, although in the git history you may be able to find a primordial version of Warsmash where it was also that way and had only one CPathfindingProcessor class instance for the entire CSimulation, unfortunately with the current algorithm where CPathfindingProcessor.update(...) may only do partial work I believe that sharing the work state between processors may result in their results clobbering each other in more complex cases with lots of players and lots of units all moving at the same time who reach the throttle frequently and are only doing partial searches per tick of game logic.

It would be really smart if I could create a unit test of what I am saying, or something like that, but in the meantime if you wanted to test this you could try a big multiplayer game and make sure to have the units of many players moving simultaneously. The bug would be identifiable by a unit seemingly running at walls or going in incorrect directions, or perhaps in the worst case hitting the CPathfindingProcessor exception around line 369, new IllegalStateException("PATHING SYSTEM ERROR: The path finding algorithm hit an infinite cycle at or near pt: " which in its current form would crash the game and lose player progress.

bearU369 commented 1 year ago

By sharing the search graph between all players, although in the git history you may be able to find a primordial version of Warsmash where it was also that way and had only one CPathfindingProcessor class instance for the entire CSimulation, unfortunately with the current algorithm where CPathfindingProcessor.update(...) may only do partial work I believe that sharing the work state between processors may result in their results clobbering each other in more complex cases with lots of players and lots of units all moving at the same time who reach the throttle frequently and are only doing partial searches per tick of game logic.

I think the solution to this is by duplicating the searched Nodes temporarily for where CPathfindingProcessor would assign the changes and traverse the nodes without affecting the whole search graph shared to other players. After finishing the task, it would flush the temporary nodes out. A hacky one but possible.

if you wanted to test this you could try a big multiplayer game and make sure to have the units of many players moving simultaneously. The bug would be identifiable by a unit seemingly running at walls or going in incorrect directions, or perhaps in the worst case hitting the CPathfindingProcessor exception around line 369, new IllegalStateException("PATHING SYSTEM ERROR: The path finding algorithm hit an infinite cycle at or near pt: " which in its current form would crash the game and lose player progress.

Difficult for me to debug but I'll try, I hope Unit Order APIs work in Warsmash.

Retera commented 1 year ago

This might be difficult to prove, but if I understand correctly the notion of using a hash map to map per-player data for a given node, in the worst case if all players were pathfinding for all their units simultaneously and the units were searching all of the entire map, wouldn't the worst case memory consumption be basically the same or similar to what we are seeing prior to your suggested changes?

So, if I have a N×M map file being played by P players, then prior to your change Warsmash was basically allocating O(N×M×P) memory, I would imagine. If I understand correctly, would your new solution cause the average working memory of the gameplay to be lower than this "probably" but the worst case memory allocation would fall back to still being O(N×M×P) in some cases?

My thought process and why I have delayed fixing this kind of problem in the pathfinding is that industry standard game engines today probably do not use an N×M search graph for pathfinding, but instead process the world into a simpler graph before searching. I got the impression that a lot of academia research has been done into solving this kind of problem, and I did not extensively research the academia behind it before implementing my solution. Instead, I literally copied the A* search off of wikipedia and then just ran the game engine and looked at how it felt and kept tinkering with it for days until it "felt" more like how I wanted. So, as a result, I have code that looks "kind of like Warcraft III" in my experience, but that is inefficient algorithmically compared to what it is actually supposed to be for the ideal case.

This probably means that the ideal solution to this part of my code is to blow it all away and replace it with something better. But I have been putting that off, because I was left feeling like it would be relatively large effort for relatively little gain from the standpoint of one who desires to play this game engine system presently.

If you want to look up concepts like "Navmesh" and the public domain of knowledge regarding pathfinding, you might find stuff that is dramatically better than Warsmash's current algorithm at memory consumption and performance both. Assuming that's true, the challenge then would be trying to fit the better solution to the edge cases that we expect in the custom maps, such as how some spaces are too small for a Paladin to fit through but large enough for a Footman to fit through, and we want their pathfinding search to understand this difference.

But these are just some ideas. I did not extensively read your updated pull request code contents yet, other than to determine that it was mapping the existing nodes to what looked to be per-player nodes, which again sounds rather like a bandaid solution to a problem that might be better solved by discarding the code entirely and replacing it all with some other industry standard. Or maybe it could be replaced by the use of some existing library. Warsmash is already building against LibGDX to handle project setup, and in theory LibGDX has some physics engine that they promote and say is easy to include in a LibGDX game system like Warsmash. And I have not really used their suggested physics engine; it might be really easy or support really easy-to-use and efficient pathfinding. It may be worth the time to investigate, and I have not yet personally put in that time.

bearU369 commented 1 year ago

This might be difficult to prove, but if I understand correctly the notion of using a hash map to map per-player data for a given node, in the worst case if all players were pathfinding for all their units simultaneously and the units were searching all of the entire map, wouldn't the worst case memory consumption be basically the same or similar to what we are seeing prior to your suggested changes?

So, if I have a N×M map file being played by P players, then prior to your change Warsmash was basically allocating O(N×M×P) memory, I would imagine. If I understand correctly, would your new solution cause the average working memory of the gameplay to be lower than this "probably" but the worst case memory allocation would fall back to still being O(N×M×P) in some cases?

Yeah something like that. The issue with each processor having their own node arrays is a lot of unused memory declared considering most players are unused when playing in singleplayer or with few players. Mapping only segments of the map is probably a good way to maintain memory footprint without mapping the entire map since those nodes wouldn't probably be reached by A* depending on the given path.

It's possible that it will reach O(N×M×P) in worse case scenario, but that memory allocation would probably be flushed out after all players are done with pathfindings, compared to being stuck with O(N×M×P)

My thought process and why I have delayed fixing this kind of problem in the pathfinding is that industry standard game engines today probably do not use an N×M search graph for pathfinding, but instead process the world into a simpler graph before searching. I got the impression that a lot of academia research has been done into solving this kind of problem, and I did not extensively research the academia behind it before implementing my solution. Instead, I literally copied the A* search off of wikipedia and then just ran the game engine and looked at how it felt and kept tinkering with it for days until it "felt" more like how I wanted. So, as a result, I have code that looks "kind of like Warcraft III" in my experience, but that is inefficient algorithmically compared to what it is actually supposed to be for the ideal case.

This probably means that the ideal solution to this part of my code is to blow it all away and replace it with something better. But I have been putting that off, because I was left feeling like it would be relatively large effort for relatively little gain from the standpoint of one who desires to play this game engine system presently.

I'd been skimming around the web with pathfinding stuff previously like HPA and quadtrees. Though I have no idea how to implement either of them nor implement them in a dynamic environment.. I heard that HPA is something that people in OpenRA chose with for their pathfinding system.

Maybe this could be your interest for optimal implementations: https://www.redblobgames.com/pathfinding/grids/algorithms.html

linsmod commented 11 months ago

the CPathfindingProcessor is just fine, only the problem is, it always ran to the OOM on my android phone. after some research, a solution is come out. making a lazy initialization to cornerNodes and nodes in CPathfindingProcessor . because the processers array size is player count but the map players not always full fill the MAX_PLAYER. only initializing the cornerNodes and nodes when a PathfindingJob is comes out from moveQueue is enough. this will solve the OOM problem if only few players there.

its not a complete solution but it makes you far away from the OOM on small map.