Skretzo / shortest-path

Pathfinding for Old School RuneScape
BSD 2-Clause "Simplified" License
14 stars 27 forks source link

Remove LoadingCache and replace with array #42

Closed jocopa3 closed 1 year ago

jocopa3 commented 1 year ago

Part 2 of performance changes. Last performance PR for now; this brings performance to near real-time levels and any further optimizations I've found either degrade readability or may alter path finding.

I noticed the previous LoadingCache's max size was more than the uncompressed collision map data. It appears the max size was set so high due to empty FlagMaps with empty full-size bitsets which kept triggering the weigh constraint to load/unload old regions in the cache. This operation is expensive so I decided to look into if it was even necessary at all.

Changes made:

The total cost of the map now becomes uncompressed collision data + FlagMap member variables + 8 bytes per reference which comes out to... 2,903,256 bytes (2.8MB). The previous maximum cache size in bytes was 20,971,520 (20MB) + all of the compressed collision map data. This change actually ends up storing less long-term data overall.

After this, the only other major optimization that I can find is packing all WorldPoint's into ints in the hotpath but that comes at the expense of readability. I've already demo'd that change and found it reduced total allocations down to ~282MB with avoid wilderness off.

Hopefully after these performance changes, adding more transports/teleports or other features should be more feasible 😄

Results

Worst-Case: GE to top-left corner of the map in the ocean Short-Case: GE to a point west 100 tiles away

Metric Before (f596d1d) After (bef718d) After (Avoid Wilderness) Difference
Worst-Case Time 1213.4845ms 174.7856ms 151.2286ms 1038.6989ms -
1062.2559ms
Short-Case Time 28.5309ms 2.5727ms 2.1449ms 25.9582ms -
26.386ms
Allocations 1,286,051,528 bytes 503,251,296 bytes 728,338,632 bytes 782,800,232 bytes -
557,712,896 bytes

Before Optimization

Note: Didn't measure with avoid wilderness on as the difference is not noticeable over noise. Difference in allocation size are the same with/without this PR so see the After results below.

Worst-Case Time: 1213.4845ms Short-Case Time: 28.5309ms

CPU Samples:

image

Memory Allocations:

image

After Optimization

Worst-Case Time: 174.7856ms Short-Case Time: 2.5727ms

Additionally, now that path finding is so quick, there is a clear difference between having avoid wilderness on/off: Worst-Case Time (Avoid Wilderness): 151.2286ms Short-Case Time (Avoid Wilderness): 2.1449ms

Note: for CPU Samples, with/without avoid wilderness didn't make a difference in the call tree as it doesn't make enough samples CPU Samples:

image

Memory Allocations:

image

Memory Allocations (Avoid Wilderness):

image
Skretzo commented 1 year ago

I have tested this and it works perfectly. I am also seeing comparable performance improvements.

Before I merge this I want to ask for your opinion on the following suggestion I have to maybe improve the flag maps:

Currently, the flags in each flag map always have a length of width * height * PLANE_COUNT * flagCount = 64 * 64 * 4 * 2. Most of the upper planes are blocked/unreachable, so maybe turning the PLANE_COUNT = 4 into a variable can be worthwhile? Of the current total of 1416 regions it is possible to set planeCount = 1 for 567 regions, planeCount = 2 for 931 regions and planeCount = 3 for 1186 regions. This can probably be determined in the SplitFlagMap constructor. It will require both flags to be blocked, and some custom logic in the index and get methods. The image below shows Lumbridge (region 50, 50) which is one of the regions where all 4 planes need to be included (yellow is unblocked, purple is blocked). For Lumbridge it would be theoretically possible to discard z = 4 because that is an unblocked roof which is unreachable, but that would require a more careful approach involving crowdsourced player movement data from the OSRS Wiki.

image

jocopa3 commented 1 year ago

Removing empty planes might take the size down a bit more, but uncompressed it's already about the size of a high-res PNG file and compressed is even smaller. I personally wouldn't consider it unless the size was more of an issue (i.e. >10MB).

That said, an easy way of doing it that doesn't require much additional work on either the read or write side is to encode the number of planes in a flagmap using the size of the map; i.e. each plane adds the same amount of data to the map so dividing by the width/height/flag count would give the plane count. Edit: Noticed that the region maps already store some info in the header, so you could add the plane count there way too rather than use the size trick.

Writing could use the same optimization I had for visited tiles; each row is a long and if all rows in a plane are 0 (e.x. allRows |= row; allRows == 0) then the plane is empty and can be skipped. One thing to watch out for are any cases where plane 0 and 2 aren't empty but plane 1 is empty; in that situation I'd just write the empty plane to keep the flagmap code simple.

Edge-cases like Lumbridge probably aren't worth optimizing as the potential of having wrong/out-of-date info about unused planes is worse for the user-experience than the 1,024 bytes it saves.

jocopa3 commented 1 year ago

One thing I will suggest is you don't need to compress the collision maps if they're going to be zipped up anyway; write them uncompressed then zip the uncompressed files. Makes the file size a bit smaller and more importantly saves the cost of uncompressing each file individually at startup.

image
Skretzo commented 1 year ago

write them uncompressed then zip the uncompressed files. Makes the file size a bit smaller

Interesting. So both zipping and using GZip is bad. I just continued doing what the original author Runemoro was doing. Would you recommend run-length encoding before zipping? I know that has been used successfully in video encoding. I guess we could also remove the prefixed 16 bytes containing minX, minY, maxX and maxY as they can be easily calculated based on the file names (e.g. 50_50) and hardcoding width and height.

jocopa3 commented 1 year ago

I wouldn't recommend any compression before zipping, the zip compression will be more clever than most pre-optimizations you could make. That's why the zipped uncompressed files ends up being smaller than the zipped compressed files; compression works better on uncompressed data. It also starts getting into micro-optimization territory when there are much bigger things that could be tackled instead (like the WorldPoint packing).

The prefixed 16bytes are also fine, I didn't notice them before and assumed the size was already calculated based on file name. I'd keep them since it lets you add additional info (like plane count) to the file pretty easily.

Skretzo commented 1 year ago

I agree that this is micro-optimizations, but I know that the RuneLite devs want to keep hub plugins as small as possible to reduce the bandwidth costs (they have a hard 10 MB hub plugin size limit).

I will go ahead and merge this PR, so thank you very much for your contributions to improve the plugin and its performance! Feel free to also make a PR for the WorldPoint rewrite, as I don't mind the reduced readability too much. I will probably do a release very soon either way. It usually takes about 1 week to be reviewed before it is live in the client.