MestreLion / mcworldlib

Yet another library to manipulate Minecraft data, inspired by pymclevel
GNU General Public License v3.0
22 stars 3 forks source link

Memory grows out of hand #15

Open NielsPichon opened 1 month ago

NielsPichon commented 1 month ago

Issue

As of now, the region files are lazy loaded, but they never actually get offloaded, which may lead to Out Of Memory exceptions on smaller machines or for larger maps.

Not sure whether this affects many people but I thought it'd be a nice to have.

Expected behaviour

Regions are offloaded after they are queried, or maybe after N regions are loaded, the N-th + 1 pushed the oldest one out of memory.

implementation ideas

Instead of using a dict for storing dimensions' Regions, there could be a class which inherits from Dict but which __getitem__ is overridden and returns an instance of a realized RegionFile without storing it, or something like that, essentially always keeping the LazyLoadFileMap version of the region in the dict rather than the RegionFile itself.

To reproduce

import psutil
import mcworldlib as mc

world = mc.load(level_directory_path)
regions = world.regions[mc.OVERWORLD]

process = psutil.Process()

print(f'start memory: {process.memory_info().rss / 1024 / 1024}MB') 
for region_coords in regions:
     region = regions[region_coords]
     print(process.memory_info().rss / 1024 / 1024, 'MB')

You should see the memory usage grow. like so:

start memory: 70.93359375MB
89.87109375 MB
111.43359375 MB
172.55859375 MB
233.49609375 MB
294.43359375 MB
355.55859375 MB
375.24609375 MB
436.37109375 MB
497.30859375 MB
519.05859375 MB
527.30859375 MB
588.43359375 MB
649.37109375 MB
669.05859375 MB
...
MestreLion commented 1 month ago

This issue has been discussed in #14 , and the proposal was a compromise between the current behavior of keeping everything forever and your suggestion of never storing the parsed RegionFile object: allow the client to choose what and when to keep regions by adding a cache boolean argument when loading and an .uncache() method to (manually) unload regions.

To make this approach work with your example, you could add regions.uncache(region_coords) (or region.uncache() if it's feasible to implement that) in the end of the loop.

Additionally we could implement a cache attribute to Regions, if set to False prior to the loop the direct dictionary access in it would not store the loaded region object in the dictionary, as you requested.

In any case, I suggest we move the discussion of how to better implement this is in #14, as I've already outline a few approaches and challenges

NielsPichon commented 1 month ago

Absolutely. This would work great as a solution. Thanks 👍