bcgov / designatedlands

Python script to combine conservation related spatial data from many sources to create a single 'Designated Lands' layer for British Columbia
Apache License 2.0
9 stars · 4 forks

Note memory requirements in README #93

Closed · chrispmad closed this issue 1 year ago

chrispmad commented 1 year ago

I followed the instructions in the README and the vector geopackage was produced successfully. The raster output, however, consists of 43 TIFs with names like 'dl_0', 'dl_1', etc., rather than the 4 raster files described in the raster section of the README. Is there some additional step I need to run with Docker and Python to combine these 43 TIFs into the 4 intended raster output files?

smnorris commented 1 year ago

Was there an error in the script run?

The python designatedlands.py process-raster command creates the 43 individual designation rasters, then does the overlay to produce the four outputs: https://github.com/bcgov/designatedlands/blob/master/designatedlands.py#L1178

If it completed without issue, the four output rasters should be present.

Did the script log these messages, noting that the overlays were being run?

Overlaying rasters
- initializing output arrays
chrispmad commented 1 year ago

Thanks for your assistance, Simon.

I've just rerun 'python designatedlands.py process-raster'. After producing the 43 individual rasters, I get the following output:

Overlaying rasters
- initializing output arrays
- loading process_order n42

Then I see many lines of traceback that look like 'File "C:..., line 829, in call', and finally this message:

numpy.core._exceptions._ArrayMemoryError: Unable to allocate 20.4 GiB for an array with shape (1, 136820, 159740) and data type uint8
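For what it's worth, the reported allocation is consistent with the array shape in the error: one uint8 band over that extent is one byte per cell. A quick back-of-envelope check (plain arithmetic, not code from this repo):

```python
# One uint8 array with shape (1, 136820, 159740),
# as reported in the numpy _ArrayMemoryError above.
rows, cols = 136820, 159740
size_bytes = rows * cols        # one byte per uint8 cell
size_gib = size_bytes / 2**30   # bytes -> GiB

print(f"{size_gib:.1f} GiB")    # matches the ~20.4 GiB in the error
```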

Is there some way to increase memory allocation for this step? Did I perhaps miss a step that increases Docker's memory allocation limit?

Thank you!

smnorris commented 1 year ago

Ok, good. There is no smart chunking of the rasters; all raster processing is done by brute-force loading the entire raster into memory. With the default 10m resolution, I think processing is successful with 32G of memory, and it definitely works with 64G.
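Because memory use scales with cell count, coarsening the resolution shrinks each array quadratically. A rough estimate (the 20.4 GiB figure is from the error message above; this is just arithmetic, not project code):

```python
# Back-of-envelope: per-array memory scales with (base_res / res)^2,
# since the number of cells grows with the square of 1/cell size.
gib_at_10m = 20.4  # from the _ArrayMemoryError above

def gib_at(resolution_m, base_gib=gib_at_10m, base_res_m=10):
    """Estimate per-array memory (GiB) at a different resolution."""
    return base_gib * (base_res_m / resolution_m) ** 2

print(f"25 m: ~{gib_at(25):.1f} GiB per array")  # ~3.3 GiB
```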

If your machine has >= 32G, I think updating the Docker memory allocation can be done in the desktop GUI.

If your machine has <32G, you can avoid the issue entirely by using a coarser raster resolution: specify a config file when running the command, and set resolution=25 or similar in the config file.
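For reference, a minimal config might look like the fragment below. The section name is an assumption based on the project name; only the resolution key is taken from the comment above, so check the sample config in the repo for the actual layout and key names.

```ini
# hypothetical config file, e.g. my_config.cfg
# section and file names are assumptions; resolution key from the comment above
[designatedlands]
resolution = 25
```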

chrispmad commented 1 year ago

This machine has ~32G of memory, so I will try a 25m resolution as you suggest. I really appreciate your quick responses, thank you!

smnorris commented 1 year ago

👍 I think if you are able to bump up the Docker memory allocation it should work at 32G, but I haven't tried myself. It would be worth at least noting this step in the README next time this project gets some attention.