Closed DanielFEvans closed 4 years ago
If you can't share the data you must do the hard work yourself.
I would start by splitting the shapefile into two halves and rasterizing them. 1) Both works. Odd, but perhaps adding features from half 2 into half 1 makes the failure to happen. 2) One works, another fails. Keep on splitting, perhaps you will find some odd geometry that triggers the error. 3) Both fails. Odd, but keep on splitting. If even shapefile with just the last feature gives failure then perhaps it is not too confidential to share.
I wonder if degrees could have turned inte meters for some reason. Resolution of 0,000045 meters would make quite a big image. Or perhaps rings in some polygon are somehow wrong and the geometry seems to wrap around the world.
@jratike80 - I'll eventually get around to giving those suggestions a go. I see Even did some work to avoid integer overflows, but I suspect that might just mask another underlying problem (or, alternatively, mean my workaround no longer works, as I was fixing it by causing an overflow!).
@DanielFEvans I strongly suspect you hit the integer overflow. Reducing GDAL_CACHEMAX under 2 GB would also workaround the bug I fixed
Reducing GDAL_CACHEMAX under 2 GB would also workaround the bug I fixed
Well there you go, I shouldn't have been so dismissive - that does indeed do it, so it looks like that actually was the cause.
In fact, I can now see that we've got exactly that workaround in another piece of rasterizing-related code, with a comment references a bug with exactly the same symptoms: https://github.com/OSGeo/gdal/issues/1338
Expected behavior and actual behavior.
Expected: GDAL rasterizes a vector dataset correctly, producing an output with expected size 68547x60306 with dtype byte.
Actual: GDAL fails with
ERROR 2: gdalrasterize.cpp, 957: cannot allocate 18446744072761901056 bytes
(i.e. it attempts to allocate several thousand petabytes).Steps to reproduce the problem.
I am attempting to rasterize a shapefile at 5m resolution over a portion of the UK. This amount of data should easily be within the capabilities of GDAL. However, using the following snippet of Python code, the memory error above.
One very strange finding is that if the above is called with
outputType=gdal.GDT_Int16
, the rasterization works entirely normally, with the correct output being obtained.It is only one specific shapefile, with these specific inputs, for which I've observed the issue.
While I'm not able to share the data (the joys of proprietary data), it seems this is very specific to the combination of data and parameters, and I'm happy to diagnose further if someone has ideas.
Operating system
Scientific Linux 64 bit
GDAL version and provenance
Both 2.3.0 and 3.0.4, compiled locally. The line number in the error quoted above is from v3.0.4, line 864 in the same file is quoted for v2.3.0.