Closed: danielrode closed this issue 1 month ago
> which is what GDAL is using here to store raster values

Wrench is using an int16_t to store the counts. This could probably be bumped to an int32_t without impact, especially if the output TIFF is being compressed with a common compression encoding.
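For what it's worth, when the writer is driven through plain PDAL rather than wrench, the count band's storage type can already be widened with the data_type option of writers.gdal. A minimal sketch, assuming a single input file and 20-meter cells (filenames are placeholders):

# Sketch: request a 32-bit count band so a 20 m cell can hold far more
# than 32,767 points (input/output filenames are placeholders).
pdal pipeline --stdin <<EOF
[
    "input.copc.laz",
    {
        "type": "writers.gdal",
        "resolution": 20,
        "output_type": ["count"],
        "data_type": "uint32",
        "gdaldriver": "GTiff",
        "filename": "density_20m.tif"
    }
]
EOF

A uint32 band can hold counts up to about 4.29 billion, so even a 20-meter cell at 200 points/meter^2 (80,000 points) fits comfortably.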
Wrench crashed again, even with the 1-meter pixels setting. It ran for over 8 hours before crashing with this message:
/usr/include/c++/14/bits/stl_vector.h:1127: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = unsigned char; _Alloc = std::allocator<unsigned char>; reference = unsigned char&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.
./gen-density-raster.sh: line 31: 71023 Aborted (core dumped) /usr/lib64/qgis/pdal_wrench density "${args[@]}"
I am not sure if this is also an int16 vs int32 problem. I know it is not a RAM issue because my system still had over 25GB of RAM available (out of 128GB) at the time Wrench crashed.
It's very hard to know what might be going on without the full command and perhaps the input file.
Also, an assertion suggests you're running in debug mode. Can you provide information about the version of PDAL/wrench that you're using and how it was built, if you know?
Also, things can get pretty ugly if you don't provide bounds for the output. I'm not sure how wrench decides this. Without this, memory often has to be copied over and over again. This is slow. Having some idea about the scale of your input and output is very helpful to analyzing what might be going on.
I tried running pdal_wrench --version and pdal_wrench --help, but neither would show version information.
I am using the version of Wrench that comes bundled with QGIS 3.34.
Here's the command I ran last (in Bash) that produced the error mentioned in my last post:
args=(
--input=collection.vpc
--resolution=1
--output=./export/tiles.tif
--tile-size=100
--tile-origin-x=0
--tile-origin-y=0
)
/usr/lib64/qgis/pdal_wrench density "${args[@]}"
collection.vpc points to a LiDAR collection of about 9,000 COPC files; the collection is almost 2TB. I do not have time to sift through the collection at the moment, but if I find out which specific tile is causing trouble, I will post that here later.
For now, I am just going to work around this by running a bunch of vanilla PDAL instances in parallel, each on one .copc.laz file.
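Roughly, the per-tile approach looks like the sketch below. The directory layout, parallelism, and the density-pipeline.json template are just examples; the template is assumed to contain a readers.copc stage and a writers.gdal stage whose filenames get overridden per tile.

# Run one rasterization pipeline per COPC tile, 8 at a time.
find import -name '*.copc.laz' -print0 |
    xargs -0 -P 8 -I{} pdal pipeline density-pipeline.json \
        --readers.copc.filename={} \
        --writers.gdal.filename={}.density.tif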
As for scale, this was the output Wrench gave when it started its run:
grid 114360x168446 tiles 1145 1686
The collection's projection uses meters. I just tried to run pdal info to get the collection bounds, but got another crash:
$ pdal --version
pdal 2.6.3 (git-version: Release)
$ pdal info --metadata collection.vpc
/usr/include/c++/14/bits/basic_string.h:1269: std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::reference std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::operator[](size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>; reference = char&; size_type = long unsigned int]: Assertion '__pos <= size()' failed.
Just eyeballing it in QGIS, the collection appears to span an area of 120 km by 170 km. The collection does not fully fill in this area though (there was no LiDAR collected for about 2/3 of the area).
114360 * 168446 = 19,263,484,560. That's about 19 billion cells. At 4 bytes per cell, you're up to almost 80GB for a single buffer. Also, if PDAL doesn't know the size of the output buffer at start time, it has to move data around as the raster grows, which means allocating both a source and a destination, so you could need something like 160GB to do this. It seems likely you're running out of memory. It's a very large problem and you should probably break it into pieces.
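Back-of-the-envelope, in the shell:

# Rough memory estimate for the full-extent grid, using the sizes from the log above
echo $(( 114360 * 168446 ))                      # 19263484560 cells
echo $(( 114360 * 168446 * 4 / 1000000000 ))     # ~77 GB for one 4-byte-per-cell buffer
echo $(( 114360 * 168446 * 4 * 2 / 1000000000 )) # ~154 GB if source and destination coexist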
Like I mentioned above, I had 25GB of RAM still available when Wrench crashed (I run a RAM usage monitor that polls every 2 seconds). I suppose it is possible that Wrench memory usage spiked and then crashed within that 2 second window, but I do not know enough about the internal workings of the code to know how likely that is.
Is it possible to specify the size of the output buffer at start time? If so, how?
Either way, I agree, breaking up the processing into batches is probably going to be my best path forward.
If you need a buffer of 50GB, a single allocation may fail if you only have 25GB.
I don't know how you specify the size using wrench.
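With plain PDAL (rather than wrench), writers.gdal does let you pin the output grid up front via its bounds option (or origin_x/origin_y together with width/height), so the raster can be allocated once instead of being grown and copied as data arrives. A sketch with placeholder coordinates and filenames:

# Sketch: fix the raster extent ahead of time with writers.gdal's bounds option
# (the bounds, input, and output names below are placeholders).
pdal pipeline --stdin <<EOF
[
    "import/some_tile.copc.laz",
    {
        "type": "writers.gdal",
        "resolution": 1,
        "output_type": ["count"],
        "bounds": "([400000, 401000], [5000000, 5001000])",
        "gdaldriver": "GTiff",
        "filename": "density.tif"
    }
]
EOF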
After running vanilla PDAL individually on each tile, I found the problem tile that was causing the most recent error I encountered. This particular error is not a memory issue: I reproduced it by running a single PDAL instance on just that one tile, with no other memory-intensive programs running at the time, and the COPC LAZ tile in question is only 250 MB.
Command:
pdal pipeline --stream --stdin <<EOF
[
"import/tile______.copc.laz",
{
"type": "writers.gdal",
"resolution": 1,
"binmode": true,
"dimension": "X",
"data_type":"uint16_t",
"output_type": ["count"],
"gdaldriver": "GTiff",
"filename": "tmp.tif"
}
]
EOF
Error:
/usr/include/c++/14/bits/stl_vector.h:1127: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = unsigned char; _Alloc = std::allocator<unsigned char>; reference = unsigned char&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.
Aborted (core dumped)
All other tiles in the collection appear to have rasterized successfully. I opened this particular tile in QGIS to inspect it and noticed a 25-meter by 180-meter void in the center of the tile where the point density is very low (only a few points scattered about). This appears to be a water body that scattered most of the laser pulses, so virtually no data was recorded there.
I suspect that this particular crash is happening because GDAL does not like being passed pixels in the center of a tile that contain no points. What are some workarounds for this? And if this is the case, why doesn't GDAL complain when processing edge tiles? It would be nice if PDAL could anticipate this kind of lidar input and provide an informative error message such as "error: No points found for pixel X,Y (cannot process tile)", or simply tell GDAL to mark those pixels as "No Data" as it does for edge tiles.
I don't think there is any way to know what's going on without the data. If you would like to provide the data I will see what I can learn.
I did some more investigation and found that the original LAZ file from our vendor for that tile worked fine. I just converted that original to COPC again (with PDAL) and that version worked too. I am guessing that the earlier PDAL conversion I did to COPC a while back must have corrupted that one tile, but in such a way that lidR did not mind, so I did not catch the corruption until now. I am going to close this issue and open a new one just for the original int16 vs int32 issue I ran into.
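For anyone who hits the same thing, re-creating the COPC version of a suspect tile from the vendor's original LAZ is a one-liner along these lines (paths are placeholders; pdal translate picks writers.copc from the .copc.laz extension):

# Rebuild the COPC copy of the problem tile from the original LAZ
pdal translate original/tile______.laz import/tile______.copc.laz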
I am using wrench to create a density raster for a very high-density LiDAR collection using pdal_wrench density. The collection I am working with reaches densities of 200 points/meter^2 in some places. (I know this because I used lidR to create a density raster previously, but we are in the process of converting parts of our workflow to use PDAL.) I wanted to create a 20-meter raster; however, I kept getting an error.

After some troubleshooting and looking at the point clouds in QGIS, I finally realized that GDAL/PDAL was crashing once it reached the higher-density areas of the collection. Assuming the worst case of 200 points per square meter and multiplying by a 20x20-meter pixel gives 80,000 points in a single cell. This is well above 32,767 (the maximum value of a short integer, which is what GDAL is using here to store raster values).
For now, I worked around this by creating a higher-resolution raster (using 1-meter pixels instead of 20-meter) and will down-sample later.
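The later down-sampling can be a plain GDAL call. A sketch assuming GDAL 3.1 or newer (which added the "sum" resampling) and placeholder filenames; it sums the 1-meter counts into 20-meter cells and widens the pixel type so the sums fit:

# Aggregate the 1 m count raster into 20 m cells by summing counts
gdalwarp -tr 20 20 -r sum -ot UInt32 density_1m.tif density_20m.tif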
It would be nice if PDAL Wrench could handle creating coarse-resolution density rasters from these kinds of high-resolution LiDAR collections (perhaps by providing an option to use a higher bit-depth for the output raster), or at least provide a clearer error message when a value is too large to be stored.