OSGeo / gdal

GDAL is an open source MIT licensed translator library for raster and vector geospatial data formats.
https://gdal.org

gdal_contour slow performance with GPKG output #10729

Closed jfbourdon closed 1 week ago

jfbourdon commented 1 week ago

What is the bug?

Using GeoPackage as the output format when producing contour lines with gdal_contour results in far longer processing times (over 10 times longer) than using ESRI Shapefile as the output format.

Steps to reproduce the issue

My tests were made using this large DEM

  1. Run gdal_contour -i 10 -a ELEV MNT_21M16NO.tif contour.shp and check time (about 20 seconds for me)
  2. Run gdal_contour -i 10 -a ELEV MNT_21M16NO.tif contour.gpkg and check time (about 410 seconds for me)

Both results are identical and each process consumes the same amount of RAM. However, I see far less CPU usage when the output is GPKG than with SHP.
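For anyone who wants to repeat the comparison in a scripted way, here is a minimal timing sketch. The helper names `time_command` and `compare_outputs` are made up for illustration; the `gdal_contour` arguments are the ones from the steps above, and the sketch assumes `gdal_contour` is on the PATH and that the output files do not exist yet.

```python
import subprocess
import sys
import time


def time_command(args):
    """Run a command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(args, check=True)  # raises CalledProcessError on failure
    return time.perf_counter() - start


def compare_outputs(dem_path):
    """Hypothetical helper: time the same contour run for SHP and GPKG output."""
    for out in ("contour.shp", "contour.gpkg"):
        seconds = time_command(
            ["gdal_contour", "-i", "10", "-a", "ELEV", dem_path, out]
        )
        print(f"{out}: {seconds:.1f} s")
```

Calling `compare_outputs("MNT_21M16NO.tif")` should print one wall-clock time per output format.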

Versions and provenance

GDAL 3.9.2 (from OSGeo4W for installing QGIS 3.38.2) Windows 10 Enterprise 22H2

Additional context

No response

jratike80 commented 1 week ago

Could it be that gdal_contour does not use big transactions? See https://gdal.org/en/latest/drivers/vector/sqlite.html#target-drivers-vector-sqlite-performance-hints.

rouault commented 1 week ago

With #10731 enhancement:

$ time gdal_contour -i 10 -a ELEV MNT_21M16NO.tif contour.gpkg
0...10...20...30...40...50...60...70...80...90...100 - done.

real    0m25,155s
user    0m24,463s
sys     0m0,644s

rouault commented 1 week ago

> With #10731 enhancement

Hm, slightly embarrassing: I actually get pretty much the same time with just master (on Linux). I'm not sure why you get so much slower performance on Windows.

theroggy commented 1 week ago

> Hm, slightly embarrassing: I actually get pretty much the same time with just master (on Linux). I'm not sure why you get so much slower performance on Windows.

Not directly related, but for most projects that I am familiar with (e.g. geopandas, geofileops, pyogrio, shapely), the CI tests take 2 to 5 times longer on Windows than on Linux, even though the hardware should be similar.

So, Windows being slower for this type of software seems to be a given :-(...

jratike80 commented 1 week ago

GDAL 3.10.0dev takes 34 seconds on my Windows machine when I send the output to an in-memory database (:memory:) and 214 seconds when it writes the output to a file on a fast SSD. Maybe Windows is slow at dealing with the file system?

theroggy commented 1 week ago

I/O and the caching related to it work quite differently on Windows vs. Linux, and even more so when networked storage is involved.

Specifically for writing, if I remember correctly (from a long time ago), Linux buffers I/O more aggressively and typically doesn't wait until everything is flushed/synced to disk, while Windows does. This could make a significant difference, although I doubt the difference you mention can be attributed to that alone (if that is even the case).

agiudiceandrea commented 1 week ago

On Windows 10, using GDAL 3.9.2 (either from OSGeo4W or conda-forge), gdal_contour takes about 4 or 5 times longer when writing the contours to a GPKG layer than to an ESRI Shapefile layer on a local HDD. If OGR_SQLITE_JOURNAL is set to one of TRUNCATE | PERSIST | MEMORY | WAL | OFF, then gdal_contour takes about the same time when writing the contours to a GPKG layer as to an ESRI Shapefile layer.
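A minimal sketch of how the option could be tried without changing the global environment, relying on the fact that GDAL configuration options can also be supplied as environment variables. The helper name `run_with_journal_mode` is hypothetical; only `OGR_SQLITE_JOURNAL` and its values come from the comment above.

```python
import os
import subprocess
import sys


def run_with_journal_mode(args, journal_mode):
    """Run a command with OGR_SQLITE_JOURNAL set for the child process only."""
    env = os.environ.copy()
    env["OGR_SQLITE_JOURNAL"] = journal_mode  # e.g. "WAL", "MEMORY", "OFF"
    return subprocess.run(
        args, env=env, check=True, capture_output=True, text=True
    )
```

For example, `run_with_journal_mode(["gdal_contour", "-i", "10", "-a", "ELEV", "MNT_21M16NO.tif", "contour.gpkg"], "WAL")` would run the GPKG case from the report with the WAL journal mode.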

jratike80 commented 1 week ago

OGR_SQLITE_JOURNAL=OFF brought the 214 seconds down to 26 seconds.

theroggy commented 1 week ago

Yes, an I/O write buffering difference is a possible explanation for this behaviour.

For each transaction that ends/is committed, SQLite will flush and sync all data to disk. As the default was to use many small transactions, every row or every few rows saved to GPKG leads to a flush and sync to disk. As far as I know, Linux will aggressively buffer these syncs in RAM, so it won't wait for the actual save to disk, while Windows will really wait... which adds up if there are many (small) commits.

With large transactions of e.g. 100,000 rows (the change rouault made), the flush+sync will only be done a few times, so the different treatment on Linux vs. Windows becomes irrelevant. OGR_SQLITE_JOURNAL=OFF more or less disables transactions completely, so it leads to a similar effect.
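The effect of commit frequency can be illustrated outside GDAL with Python's standard `sqlite3` module. This is only a sketch of the general mechanism, not the code path gdal_contour uses: the table layout and the `insert_rows` helper are invented for the example, and absolute timings will depend on the OS and disk.

```python
import os
import sqlite3
import tempfile
import time


def insert_rows(db_path, n_rows, batch_size):
    """Insert n_rows rows, committing after every batch_size rows.

    Returns (number_of_commits, elapsed_seconds).
    """
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE contour (id INTEGER PRIMARY KEY, elev REAL)")
    conn.commit()
    commits = 0
    start = time.perf_counter()
    for i in range(n_rows):
        # The first INSERT after a commit implicitly opens a new transaction.
        conn.execute("INSERT INTO contour (elev) VALUES (?)", (10.0 * i,))
        if (i + 1) % batch_size == 0:
            conn.commit()  # each commit is a point where SQLite syncs to disk
            commits += 1
    if n_rows % batch_size != 0:
        conn.commit()  # commit the final, partially filled batch
        commits += 1
    elapsed = time.perf_counter() - start
    conn.close()
    return commits, elapsed
```

Running `insert_rows` on two fresh database files with the same `n_rows`, once with `batch_size=1` and once with `batch_size=n_rows`, writes identical data but with very different numbers of commits. If the explanation above holds, the gap in elapsed time should be noticeably larger on Windows than on Linux.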