Closed rouault closed 6 days ago
I guess GDALCopyWords() is a well-known hotspot, so that has its own microbenchmark. Beyond these known hotspots, is there any documentation on how to profile GDAL?
> Beyond these known hotspots, is there any documentation on how to profile GDAL?
Nothing specific. The usual tools you would use to profile any C/C++ software apply: gprof, sysprof, valgrind --tool=cachegrind, Intel VTune (proprietary), etc. My favorite one is more modest: find a processing job that runs for at least one minute, run it under gdb, interrupt it regularly with Ctrl+C, and display the stack trace each time. The functions that keep showing up in the stack traces are the hot spots, so this is quite effective at exposing them.
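Once sampling has pointed at a candidate hot spot, a small timing harness can help confirm it in isolation, in the spirit of the GDALCopyWords() microbenchmark mentioned above. The following is only a minimal sketch: `time_per_call` is a hypothetical helper, not something that exists in GDAL, and it uses nothing beyond the standard `<chrono>` header.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper (not part of GDAL): runs `fn` `iterations` times
// and returns the average wall-clock time per call, in seconds.
template <class Fn>
double time_per_call(Fn &&fn, std::size_t iterations)
{
    if (iterations == 0)
        return 0.0;  // nothing to measure

    // steady_clock is monotonic, so the measurement is not affected
    // by system clock adjustments.
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
        fn();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    return elapsed.count() / static_cast<double>(iterations);
}
```

The callable passed as `fn` would wrap the suspected hot spot. Make sure its result is actually used somewhere, otherwise an optimizing compiler may remove the work being measured.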
On top of PR #11199
This uses the sse2neon.h header (MIT licensed) from https://github.com/DLTcollab/sse2neon, which translates Intel SSEx intrinsics to ARM Neon ones.
This accelerates GDALCopyWords(), overview/resampled RasterIO() and gdal_minmax_element.hpp
On the arm64 OSX GitHub worker, this gives very substantial speedups in gdal_minmax_element.hpp: ~30x in the uint8 case, ~7x in the float case, and ~3x in the double case.
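To illustrate the general idea (this is not the code from the PR), here is a minimal sketch of a uint8 minimum search written once with SSE2 intrinsics. On x86 it includes the native `<emmintrin.h>`; on arm64 the same source could include sse2neon.h instead, so the `_mm_*` calls map to Neon. The names `min_u8`, `HAVE_SSE2_API` and `USE_SSE2NEON` are hypothetical and chosen for this example only; they are not GDAL's actual function or macro names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // native SSE2 intrinsics on x86
#define HAVE_SSE2_API 1
#elif defined(__aarch64__) && defined(USE_SSE2NEON)  // hypothetical opt-in macro
#include "sse2neon.h"   // same _mm_* API, implemented with Neon
#define HAVE_SSE2_API 1
#endif

// Hypothetical example function: returns the smallest value in data[0..n).
// Precondition: n > 0.
uint8_t min_u8(const uint8_t *data, std::size_t n)
{
    std::size_t i = 0;
    uint8_t result = data[0];

#ifdef HAVE_SSE2_API
    if (n >= 16)
    {
        // Process 16 bytes per iteration, keeping a per-lane running minimum.
        __m128i vmin =
            _mm_loadu_si128(reinterpret_cast<const __m128i *>(data));
        for (i = 16; i + 16 <= n; i += 16)
        {
            const __m128i v =
                _mm_loadu_si128(reinterpret_cast<const __m128i *>(data + i));
            vmin = _mm_min_epu8(vmin, v);
        }
        // Reduce the 16 lanes to a single scalar.
        alignas(16) uint8_t lanes[16];
        _mm_store_si128(reinterpret_cast<__m128i *>(lanes), vmin);
        result = *std::min_element(lanes, lanes + 16);
    }
#endif

    // Scalar loop: handles the tail, or the whole array when no SIMD
    // path was compiled in or n < 16.
    for (; i < n; ++i)
        result = std::min(result, data[i]);
    return result;
}
```

The uint8 case handles 16 elements per 128-bit register, against 4 for float and 2 for double, which is in line with uint8 showing the largest speedup above.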