Closed ninsbl closed 1 week ago
Should I look somewhere around here: https://github.com/OSGeo/grass/blob/main/lib/raster/vrt.c#L47 ?
Ok. The documentation says VRTs can also be built over raster data linked with r.external. I have now tried various GRASS GIS versions (7.8.8, 8.0.0, 8.2.1, all using docker) and all show the same performance issue with GDAL-linked data, especially on file systems with latency. The individual linked files are read quite fast, but combined in a VRT things get really slow...
@metzm do you have any idea if this could be fixed somehow, or is it a format limitation that we should rather document in the manual?
I would be willing to put some effort in here, but I lack C skills and would need some help to fix it; if it is possible at all...
According to the r.buildvrt module man page: "Reading the whole VRT is slower than reading the equivalent single raster map. Only reading small parts of the VRT provides a performance benefit."
Thanks, @tmszi, for looking into this. The performance difference is not related to reading parts vs. the entire VRT, but to VRTs with GDAL-linked data vs. VRTs with native GRASS data... And on a file system with some latency, GDAL-linked data is practically unusable in VRTs... Even the 1000×1500 pixels in the example take so long to read that one could create a temporary patched raster first and the process would still be faster. Something seems wrong here...
Does perhaps @rouault have a hint here?
not really, I'm not familiar with what r.univar does. It would be best to try first to reproduce using only GDAL command line utilities, like gdal_translate
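To follow the suggestion of reproducing with GDAL utilities alone, a small timing harness can help compare runs. This is a minimal sketch, not the script used in this issue: `time_command` is a hypothetical helper, and the `gdal_translate` paths mentioned in the comment are placeholders.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Run argv `repeats` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises on failure, so a failed run is never timed as a success
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere; swap in e.g.
    # ["gdal_translate", "tile.tif", "/tmp/out.tif"] (hypothetical paths)
    # to time GDAL itself, or a GRASS module call inside a session.
    runs = time_command([sys.executable, "-c", "pass"])
    print(f"best of {len(runs)}: {min(runs):.3f} s")
```

Taking the best of several runs reduces noise from file system caching, which matters on NFS where the first read is often much slower than later ones.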
Thanks, @rouault ! Will do that. The problem is not specific to r.univar though. I just used it as an example. Any reading of GDAL linked data through GRASS GIS VRTs seems affected... So, my guess is the issue is with _Rast_get_vrtrow() https://github.com/OSGeo/grass/blob/2356520814d2ab272c308af9e89c3af466c13a13/lib/raster/vrt.c#L171
Oooooh, I now read in https://grass.osgeo.org/grass84/manuals/r.buildvrt.html that "A GRASS virtual raster can be regarded as a simplified version of GDAL's virtual raster format". So I'm mostly incompetent to comment on GRASS VRT specificities. What GRASS VRT might perhaps lack is a pool of opened VRT sources like GDAL has, which saves opening and closing them when doing repeated pixel requests in neighbouring windows of interest. Just guessing in the dark... Perhaps try to use a GDAL VRT of GRASS rasters...?
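The idea of a pool of opened sources can be illustrated with a toy simulation. This is only a sketch of the general technique: `CountingOpener` and `SourcePool` are hypothetical stand-ins, not GRASS or GDAL APIs, and the sketch does not claim that GRASS actually reopens sources for every row.

```python
from collections import OrderedDict


class CountingOpener:
    """Stand-in for an expensive open call (hypothetical, not a real API)."""

    def __init__(self):
        self.opens = 0

    def open(self, name):
        self.opens += 1
        return {"name": name}  # fake handle


class SourcePool:
    """Keep up to `max_open` handles alive, evicting the least recently used."""

    def __init__(self, opener, max_open=4):
        self.opener = opener
        self.max_open = max_open
        self._handles = OrderedDict()

    def get(self, name):
        if name in self._handles:
            self._handles.move_to_end(name)  # mark as most recently used
            return self._handles[name]
        handle = self.opener.open(name)
        self._handles[name] = handle
        if len(self._handles) > self.max_open:
            self._handles.popitem(last=False)  # drop the least recently used
        return handle


def read_rows(get_handle, tiles, n_rows):
    """Touch every tile once per row, as a row-by-row reader would."""
    for _ in range(n_rows):
        for tile in tiles:
            get_handle(tile)


tiles = ["tile_a", "tile_b"]

naive = CountingOpener()
read_rows(naive.open, tiles, n_rows=1000)  # reopen on every access

pooled_opener = CountingOpener()
pool = SourcePool(pooled_opener)
read_rows(pool.get, tiles, n_rows=1000)  # reuse open handles

print(naive.opens, pooled_opener.opens)  # 2000 2
```

On a file system with latency, each avoided open is an avoided round trip, which is why a pool like this could matter much more on NFS than on local disk.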
Thanks again, @rouault! Sounds like a viable alternative / workaround! I will try that!
Attaching two strace visualisations.
One for reading VRTs with data in native GRASS GIS format:
And one for reading VRTs with data in linked-GDAL format:
In case that helps tracing down the issue...
In this particular case, there might be a mix of different reasons causing poor performance. The reasons here seem to be NFS + GDAL-linked raster maps + GRASS vrt, which in their combination might amplify performance degradation.
The two main reasons might be
GDALRasterIO() via Rast_gdal_raster_IO() in https://github.com/OSGeo/grass/blob/main/lib/raster/get_row.c#L205, which could be optimized by letting GDAL do the subsetting to the current region.
These two reasons combined with NFS could easily cause the observed performance degradation. In this case I suggest creating a GDAL VRT and linking that into GRASS. However, the fastest method should be to have GRASS native rasters (maybe in a mapset on an NFS mount) and optionally build a GRASS vrt with the native GRASS rasters. As so often, it's a compromise between data duplication and IO optimization.
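Subsetting to the current region amounts to converting the region bounds into a pixel window and requesting only that window from GDAL. The arithmetic can be sketched as follows, assuming a north-up raster without rotation; `region_to_window` and the example numbers are illustrative and not taken from the GRASS source.

```python
import math


def region_to_window(geotransform, raster_size, region):
    """Return (xoff, yoff, xsize, ysize) of the pixel window covering `region`.

    geotransform: GDAL-style 6-tuple for a north-up raster (no rotation).
    raster_size: (columns, rows) of the source raster.
    region: (west, south, east, north) in the raster's coordinate system.
    Returns None when the region does not overlap the raster.
    """
    x0, px_w, _, y0, _, px_h = geotransform  # px_h is negative for north-up
    cols, rows = raster_size
    west, south, east, north = region

    # Convert bounds to fractional pixel coordinates, then snap outward.
    col_min = math.floor((west - x0) / px_w)
    col_max = math.ceil((east - x0) / px_w)
    row_min = math.floor((north - y0) / px_h)
    row_max = math.ceil((south - y0) / px_h)

    # Clip to the raster extent.
    col_min, col_max = max(col_min, 0), min(col_max, cols)
    row_min, row_max = max(row_min, 0), min(row_max, rows)
    if col_min >= col_max or row_min >= row_max:
        return None
    return col_min, row_min, col_max - col_min, row_max - row_min


# Hypothetical 10 m raster of 10000 x 10000 pixels.
gt = (500000.0, 10.0, 0.0, 6000000.0, 0.0, -10.0)
window = region_to_window(gt, (10000, 10000), (510000, 5980000, 520000, 5995000))
print(window)  # (1000, 500, 1000, 1500)
```

A window in this form matches the offset and size arguments that GDAL's raster I/O calls take, so only the pixels inside the region would need to be read from the linked file.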
Thanks @metzm for your insights! Then I would suggest we close this issue once the known-issue for this corner case is documented in the manual.
Using GDAL VRTs for GDAL-linked data actually works quite well, and is facilitated by the r.buildvrt.gdal addon: https://grass.osgeo.org/grass84/manuals/addons/r.buildvrt.gdal.html
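Without the addon, the same workaround can be done by hand: build the VRT with gdalbuildvrt and link it with r.external. The sketch below only assembles and prints the two commands; `gdal_vrt_workflow` is a hypothetical helper and all paths and the map name are placeholders. The r.external call would need to run inside a GRASS session.

```python
import shlex


def gdal_vrt_workflow(tiles, vrt_path, map_name):
    """Return the two commands: build a GDAL VRT, then link it into GRASS."""
    if not tiles:
        raise ValueError("at least one source raster is required")
    build = ["gdalbuildvrt", vrt_path, *tiles]
    link = ["r.external", f"input={vrt_path}", f"output={map_name}"]
    return [build, link]


commands = gdal_vrt_workflow(
    tiles=["/data/tile_1.tif", "/data/tile_2.tif"],  # hypothetical paths
    vrt_path="/data/mosaic.vrt",
    map_name="mosaic",
)
for argv in commands:
    # Dry run: print the commands instead of executing them.
    print(shlex.join(argv))
```

With this approach GDAL handles the mosaic itself, and GRASS sees a single linked raster map.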
Describe the bug
I am experiencing significant performance issues with virtual rasters built with r.buildvrt over GDAL-linked (r.external) raster maps (source is in GeoTIFF format) on NFS. After more testing, it seems the NFS file system amplifies the issue, but there are significant performance issues also on local file systems and also with raster maps in native GRASS format...
Running r.univar on two GDAL-linked raster maps that cover my computational region takes less than a second. Using the same computational region, one r.univar run on a virtual raster of the same two raster maps is orders of magnitude slower (30 seconds to minutes).
Below you find a script to run performance tests on different file systems and with different formats. While VRT maps with raster maps in native GRASS format are sometimes faster than r.external linked GeoTiffs, performance is way worse compared to r.univar on the individual raster maps (= no VRT). So it seems the issue is reading GDAL linked raster maps through GRASS VRTs.
In debug=2 mode I see waaaaay more calls to:
when running r.univar on a VRT compared to reading the same maps not going through VRT. That is probably the main root cause...
Hints on how to identify or find possible remedy in the code would be very welcome...
To reproduce
Expected behavior
VRT raster maps should be at least comparable in read performance to reading the underlying raster maps directly.
System description
version=8.3.1
date=2023
revision=exported
build_date=2023-10-26
build_platform=x86_64-pc-linux-gnu
build_off_t_size=8
libgis_revision=8.3.1
libgis_date=2023-10-26T09:06:16+00:00
proj=9.1.1
gdal=3.6.4
geos=3.11.1
sqlite=3.37.2
Additional context
GRASS GIS version: 8.5.dev behaves the same...