There is discussion about a fast and almost-as-accurate-as-possible mean, "fmean", in https://bugs.python.org/issue35904. Out of curiosity, does the difference in the 12th decimal of the resolution cause practical issues for you?
In terms of results it wouldn't matter to me; in a case like this I might for example use the coordinates to calculate solar angles, for which such accuracy isn't necessary at all.
It's more an issue (annoyance) in terms of workflow for me. I usually either religiously check that all spatial properties match before combining different sources, or throw an exception. Or, when I expect differences, I might automatically pass it through gdal.Warp for example, which also isn't great for a case like this. And inheriting output properties from inputs like this means it can potentially percolate to downstream outputs.
Now that I'm aware of it, I can override the default "average" for cases where I know upfront that everything should be the same anyway, which is the bulk of my use cases. It took a while before I realized where it originated, which is part of why I'm posting it here: it might help others who notice similar results.
I appreciate the rabbit hole that is floating point precision, so if it's difficult to avoid in this case I can live with it. Thanks for the link; Raymond Hettinger has done some great work (and talks) on these topics.
I'm guessing this doesn't happen in Python (but in C/C++), so it won't be as simple as replacing a sum with fsum. 😅
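As a pure-Python illustration of that point (using a hypothetical ~3 km pixel size, not the actual values from the issue): replacing the built-in sum with math.fsum, or using the statistics.fmean mentioned above, removes the drift for this kind of input. The averaging in BuildVRT happens in C++, so this only sketches the Python-level techniques:

```python
import math
import statistics

res = 3000.403165817        # hypothetical ~3 km pixel size
values = [res] * 500        # 500 identical "input resolutions"

naive = sum(values) / len(values)            # rounds after every addition
fsum_mean = math.fsum(values) / len(values)  # correctly rounded sum, then one division
fast_mean = statistics.fmean(values)         # the fast float mean from bpo-35904

print(f"input  : {res!r}")
print(f"sum/n  : {naive!r}")
print(f"fsum/n : {fsum_mean!r}")
print(f"fmean  : {fast_mean!r}")
```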
The `gdal.Dataset` returned from `gdal.BuildVRT` changes both the x- and y-resolution based on the number of inputs provided, even if all inputs have the exact same resolution.

Some things I have observed so far:

- It only happens with the default `resolution="average"`; both "highest" and "lowest" avoid it.
- It also happens when using the command-line utility (`Library\bin\gdalbuildvrt.exe`).
- It is already present in the returned `gdal.Dataset`, before writing the VRT (but it does also end up in the VRT).

Based on the example below, using the standard sum function in Python seems to replicate it exactly:
assert gt_ds[1] == sum([reference_gt[1]]*n)/n
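Stripped of the GDAL objects (gt_ds and reference_gt come from the example, which is not reproduced in this excerpt), the same drift can be shown with just a stand-in value and a count; the ~3 km figure below is hypothetical:

```python
res = 3000.403165817  # hypothetical pixel size, standing in for reference_gt[1]
for n in (2, 10, 100, 500, 1000):
    mean = sum([res] * n) / n  # one rounded addition per input
    print(f"n={n:5d}  mean={mean!r}  deviation={mean - res:+.3e}")
```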
Expected behavior and actual behavior.
The resolution returned from `gdal.BuildVRT` should not change (for identical input resolutions).

Steps to reproduce the problem.
The example below has two different geotransforms (one commented out), and the issue occurs for both. The ~3000 m geotransform shows much larger differences than the ~3 km one.
Both are real use cases, being geotransforms for the original grid of Meteosat Second Generation. The example code below doesn't write any real data (that didn't seem to matter), but an actual input can be downloaded from the URL below:
https://msgcpp-adaguc.knmi.nl/adaguc-server?dataset=msgrt&service=wcs&request=getcoverage&coverage=lwe_precipitation_rate&FORMAT=NetCDF4&time=current
(which was also the source of the -0.1 nodata from #7486 btw)
Which for me outputs:
I checked that example for up to 1000 inputs, which results in the following deviations from the input:
Using the ~3km (KNMI) geotransform:
Using the ~3000m (Satpy) geotransform, note the different y-limits:
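The original example code and its output are not reproduced in this excerpt. A rough sketch of the kind of reproduction described above, using small in-memory GeoTIFFs, an arbitrary CRS, and a hypothetical ~3 km pixel size rather than the actual KNMI/Satpy geotransforms, might look like this:

```python
from osgeo import gdal, osr

gdal.UseExceptions()

# Hypothetical geotransform; the exact KNMI/Satpy values are in the original example.
res = 3000.403165817
gt = (0.0, res, 0.0, 0.0, 0.0, -res)

srs = osr.SpatialReference()
srs.ImportFromEPSG(4087)  # arbitrary projected CRS, only needed to make valid inputs

n = 250
paths = []
for i in range(n):
    path = f"/vsimem/input_{i}.tif"
    ds = gdal.GetDriverByName("GTiff").Create(path, 10, 10, 1, gdal.GDT_Byte)
    ds.SetGeoTransform(gt)
    ds.SetProjection(srs.ExportToWkt())
    ds = None  # flush and close
    paths.append(path)

# Default resolution="average"; "highest"/"lowest" reportedly avoid the drift.
vrt = gdal.BuildVRT("/vsimem/stack.vrt", paths, resolution="average")
vrt_gt = vrt.GetGeoTransform()

print(f"input x-res : {gt[1]!r}")
print(f"VRT x-res   : {vrt_gt[1]!r}")
print(f"deviation   : {vrt_gt[1] - gt[1]:+.3e}")
```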
As mentioned above, `sum([n_values])/n` replicates it, so it could be related to a summation taking place when calculating the average resolution. An example:

IIRC Numpy uses pairwise summation internally. https://en.wikipedia.org/wiki/Kahan_summation_algorithm
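For reference, the compensated (Kahan) summation from the linked article can be sketched in a few lines of Python; this is only an illustration of the technique, not code from GDAL or NumPy:

```python
def kahan_sum(values):
    """Kahan (compensated) summation: carry the rounding error of each
    addition in a separate compensation term so it doesn't accumulate."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for x in values:
        y = x - comp
        t = total + y
        comp = (t - total) - y
        total = t
    return total


res = 3000.403165817  # hypothetical pixel size
n = 1000
print(sum([res] * n) / n)        # naive: drifts in the last decimals
print(kahan_sum([res] * n) / n)  # compensated: stays much closer to res
```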
Operating system
Windows 10 Pro (64bit) 22H2
GDAL version and provenance