Open fmaussion opened 2 years ago
Additional info: the reason I'm reporting here and not on rasterio is that in our docker environments (built without wheels and with pip) everything works as expected.
Additional image showing that the output with conda is not OK (it looks rugged):
@fmaussion, thanks for reporting this and for the script to reproduce the problem. I will try to look into it this weekend.
If you want to try to narrow it down in the meantime, one option would be to install older builds of rasterio 1.2.10 (e.g. conda create -y -n test "rasterio=1.2.10=*_0" ...) and see when the problem emerges. This is what I will try first. You can look at https://github.com/conda-forge/rasterio-feedstock/pulls?q=is%3Apr+is%3Aclosed to get a sense of what has changed in each build since the 1.2.10 release (you will likely need to look at the individual PRs to see which build number corresponds to which change).
If you don't have the time or expertise to do this, I'll see when I can get to it.
It seems unlikely that this is ultimately a conda-forge problem, but we should be able to track down which dependency is causing trouble so you can pursue things further with them (or with rasterio).
@xylar thanks for having a look. I did check that downgrading to rasterio 1.2.9 solves the problem.
To simplify reproducing the problem, I have created a test environment with two env.yml files and the reference data in it. You can download it here: rioproblem.zip (a few kb)
Here is the outcome in a few commands:
$ mamba env create -f env_latest.yml
$ conda activate test_env_latest
$ python script.py
Traceback (most recent call last):
File "/home/mowglie/tmp/rioproblem/script.py", line 44, in <module>
np.testing.assert_allclose(ref, this, atol=1)
File "/home/mowglie/.miniconda3/envs/test_env_latest/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 1530, in assert_allclose
assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
File "/home/mowglie/.miniconda3/envs/test_env_latest/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 844, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1e-07, atol=1
Mismatched elements: 15466 / 18126 (85.3%)
Max absolute difference: 65
Max relative difference: 0.02263231
x: array([[2596, 2595, 2610, ..., 2583, 2568, 2555],
[2565, 2569, 2582, ..., 2565, 2548, 2531],
[2539, 2547, 2557, ..., 2545, 2526, 2507],...
y: array([[2598, 2597, 2615, ..., 2584, 2566, 2549],
[2542, 2551, 2564, ..., 2550, 2526, 2502],
[2542, 2551, 2564, ..., 2550, 2526, 2502],...
$ conda deactivate
$ mamba env create -f env_down.yml
$ conda activate test_env_down
$ python script.py
(all good here)
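The failure summary that assert_allclose prints for the failing environment can also be recomputed by hand, which helps when comparing several environments side by side. This is a minimal sketch, not part of the original script.py: the helper name `mismatch_summary` is hypothetical, and the small arrays are just the first few values printed in the traceback, not the full DEM.

```python
import numpy as np


def mismatch_summary(actual, desired, rtol=1e-7, atol=1.0):
    """Summarize how two arrays differ, in the spirit of assert_allclose."""
    # Cast to float so integer elevation data cannot overflow on subtraction.
    actual = np.asarray(actual, dtype=float)
    desired = np.asarray(desired, dtype=float)
    abs_diff = np.abs(actual - desired)
    # Criterion used by assert_allclose:
    # |actual - desired| <= atol + rtol * |desired|
    mismatched = abs_diff > atol + rtol * np.abs(desired)
    return {
        "mismatched": int(mismatched.sum()),
        "total": int(mismatched.size),
        "max_abs_diff": float(abs_diff.max()),
    }


# First entries of the x and y arrays shown in the traceback (illustration only).
ref = [[2596, 2595], [2565, 2569]]
this = [[2598, 2597], [2542, 2551]]
print(mismatch_summary(ref, this))
```

With the real files, `ref` and `this` would be the arrays read from the reference GeoTIFF and from the freshly reprojected output.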
it seems unlikely that this is ultimately a conda-forge problem but we should be able to track down which dependency is causing trouble so you can pursue things further with them (or with rasterio).
Yes I understand - yet again with latest rasterio and pip I see no problem, but it could be GDAL or some other lib indeed...
@fmaussion, can you try the same tests but with other yaml files that select other recent builds? You can look at https://anaconda.org/conda-forge/rasterio/files to see what the hashes and build numbers are for recent builds (make sure to get the linux-64 version).
@fmaussion, in case you aren't familiar with the difference, there are several builds for each version number (e.g. 1.2.10) and each may have different dependencies (different versions of geos, proj, etc.).
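To make the version/build distinction concrete, here is a small sketch that splits a build name such as rasterio-1.2.10-py39h2e4b6e6_5 into its parts. It assumes the usual conda-forge layout `name-version-buildstring`, with the build string ending in `_<build number>`; that layout is a convention, not a guarantee, and `parse_build` is a hypothetical helper, not a conda API.

```python
from typing import NamedTuple


class CondaBuild(NamedTuple):
    name: str
    version: str
    build_string: str
    build_number: int


def parse_build(dist_name):
    """Split 'name-version-buildstring' into its components.

    Assumes the build string ends in '_<number>', as conda-forge builds
    usually do; raises ValueError otherwise.
    """
    # Split from the right so package names containing '-' stay intact.
    name, version, build_string = dist_name.rsplit("-", 2)
    _, _, number = build_string.rpartition("_")
    return CondaBuild(name, version, build_string, int(number))


print(parse_build("rasterio-1.2.10-py39h2e4b6e6_5"))
```

Two builds with the same version but different build strings can be linked against different versions of geos, proj, and so on, which is why the build number matters when bisecting.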
I've now tested this with all linux-64/rasterio-1.2.10_py39* versions that are on https://anaconda.org/conda-forge/rasterio/files. For all, the reproduction-script showed the assertion error. I then continued with all 1.2.9 versions, and they also all showed the error. But I'm pretty sure these have worked in the past, so maybe the issue is not directly caused by rasterio itself?
Sorry, correction: The reference file we transferred via Slack got messed up in the process (No idea what Slack is doing, but it breaks geotiff files sent via it...).
With the correct reference in place now, I can confirm that the error happens with rasterio-1.2.10-py39h2e4b6e6_5.
After the following downgrade, the error is gone:
The following packages will be DOWNGRADED:
cfitsio 4.1.0-hd9d235c_0 --> 4.0.0-h9a35b8e_0
geotiff 1.7.1-h509b78c_0 --> 1.7.0-h6593c0a_6
libgdal 3.4.2-hdfc60d4_1 --> 3.4.1-hff5c5e8_5
libspatialite 5.0.1-ha867d66_15 --> 5.0.1-h0e567f8_14
proj 9.0.0-h93bde94_1 --> 8.2.1-h277dcde_0
rasterio 1.2.10-py39h2e4b6e6_5 --> 1.2.10-py39h0401cea_4
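To see at a glance which of these packages changed version and which only changed build, the listing can be parsed with a few lines of Python. This is a sketch that assumes the `name old --> new` line layout shown above and the `version-buildstring` form of each entry; the helper names are hypothetical.

```python
DOWNGRADES = """\
cfitsio 4.1.0-hd9d235c_0 --> 4.0.0-h9a35b8e_0
geotiff 1.7.1-h509b78c_0 --> 1.7.0-h6593c0a_6
libgdal 3.4.2-hdfc60d4_1 --> 3.4.1-hff5c5e8_5
libspatialite 5.0.1-ha867d66_15 --> 5.0.1-h0e567f8_14
proj 9.0.0-h93bde94_1 --> 8.2.1-h277dcde_0
rasterio 1.2.10-py39h2e4b6e6_5 --> 1.2.10-py39h0401cea_4
"""


def parse_changes(text):
    """Map package name -> (old version, new version), ignoring other lines."""
    changes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4 or parts[2] != "-->":
            continue
        name, old, _, new = parts
        # Drop the trailing '-buildstring' to keep only the version.
        changes[name] = (old.rsplit("-", 1)[0], new.rsplit("-", 1)[0])
    return changes


def version_changed(changes):
    """Names of packages whose version, not just build, differs."""
    return sorted(name for name, (old, new) in changes.items() if old != new)


print(version_changed(parse_changes(DOWNGRADES)))
```

Packages that keep the same version (here libspatialite and rasterio) were only rebuilt, most likely against the changed dependencies, so the candidates worth testing first are the ones whose version actually moved.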
Oh, boy, that's a lot of packages changed. In theory, you'd need to try upgrading each of those packages one at a time. Not sure if that's possible.
My immediate suspicion is the major proj update. But it's not possible to do those updates individually, since there are no builds of the packages against those combinations.
Okay, yeah, that's one of the difficulties with package management. A major update in proj often requires a version update in all of the packages that have it as a dependency.
I'm afraid I don't have a good way to help here. You've done the detective work I can think of within conda-forge.
Some more detective work: I have just updated our entirely self compiled Docker-Image to the same proj and gdal versions (9.0.0 and 3.4.2 respectively) and everything works just fine there. That kinda only leaves the minor geotiff upgrade. In our case, we're using the internal bundled geotiff version that comes with gdal, which is version 1.7.0.
I now also tried to manually build and use libgeotiff version 1.7.1, and everything works great. So I'm out of ideas which part of that upgrade broke it, but it doesn't seem like it's an upstream issue of any of the libraries, but something related to conda/conda-forge.
Presumably that leaves compiler settings or other configuration of one or another package on conda-forge. But I can't think of an easy way to track that down. I can double check whether we can downgrade to proj < 9 and still make any of the other upgrades. Typically, packages should first be updated and then be rebuilt with the new proj, rather than doing both in one PR, so I would expect that to be possible, but I can't be sure.
Anyway, I still don't have time to investigate this but it's pretty intriguing...
@fmaussion and @TimoRoth, I believe I've narrowed this to being caused by the update from geotiff 1.7.0 to 1.7.1.
The following (using your test script) works fine:
mamba create -y -n test_rasterio python=3.9 rasterio=1.2.10 proj=9.0.0 cfitsio=4.1.0 geotiff=1.7.0 libgdal=3.4.2
conda activate test_rasterio
python script.py
conda deactivate
The following fails the test:
mamba create -y -n test_rasterio python=3.9 rasterio=1.2.10 proj=9.0.0 cfitsio=4.1.0 geotiff=1.7.1 libgdal=3.4.2
conda activate test_rasterio
python script.py
conda deactivate
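The same one-package-at-a-time comparison could be generated for other dependencies. The sketch below only builds the command strings and does not run mamba; the baseline pins are copied from the working command above, while the helper names and the generated environment names are hypothetical.

```python
BASELINE = {
    "python": "3.9",
    "rasterio": "1.2.10",
    "proj": "9.0.0",
    "cfitsio": "4.1.0",
    "geotiff": "1.7.0",
    "libgdal": "3.4.2",
}


def create_command(env_name, pins):
    """Build a 'mamba create' command line from a dict of package pins."""
    specs = " ".join(f"{name}={version}" for name, version in pins.items())
    return f"mamba create -y -n {env_name} {specs}"


def vary_one(baseline, package, versions):
    """Yield (version, command) pairs that change a single package only."""
    for version in versions:
        pins = dict(baseline)
        pins[package] = version
        # Hypothetical environment name, e.g. test_geotiff_1_7_1.
        env_name = f"test_{package}_{version.replace('.', '_')}"
        yield version, create_command(env_name, pins)


for version, command in vary_one(BASELINE, "geotiff", ["1.7.0", "1.7.1"]):
    print(command)
```

Whether the solver can satisfy a given combination depends on which builds exist on conda-forge, so some generated commands may simply fail to resolve.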
@akrherz has been working hard on consistently pinning geotiff (perhaps related to investigating this issue?) and built gdal with geotiff=1.7.0 just a few hours ago. Before that, it seems that packages were getting 1.7.0 or 1.7.1 somewhat arbitrarily as a dependency. That seems to have made debugging this issue a bit more challenging.
I think the fact that you were able to use a different (non-conda-forge) build of geotiff 1.7.1 successfully suggests a problem with that build on conda-forge. Maybe open an issue in that feedstock but feel free to keep this one open until we figure this out.
The pinning of geotiff is very tight, so I don't suspect the issue I am currently fixing there is related. You are right though, it makes debugging much more tricky as you get some strange combinations of builds due to which geotiff gets picked by conda at build time.
@akrherz, I agree. The problem is presumably with how 1.7.1 was built. If you happen to know anything about CMake and/or libgeotiff, maybe take a look at what changed in 1.7.1 and let us know if you have any suggestions for how to modify the build: https://github.com/OSGeo/libgeotiff/compare/1.7.0...1.7.1
I'm not very fluent in CMake but nothing jumps out at me.
I haven't made any headway on figuring this out. I'm not familiar enough with the reproject() function (or with rasterio in general) to try to come up with a sensible way to figure out where the problem might be cropping up.
I'm afraid I've done all I can unless we can narrow the problem down further.
Solution to issue cannot be found in the documentation.
Issue
Here is a sample data file: hef_srtm.tif.zip (DEM data)
Here is a sample script:
Sorry I couldn't pin this down to a simpler snippet. This code resamples the data into a new map. It is taken out of the OGGM test suite, which started erroring on conda environments recently (date unknown).
The expected output is: outf_1.2.10_pip.tif.zip
When run on a fresh conda install in linux, the output is different (shifted with large absolute errors: +- 60 meters, tested on two different linux machines):
On conda, downgrading to rasterio 1.2.9 solves the problem (but it's hard to tell what the reason is, because this downgrades libgdal and proj as well).
cc @TimoRoth FYI
Installed packages
Environment info