Pinpoint precise dependency differences that caused output difference between NASA and ESA

chuckwondo commented 4 months ago

The solution for #7 will be to unify dependency management between NASA and ESA, but we still want to know precisely which dependency(ies) caused the difference in outputs so we have a proper understanding of what specifically would cause such an issue, whether it be some change in underlying floating-point handling/precision, or otherwise.

chuckwondo commented 4 months ago

@jsignell, when you're ready to dig into this, let me know. We can jump on a quick call so I can orient you and you don't have to spend much time deciphering things on your own.

jsignell commented 4 months ago

Thanks for the ping. I will probably not get to this until a week or two from now.

chuckwondo commented 4 months ago

> Thanks for the ping. I will probably not get to this until a week or two from now.

Cool. No rush.

jsignell commented 3 months ago

There are three possible explanations as far as I can tell:

1) the resolved environments are different enough that the results end up being substantially different
2) the results are different, but not meaningfully so (they are similar within a certain tolerance threshold)
3) the inputs are getting passed in differently (meaning the libraries themselves might not differ, just the input mechanism)

To get a better sense of what is going on, I created the initial environments using pip+docker for ESA and mamba for NASA. The resolved environments appear fairly similar. The most interesting differences are the nvidia packages and Pillow, but I haven't dug into whether those are actually being used.

Package	NASA - conda	ESA - pip+docker
Brotli	1.1.0	---
certifi	2024.6.2	2024.6.2
charset-normalizer	3.3.2	3.3.2
cloudpickle	3.0.0	3.0.0
Cython	3.0.10	---
GDAL	3.8.5	3.8.5
idna	3.7	3.7
Jinja2	3.1.4	3.1.4
markdown-it-py	3.0.0	3.0.0
MarkupSafe	2.1.5	2.1.5
mdurl	0.1.2	0.1.2
numpy	1.26.4	1.26.4
nvidia-ml-py	---	12.555.43
nvidia-ml-py3	---	7.352.0
Pillow	---	9.0.1
pip	24.0	22.0.2
psutil	5.9.8	5.9.8
Pygments	2.18.0	2.18.0
pynvml	11.4.1	---
PySocks	1.7.1	---
requests	2.32.3	2.32.3
rich	13.7.1	13.7.1
sardem	0.11.3	0.11.3
scalene	1.5.38	1.5.42
setuptools	70.0.0	59.6.0
typing_extensions	4.12.2	---
urllib3	2.2.1	2.2.2
wheel	0.43.0	0.37.1
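The rows that differ can also be pulled out programmatically. The following is only a minimal sketch, assuming each environment's package list is available as `pip freeze`-style `name==version` lines (output from `conda list` or `mamba list` would need a different parser). The helper names `parse_freeze` and `diff_envs` are hypothetical, and the sample input is a small excerpt of the table above.

```python
def parse_freeze(text):
    """Parse `name==version` lines into a {name: version} dict."""
    pkgs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything not pinned with '=='
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pkgs[name.strip()] = version.strip()
    return pkgs


def diff_envs(a, b):
    """Return {name: (version_a, version_b)} for packages that differ.

    A version of None means the package is missing from that environment.
    Names are compared case-sensitively.
    """
    diffs = {}
    for name in sorted(set(a) | set(b), key=str.lower):
        va, vb = a.get(name), b.get(name)
        if va != vb:
            diffs[name] = (va, vb)
    return diffs


# Small excerpt of the versions listed in the table above
nasa_env = parse_freeze("numpy==1.26.4\nscalene==1.5.38\nurllib3==2.2.1\npynvml==11.4.1")
esa_env = parse_freeze("numpy==1.26.4\nscalene==1.5.42\nurllib3==2.2.2\nPillow==9.0.1")

for name, (nasa_v, esa_v) in diff_envs(nasa_env, esa_env).items():
    print(f"{name}\t{nasa_v or '---'}\t{esa_v or '---'}")
```

Packages with identical versions in both environments (here `numpy`) are dropped, which leaves only the candidates worth investigating.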

Next I loaded the tifs into numpy and ran np.allclose. It returns False, so the outputs differ by more than the default tolerances:

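import numpy as np
from osgeo import gdal

# ReadAsArray() already returns a numpy array, so no np.array() wrapper is needed
esa = gdal.Open('./output/esa/dem.tif').ReadAsArray()
nasa = gdal.Open('./output/dem.tif').ReadAsArray()

# Default tolerances are rtol=1e-05 and atol=1e-08
np.allclose(esa, nasa)  # False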
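Because `np.allclose` only gives a yes/no answer, it may help to look at how large the differences are and how many pixels they affect, which would speak to explanation 2. The following is a sketch under stated assumptions: `summarize_diff` is a hypothetical helper, and small synthetic arrays stand in for the two DEMs.

```python
import numpy as np


def summarize_diff(a, b, rtol=1e-5, atol=1e-8):
    """Summarize how two arrays differ (hypothetical helper)."""
    if a.shape != b.shape:
        # A shape mismatch means the rasters cover different grids
        return {"same_shape": False, "shape_a": a.shape, "shape_b": b.shape}
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    abs_diff = np.abs(a - b)
    # Same element-wise test as np.allclose, except that NaNs in
    # matching positions are treated as equal
    close = np.isclose(a, b, rtol=rtol, atol=atol, equal_nan=True)
    return {
        "same_shape": True,
        "max_abs_diff": float(np.nanmax(abs_diff)),
        "mean_abs_diff": float(np.nanmean(abs_diff)),
        "fraction_not_close": float(1.0 - close.mean()),
    }


# Synthetic stand-ins for the two DEM arrays
a = np.array([[100.0, 200.0], [300.0, 400.0]])
b = np.array([[100.0, 200.5], [300.0, 400.0]])
summary = summarize_diff(a, b)
print(summary)
```

A tiny `max_abs_diff` spread across many pixels would be consistent with floating-point noise, while differing shapes or large localized differences would be more consistent with the inputs being passed in differently.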

I am still paring down the run scripts to see whether the inputs are being passed in differently.
