MAAP-Project / get-dem

A wrapper around `https://github.com/scottstanie/sardem` for use with the MAAP project.
Apache License 2.0

Pinpoint precise dependency differences that caused output difference between NASA and ESA #11

Open chuckwondo opened 4 months ago

chuckwondo commented 4 months ago

The solution for #7 will be to unify dependency management between NASA and ESA, but we still want to know precisely which dependency (or dependencies) caused the difference in outputs, so that we properly understand what would cause such an issue, whether it is a change in underlying floating-point handling or precision, or something else.

chuckwondo commented 4 months ago

@jsignell, when you're ready to dig into this, let me know. We can jump on a quick call so I can orient you and you don't have to spend much time deciphering things on your own.

jsignell commented 4 months ago

Thanks for the ping. I will probably not get to this until a week or two from now.

chuckwondo commented 4 months ago

> Thanks for the ping. I will probably not get to this until a week or two from now.

Cool. No rush.

jsignell commented 3 months ago

There are three possible explanations as far as I can tell:

1. The resolved environments are different enough that the results end up being substantially different.
2. The results are different, but not meaningfully so (they are similar within a certain tolerance threshold).
3. The inputs are getting passed in differently (meaning the libraries themselves might not differ, just the input mechanism).

To get a better sense of what is going on, I created the initial environments using pip+docker for ESA and mamba for NASA. The resolved environments appear fairly similar. The most interesting differences are the nvidia packages and Pillow, but I haven't dug into whether those are being used.

| Package | NASA (conda) | ESA (pip+docker) |
| --- | --- | --- |
| Brotli | 1.1.0 | --- |
| certifi | 2024.6.2 | 2024.6.2 |
| charset-normalizer | 3.3.2 | 3.3.2 |
| cloudpickle | 3.0.0 | 3.0.0 |
| Cython | 3.0.10 | --- |
| GDAL | 3.8.5 | 3.8.5 |
| idna | 3.7 | 3.7 |
| Jinja2 | 3.1.4 | 3.1.4 |
| markdown-it-py | 3.0.0 | 3.0.0 |
| MarkupSafe | 2.1.5 | 2.1.5 |
| mdurl | 0.1.2 | 0.1.2 |
| numpy | 1.26.4 | 1.26.4 |
| nvidia-ml-py | --- | 12.555.43 |
| nvidia-ml-py3 | --- | 7.352.0 |
| Pillow | --- | 9.0.1 |
| pip | 24.0 | 22.0.2 |
| psutil | 5.9.8 | 5.9.8 |
| Pygments | 2.18.0 | 2.18.0 |
| pynvml | 11.4.1 | --- |
| PySocks | 1.7.1 | --- |
| requests | 2.32.3 | 2.32.3 |
| rich | 13.7.1 | 13.7.1 |
| sardem | 0.11.3 | 0.11.3 |
| scalene | 1.5.38 | 1.5.42 |
| setuptools | 70.0.0 | 59.6.0 |
| typing_extensions | 4.12.2 | --- |
| urllib3 | 2.2.1 | 2.2.2 |
| wheel | 0.43.0 | 0.37.1 |
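
A comparison like the one in the table can be scripted instead of done by eye. The following is a minimal sketch, assuming each environment's package list has been saved as `name==version` lines (for example from `pip freeze`). The helper names `parse_freeze` and `diff_environments` are hypothetical, and the sample text only reuses a few rows from the table. Lines that are not in `name==version` form (such as `name @ file://...` entries) are skipped here, so treat the output as a starting point.

```python
import re


def parse_freeze(text):
    """Parse `name==version` lines into a {normalized_name: version} dict."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        # Normalize names per PEP 503 so "Pillow" and "pillow" compare equal.
        key = re.sub(r"[-_.]+", "-", name).lower()
        packages[key] = version
    return packages


def diff_environments(a, b):
    """Return (only_in_a, only_in_b, version_mismatches) for two package dicts."""
    only_a = {k: v for k, v in a.items() if k not in b}
    only_b = {k: v for k, v in b.items() if k not in a}
    mismatched = {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}
    return only_a, only_b, mismatched


# A few rows from the table, written as `pip freeze`-style text.
nasa_freeze = """
GDAL==3.8.5
numpy==1.26.4
pynvml==11.4.1
scalene==1.5.38
urllib3==2.2.1
"""
esa_freeze = """
GDAL==3.8.5
numpy==1.26.4
Pillow==9.0.1
scalene==1.5.42
urllib3==2.2.2
"""

only_nasa, only_esa, mismatched = diff_environments(
    parse_freeze(nasa_freeze), parse_freeze(esa_freeze)
)
print("only in NASA:", only_nasa)
print("only in ESA:", only_esa)
print("version mismatches:", mismatched)
```

Note that this only compares Python package versions. Differences in the underlying native libraries (for instance, how GDAL itself was built) would not show up in such a list.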

Next I loaded the tifs in numpy and ran allclose. They are indeed substantially different:

```python
import numpy as np
from osgeo import gdal

esa = np.array(gdal.Open('./output/esa/dem.tif').ReadAsArray())
nasa = np.array(gdal.Open('./output/dem.tif').ReadAsArray())

np.allclose(esa, nasa)  # False
```
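
Since `np.allclose` only returns a single boolean, it may help to quantify how the arrays differ before deciding between explanations 1 and 2. Below is a rough sketch, not what was actually run: `summarize_difference` is a hypothetical helper, the tolerances are just the `np.allclose` defaults, and the arrays are small synthetic stand-ins so the snippet runs without the real tifs.

```python
import numpy as np


def summarize_difference(a, b, rtol=1e-5, atol=1e-8):
    """Summarize how two arrays differ (hypothetical helper)."""
    if a.shape != b.shape:
        # Different shapes usually point at different inputs, not precision.
        return {"same_shape": False, "shape_a": a.shape, "shape_b": b.shape}
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    # Treat pixels that are NaN in both arrays as equal.
    both_nan = np.isnan(a) & np.isnan(b)
    abs_diff = np.where(both_nan, 0.0, np.abs(a - b))
    # Same elementwise criterion that np.allclose aggregates.
    outside = ~np.isclose(a, b, rtol=rtol, atol=atol, equal_nan=True)
    return {
        "same_shape": True,
        "max_abs_diff": float(np.nanmax(abs_diff)),
        "n_differing": int(np.count_nonzero(abs_diff)),
        "n_outside_tolerance": int(np.count_nonzero(outside)),
        "fraction_outside_tolerance": float(outside.mean()),
    }


# Synthetic stand-ins for the two DEM arrays.
reference = np.arange(20, dtype=np.float32).reshape(4, 5) * 100
shifted = reference.copy()
shifted[1, 2] += 0.5  # perturb a single pixel

summary = summarize_difference(reference, shifted)
print(summary)
```

With the real data, `esa` and `nasa` would be passed in place of the synthetic arrays. A large `max_abs_diff` concentrated in few pixels would suggest something different from a small difference spread across the whole raster.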

I am still working on trying to pare down the run scripts to see if the inputs are getting passed in differently or something.
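
One cheap check for the "inputs passed in differently" explanation is whether the two output rasters even cover the same grid. The sketch below assumes the geotransform tuples and raster sizes have already been read from each file (with GDAL these would come from `ds.GetGeoTransform()` and `(ds.RasterXSize, ds.RasterYSize)`). The helper `grid_mismatches` and the example values are hypothetical, not taken from the actual outputs.

```python
# Meaning of the six GDAL geotransform coefficients, in order.
GEOTRANSFORM_LABELS = (
    "origin_x",
    "pixel_width",
    "row_rotation",
    "origin_y",
    "column_rotation",
    "pixel_height",
)


def grid_mismatches(geotransform_a, geotransform_b, size_a, size_b, tol=1e-9):
    """Return human-readable descriptions of grid differences (empty if none)."""
    problems = []
    if tuple(size_a) != tuple(size_b):
        problems.append(f"raster size differs: {tuple(size_a)} vs {tuple(size_b)}")
    for label, va, vb in zip(GEOTRANSFORM_LABELS, geotransform_a, geotransform_b):
        if abs(va - vb) > tol:
            problems.append(f"{label} differs: {va} vs {vb}")
    return problems


# Made-up values: the second grid is shifted by half a pixel in x.
gt_a = (-118.0, 0.001, 0.0, 34.5, 0.0, -0.001)
gt_b = (-118.0005, 0.001, 0.0, 34.5, 0.0, -0.001)
size = (1000, 500)

problems = grid_mismatches(gt_a, gt_b, size, size)
for problem in problems:
    print(problem)
```

If the grids match exactly, differing pixel values are more likely to come from the libraries or their native dependencies. If the origin or size differs, the bounding box or other inputs are the first thing to look at.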