GlacioHack / xdem

Analysis of digital elevation models (DEMs)
https://xdem.readthedocs.io/
MIT License
123 stars 37 forks source link

Definitive fix of negative uncertainties from linear extrapolation #446

Closed MatteaE closed 7 months ago

MatteaE commented 8 months ago
rhugonnet commented 8 months ago

Amazing, thanks @MatteaE!

I had to trigger the test workflows (because it's your first time contributing, then it does it automatically): All tests passing :smile:! (except a Windows one that fails due to different random sampling between OSs from a recent PR, but I'll fix that separately)

We should add a check related to your PR in test_interp_nd_binning_artificial_data: https://github.com/GlacioHack/xdem/blob/main/tests/test_spatialstats.py#L148. Do you think you can reproduce this issue with some artificial data there, which would fail with current xDEM and pass with your PR?

MatteaE commented 8 months ago

Cool, thanks a lot @rhugonnet! I've tried to reproduce the issue with artificial data, but so far I haven't managed to (I find the result of the N-D binning somewhat hard to predict, and the problem with interpolation is quite an edge case). Can the test include some real data as an alternative?

rhugonnet commented 8 months ago

Mmhh.. introducing a new dataset through xdem-data is quite a bit of work.

Maybe a solution in this case would be to simply copy the binning dataframe you had with your real data that triggered the issue in interp_nd_binning directly. This should only be a couple lines of real values to write in the code (and could potentially help us craft an artificial example if we understand the pattern that triggered the issue).

MatteaE commented 7 months ago

Ok good point, here is the binning data frame which triggers the error - let me know if it is enough!

df_negative_err.csv

rhugonnet commented 7 months ago

Thanks! I also need the locations where it gets negative, I couldn't find it by searching around:

import xdem
import pandas as pd

fn_csv = "/home/atom/downloads/df_negative_err.csv"
df = pd.read_csv(fn_csv)

fn = xdem.spatialstats.interp_nd_binning(df, list_var_names=["slope", "maxc"])

print(fn((0, 0)))
print(fn((48, 2)))
print(fn((100, 50)))
print(fn((100, 0)))
print(fn((0, 50)))
2.0653602539062508
10.163194042968751
5468.51238649639
8.765966447451888
8.574538916015399
MatteaE commented 7 months ago

Here it is, sorted from most to least negative: df_neg_sorted.csv Numbers are computed with

fn = xdem.spatialstats.interp_nd_binning(df, list_var_names=["slope", "maxc"],  statistic="nmad", min_count=30)
zscores, dh_err_fun = xdem.spatialstats.two_step_standardization(
    dh_arr, list_var=[slope_arr, maxc_arr], unscaled_error_fun=unscaled_dh_err_fun
)
dh_err = dh.copy(new_array=dh_err_fun((slope.data, maxc.data)))

(Probably a different value of min_count produces a different dh_err_fun and the locations would also change)

rhugonnet commented 7 months ago

Using your data I managed to reproduce the error in an artificial data example, and added it to the tests. Was also the occasion to look at the behaviour in 2/3D, and the fix looks durable. Thanks a lot! Will merge as soon as tests finish running and pass.