Closed by MatteaE 7 months ago
Amazing, thanks @MatteaE!
I had to trigger the test workflows manually (required because it's your first contribution; after that they run automatically). All tests passing :smile:! (except one Windows test that fails due to different random sampling between OSs, introduced by a recent PR; I'll fix that separately)
We should add a check related to your PR in `test_interp_nd_binning_artificial_data`: https://github.com/GlacioHack/xdem/blob/main/tests/test_spatialstats.py#L148.
Do you think you can reproduce this issue with some artificial data there, which would fail with the current xDEM and pass with your PR?
Cool, thanks a lot @rhugonnet! I've tried to reproduce the issue with artificial data, but so far I haven't managed to (the result of the N-D binning is somewhat hard to predict, and the interpolation problem is quite an edge case). Could the test include some real data as an alternative?
Mmhh.. introducing a new dataset through xdem-data is quite a bit of work.
A simpler solution in this case might be to copy the binning dataframe from the real data that triggered the issue directly into the test that calls `interp_nd_binning`. That should only be a couple of lines of real values to write in the code (and could help us craft an artificial example once we understand the pattern that triggered the issue).
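As an illustration, such an inline binning dataframe could look like the minimal sketch below. The column layout here is an assumption based on the variable names used in this thread ("slope", "maxc", "nmad"); the real dataframe produced by xDEM's binning has additional columns.

```python
import pandas as pd

# Hypothetical minimal binning dataframe written directly in test code:
# one pd.Interval column per binning variable, plus the binned statistic.
# (Column names follow the thread; the exact layout is an assumption.)
df = pd.DataFrame(
    {
        "slope": [pd.Interval(0.0, 10.0), pd.Interval(10.0, 20.0), pd.Interval(20.0, 30.0)],
        "maxc": [pd.Interval(0.0, 1.0)] * 3,
        "nmad": [1.5, 2.2, 4.8],
    }
)
```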
Ok good point, here is the binning data frame which triggers the error - let me know if it is enough!
Thanks! I also need the locations where it gets negative; I couldn't find them by probing around:
```python
import pandas as pd
import xdem

fn_csv = "/home/atom/downloads/df_negative_err.csv"
df = pd.read_csv(fn_csv)
fn = xdem.spatialstats.interp_nd_binning(df, list_var_names=["slope", "maxc"])

print(fn((0, 0)))
print(fn((48, 2)))
print(fn((100, 50)))
print(fn((100, 0)))
print(fn((0, 50)))
```

which prints:

```
2.0653602539062508
10.163194042968751
5468.51238649639
8.765966447451888
8.574538916015399
```
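For context on how a positive binned statistic can yield negative interpolated values: linear extrapolation beyond the sampled grid can undershoot zero. A minimal scipy sketch of the mechanism (this is not xDEM's actual interpolator, just an illustration):

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# All sampled values are positive, but the edge cell has a steep gradient.
x = np.array([0.0, 1.0])
y = np.array([0.0, 1.0])
vals = np.array([[5.0, 1.0], [1.0, 0.2]])

# fill_value=None enables linear extrapolation outside the grid.
f = RegularGridInterpolator((x, y), vals, bounds_error=False, fill_value=None)

inside = f([[0.5, 0.5]])[0]   # interpolation: stays positive
outside = f([[2.0, 0.0]])[0]  # extrapolation: edge slope continues below zero
```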
Here it is, sorted from most to least negative: df_neg_sorted.csv. The numbers were computed with:

```python
fn = xdem.spatialstats.interp_nd_binning(df, list_var_names=["slope", "maxc"], statistic="nmad", min_count=30)
zscores, dh_err_fun = xdem.spatialstats.two_step_standardization(
    dh_arr, list_var=[slope_arr, maxc_arr], unscaled_error_fun=unscaled_dh_err_fun
)
dh_err = dh.copy(new_array=dh_err_fun((slope.data, maxc.data)))
```

(A different value of `min_count` would probably produce a different `dh_err_fun`, and the locations would change as well.)
Using your data I managed to reproduce the error in an artificial-data example, and added it to the tests. It was also an occasion to look at the behaviour in 2/3D, and the fix looks durable. Thanks a lot! I'll merge as soon as the tests finish running and pass.
Adds the ability in `fit_sum_model_variogram()` to pass a `maxfev` value through to `curve_fit`: this can sometimes help with convergence problems.
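For illustration, this is roughly how a `maxfev` pass-through to `scipy.optimize.curve_fit` is used. The model below is a toy exponential rise to a sill standing in for the real sum-of-models variogram; all names here are hypothetical:

```python
import numpy as np
from scipy.optimize import curve_fit

# Toy variogram-like model: exponential rise to a sill.
def exp_model(h, sill, corr_range):
    return sill * (1 - np.exp(-h / corr_range))

h = np.linspace(0.1, 100.0, 50)
rng = np.random.default_rng(42)
y = exp_model(h, 25.0, 15.0) + rng.normal(0.0, 0.5, h.size)

# The default maxfev can be too low for hard fits; raising it gives the
# Levenberg-Marquardt solver more function evaluations before giving up.
params, _ = curve_fit(exp_model, h, y, p0=[1.0, 1.0], maxfev=10000)
```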
EDIT by @rhugonnet: this is a `pd.DataFrame` containing `pd.Interval` values written as strings.
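Since the CSV stores the `pd.Interval` bins as strings, they need to be parsed back before use. A sketch of one way to do that (the helper name and regex are mine, not part of xDEM's API, and the sketch ignores the `both`/`neither` closed variants):

```python
import re
import pandas as pd

def parse_interval(s: str) -> pd.Interval:
    """Parse a string such as "(0.0, 10.0]" back into a pd.Interval."""
    m = re.match(r"[\[\(]\s*([^,]+),\s*([^\]\)]+)[\]\)]", s.strip())
    left, right = float(m.group(1)), float(m.group(2))
    # Simplified: only distinguishes right-closed from left-closed intervals.
    closed = "right" if s.strip().endswith("]") else "left"
    return pd.Interval(left, right, closed=closed)

iv = parse_interval("(0.0, 10.0]")
```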