ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0

[Numpy2] Preprocessor `distance_metric` returns value 0 instead of `--` (masked element) with `numpy==2.0.0` for metrics that use `np.sqrt` or `da.sqrt` #2460

Open valeriupredoi opened 4 weeks ago

valeriupredoi commented 4 weeks ago

Could be Scipy, could be something else; this needs a bit of due diligence :beer:

valeriupredoi commented 3 weeks ago

This is a bug introduced with Numpy 2.0. In https://github.com/ESMValGroup/ESMValCore/blob/d6eaac24e05310da95ddaee5df3208231c234fe8/esmvalcore/preprocessor/_compare_with_refs.py#L454

the sqrt is not computed correctly for masked arrays. The fix needs to account for both Numpy and Dask:

    npx = get_array_module(squared_error)

    # need masked sqrt for numpy >=2.0
    # and dask.array.reductions.safe_sqrt for Dask
    # otherwise results will be computed ignoring masks
    if npx.__name__ == "dask.array":
        da_squared_error = npx.ma.average(squared_error,
                                          axis=axis,
                                          weights=weights)
        rmse = npx.reductions.safe_sqrt(da_squared_error)
    else:
        rmse = npx.ma.sqrt(
            npx.ma.average(squared_error, axis=axis, weights=weights)
        )

This is fixed in #2395.

I.e. if np.sqrt() ignores masks then da.sqrt() will too, see https://github.com/numpy/numpy/issues/25635. That may or may not get fixed upstream; the issue is pretty old now relative to the Numpy2 timeline, so it is better to have the patch in our code.
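
For anyone who wants to check the mask-preserving behaviour locally, here is a minimal Numpy-only sketch of the `else` branch above. The helper name `masked_rmse` and the sample data are made up for illustration and are not part of ESMValCore; the Dask branch is not covered here.

```python
import numpy as np


def masked_rmse(squared_error, axis=None, weights=None):
    """Hypothetical helper: RMSE that keeps masked elements masked."""
    # np.ma.average skips masked inputs; a slice with no valid data stays masked.
    mean_squared_error = np.ma.average(squared_error, axis=axis, weights=weights)
    # np.ma.sqrt propagates the mask explicitly instead of relying on np.sqrt.
    return np.ma.sqrt(mean_squared_error)


# Middle column is fully masked, so its RMSE should come out masked.
squared_error = np.ma.masked_array(
    [[4.0, 1.0, 9.0], [16.0, 1.0, 9.0]],
    mask=[[False, True, False], [False, True, False]],
)
rmse = masked_rmse(squared_error, axis=0)

# Fully masked input reduced to a scalar: the result should be the masked constant.
scalar_rmse = masked_rmse(np.ma.masked_array([1.0, 4.0], mask=True))

print(rmse)
print(scalar_rmse)
```

The scalar case at the end is the one closest to what the issue title describes, assuming the `0` vs `--` difference shows up when the reduction collapses to a single masked value.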