Discussion regarding `errorVals`-`absoluteError`-`relativeError`

prisae commented 1 week ago

Data error computation

This is partly an issue (problematic default behaviour), and partly a discussion (error calculation).

Issue: Problematic default behaviour

In pygimli.frameworks.Inversion().run() and pygimli.frameworks.MarquardtInversion().run() the parameter errorVals is subtly deprecated in favour of absoluteError and relativeError.

If errorVals is not given, it is calculated via:

if errorVals is None:  # use absoluteError and/or relativeError instead
    absErr = kwargs.pop("absoluteError", 0)
    relErr = kwargs.pop("relativeError",
                        0.01 if np.allclose(absErr, 0) else 0)
    errorVals = pg.abs(absErr / np.asarray(dataVals)) + relErr

In some methods, such as marine CSEM, your data can have very small values, in the range of 1e-10 and less. This will result in np.allclose(absErr, 0) = True and hence give it a relErr = 0.01, which is useless (because the default atol of np.allclose is 1e-08).

What about either:

    relErr = kwargs.pop("relativeError",
                        0.01 if np.allclose(absErr, 0, atol=0) else 0)

or (my preference), not defaulting at all, but raise an error,

    relErr = kwargs.pop("relativeError", 0)
    if np.any(np.isclose(absErr + relErr, 0, atol=0)):
        raise Error that at least one of abs/rel error must be provided

Discussion: Error calculation

If the user provides absolute and/or relative errors, the error values are computed as follows

errorVals = pg.abs(absErr / np.asarray(dataVals)) + relErr

However, given error propagation, I think it should be

errorVals = np.sqrt(pg.abs(absErr / np.asarray(dataVals))**2 + relErr**2)

This will mostly not have a big impact at all, only in the zone where the data values approach the noise level.

prisae commented 1 week ago

I thought I'd discuss both here before making a PR.

halbmy commented 4 days ago

Very good point that did not pop up before. Marine csem might indeed scale badly. Throwing an error is the better choice.

However, I disagree with the error propagation. The relative and absolute errors are not independent error sources, but simple models to estimate errors, e.g. from reciprocal analysis etc. which is common practise and simple of relative or absolute errors are considered are the main source (and small values to catch near zero data). A more rigorous error class could help supporting and analysing data and error statistics.

prisae commented 4 days ago

I'm OK to agree on disagreeing regarding noise. In this case, an error class might be indeed good, or at least not deprecating errorVals, so the user can provided directly the error values instead of absError and relError.

gimli-org / gimli