SALib / SALib

Sensitivity Analysis Library in Python. Contains Sobol, Morris, FAST, and other methods.
http://SALib.github.io/SALib/
MIT License

Almost zero interaction effect for mean error (bias) metric with delta MIM #632

Closed TobiasKAndersen closed 1 week ago

TobiasKAndersen commented 1 month ago

Hi all,

My colleagues and I are a bit confounded by our SA results, and we hope that this great community could help provide some clarifications.

During a multi-model comparison project running four 1D lake hydrodynamic models on 73 lakes, we also performed a SA based on Latin Hypercube sampling and analysis with the Delta Moment-Independent Measure (delta MIM), all via SALib. In the SA, we characterized influential parameters for several model performance metrics, for instance NSE, R, RMSE, and mean error (bias). In addition, we calculated a parameter interaction measure (S_inter = 1 - sum of S1, as recommended in Borgonovo et al., 2017).
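For reference, a minimal sketch of this kind of workflow with SALib (hypothetical parameter names and bounds, and a toy metric standing in for the lake models, not our actual setup):

```python
import numpy as np
from SALib.sample import latin
from SALib.analyze import delta

# Hypothetical 3-parameter problem definition (names/bounds are placeholders).
problem = {
    "num_vars": 3,
    "names": ["wind", "swr", "turb"],
    "bounds": [[0.5, 2.0], [0.5, 2.0], [0.5, 2.0]],
}

X = latin.sample(problem, 1000)   # Latin Hypercube sample of the parameters
Y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * X[:, 0] * X[:, 2]  # toy performance metric

res = delta.analyze(problem, X, Y)  # delta moment-independent measures + S1
S_inter = 1.0 - res["S1"].sum()     # interaction measure (Borgonovo et al., 2017)
print(res["delta"], res["S1"], S_inter)
```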

Curiously, all interaction measures calculated for mean error (bias) across all models and lakes were close to zero. In contrast, we saw variation in interaction measures between other performance metrics and between hydrodynamic models (Fig. 6).

[Figure 6: interaction measures across models and performance metrics]

How can this be?
Is it due to the nature of mean error? Or do we have an error in our approach?

All suggestions and thoughts are very welcome!

tupui commented 1 month ago

Hi 👋

To help analyze your case, could you provide 2D subplots of inputs vs outputs? A pairplot from e.g. seaborn would help us visually understand what could be happening.
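Something like the sketch below, with made-up data and placeholder column names (parameters on the x-axes, metrics on the y-axes):

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up samples: two "parameters" and two "metrics" (placeholder names).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "wind": rng.uniform(0.5, 2.0, n),
    "swr": rng.uniform(0.5, 2.0, n),
})
df["bias"] = 0.8 * (df["wind"] - 1.2) + 0.3 * (df["swr"] - 1.0)
df["rmse"] = np.abs(df["bias"]) + 0.1 * rng.standard_normal(n)

# One scatter panel per (parameter, metric) pair.
sns.pairplot(df, x_vars=["wind", "swr"], y_vars=["bias", "rmse"])
```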

TobiasKAndersen commented 4 weeks ago

Of course, thanks @tupui

We are working with four hydrodynamic models: FLake, GLM, GOTM and Simstrat, for 70+ lakes. Here are pairplots for Lake Stechlin for each model. The x-axis holds model parameters and the y-axis holds performance metrics. Each pairplot title gives the model and lake name.

Was this what you had in mind?

[Pairplots for Lake Stechlin: Simstrat, FLake, GLM, GOTM]

TobiasKAndersen commented 4 weeks ago

And here are the pairplots for a "warm lake", Lake Kivu, where the interaction measure for the FLake model is high.

[Pairplots for Lake Kivu: FLake, GLM, GOTM, Simstrat]

ConnectedSystems commented 4 weeks ago

Hi @TobiasKAndersen

Just wondering, how is mean error being calculated here for your whisker/barplot? Through a package or your own code?

As far as I can tell the barplots don't really align with what the pairplots are showing.

Minor general comment on the figures: I find minimising the white edge/stroke colour of each point and making the fill colour more transparent helps with legibility.
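For example, with seaborn's pairplot that styling can be passed through plot_kws (a minimal sketch with made-up data):

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.uniform(size=(300, 3)), columns=["wind", "swr", "bias"])

# Transparent fill and no edge stroke on each point, for legibility.
sns.pairplot(df, x_vars=["wind", "swr"], y_vars=["bias"],
             plot_kws={"alpha": 0.3, "linewidth": 0})
```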

TobiasKAndersen commented 4 weeks ago

Hi @ConnectedSystems

Thanks for your feedback!

Mean error was calculated in R (by colleagues) with the function from the R package LakeEnsemblR:

```r
bias <- mean((P - O), na.rm = TRUE)
```

So, model-predicted values (P) against observed values (O); all water temperature.

The interaction term (S_inter) was calculated in Python from the sum of all S1 per model, lake, and performance metric (var), following Borgonovo et al. (2017):

```python
sa_inter = sa_clean.groupby(by=['model', 'lake', 'var'], as_index=False)['S1'].sum()
sa_inter['S_inter'] = 1 - sa_inter['S1']
```

We include 71 lakes in total in this study, classified into different lake "groups", e.g. warm lakes, a group which includes 7 lakes.

Thanks, here are the revised pairplots ;)

[Revised pairplots for Lake Kivu: Simstrat, FLake, GLM, GOTM]

tupui commented 3 weeks ago

> The x-axis holds model parameters and the y-axis holds performance metrics

So, to be sure, you want to compute indices on the performance metrics? I.e. to answer the question: which of my model parameters contribute to the variance of, e.g., bias?

If this is the case, looking at the last plot only, the metrics seem to give somewhat similar information besides some sign changes. Notably, for R against turb and wind there is some heteroscedasticity.

Computing Sobol' indices on these figures, I would expect to see high values for wind (except for R) and swr, and low values for the rest.

Does that help?

TobiasKAndersen commented 1 week ago

Thank you, @tupui !

Yes, we are interested in answering: Which of my model parameters contribute to the variance of e.g. bias? Our results also match with your expectations (based on the plot above).

What confuses us is that when we sum the Sobol' first-order (S1) indices for each lake (per model and performance metric), the sum of S1 is always approximately 1.0 for bias, whereas for the other performance metrics it varies between 0.6 and 1.0.

Why is the sum of first-order Sobol' indices approximately 1 only for bias?

tupui commented 1 week ago

The sum of S1 is at most 1, but if there are higher-order effects the value will be lower, as it is "missing some explained variance". I suppose some of your quantities of interest are more sensitive to that, and this is why you would see a difference. (Assuming the indices have converged.)
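A toy illustration of this point (not the lake models), using SALib's Sobol' sampler and analyzer: for a purely additive function the first-order indices sum to roughly 1, while for the Ishigami test function (strong interactions) they sum to well below 1.

```python
import numpy as np
from SALib.sample import sobol as sobol_sample
from SALib.analyze import sobol
from SALib.test_functions import Ishigami

problem = {
    "num_vars": 3,
    "names": ["x1", "x2", "x3"],
    "bounds": [[-np.pi, np.pi]] * 3,
}

X = sobol_sample.sample(problem, 1024)

additive = X.sum(axis=1)         # no interactions: sum(S1) ~ 1
ishigami = Ishigami.evaluate(X)  # interactions (e.g. the x1/x3 term): sum(S1) < 1

for name, Y in [("additive", additive), ("Ishigami", ishigami)]:
    res = sobol.analyze(problem, Y)
    print(name, "sum(S1) =", res["S1"].sum())
```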

TobiasKAndersen commented 1 week ago

Good, that confirms my understanding as well. I just can't figure out why bias has much lower interactions between parameters (higher-order effects) than, for instance, R or NSE in our dataset (see the figure in the first comment).

tupui commented 1 week ago

The bias could just be too lossy for your cases.

TobiasKAndersen commented 1 week ago

Could I test for that? Or should I just assume it's not the best metric for this?

tupui commented 1 week ago

I really don't know 😅

You could try to ask on the SA forum otherwise: https://discord.gg/gxE5btaYYj

TobiasKAndersen commented 1 week ago

Haha :D Thanks, I will try that! Appreciate the help!

ConnectedSystems commented 1 week ago

Hi @TobiasKAndersen

I want to clarify something.

You said earlier:

> Curiously, all interaction measures calculated for mean error (bias) across all models and lakes were close to zero

where the interaction term is the sum of S1.

But above you say:

> Why is the sum of first-order Sobol' indices approximately 1 only for bias?

Maybe I misunderstood and lost context over the past days of this conversation (and reading on mobile early in the morning doesn't help either! 😅 )

Regardless, I think it's just the nature of the bias metric and your model. Your results, if I understand correctly (mean error close to 0, sum of S1 close to 1), tell me the model is "well-behaved" with additive properties on average, but the other metrics indicate there are other interesting dynamics going on (and the pairplots pretty much confirm this): model behaviour that bias will not represent well.
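One hypothetical mechanism, sketched with toy data (not the lake models): bias averages the error over time before anything nonlinear happens, so an interaction term whose time-mean is near zero simply drops out of bias, while RMSE, which squares before averaging, retains it.

```python
import numpy as np
from SALib.sample import sobol as sobol_sample
from SALib.analyze import sobol

problem = {"num_vars": 2, "names": ["a", "b"], "bounds": [[0.0, 1.0], [0.0, 1.0]]}
X = sobol_sample.sample(problem, 1024)

t = np.linspace(0.0, 2.0 * np.pi, 365)
f, g, h = np.ones_like(t), t / t.max(), np.sin(t)  # mean(h) ~ 0

# Pointwise error: additive in a and b, plus an interacting a*b term.
a, b = X[:, 0, None], X[:, 1, None]
err = a * f + b * g + a * b * h

bias = err.mean(axis=1)                  # a*mean(f) + b*mean(g): purely additive
rmse = np.sqrt((err ** 2).mean(axis=1))  # squaring keeps the a*b cross terms

for name, Y in [("bias", bias), ("rmse", rmse)]:
    print(name, "sum(S1) =", sobol.analyze(problem, Y)["S1"].sum())
```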

You have MAE/nMAE in your pairplots, which is missing from your initial figure. It might be worth including it in your barplot?

TobiasKAndersen commented 1 week ago

Sorry about the confusion: we define the interaction as S_inter = 1 - sum of S1 (as in Borgonovo et al., 2017 and Saltelli et al., 2000).

Thanks. Yeah, based on our results, using bias as a performance metric will often suggest a "well-behaved" model with additive properties, whereas evaluating model behavior with other performance metrics will capture other dynamics, with interaction effects between parameters.

Results for MAE/nMAE are similar to RMSE, so they were excluded from our paper: Learning from a large-scale calibration effort of multiple lake models.

Thanks for all the comments, much appreciated! 🙌

tupui commented 1 week ago

I will close the issue for now, as I think the discussion is more or less resolved. (Do feel free to add more; we still get notifications. This is just an effort to keep the bug tracker clean.)