Questions about the methodology

huard commented 11 months ago

Hi,

I'm confused by the computation of the likelihood in the code. If I understand correctly, it's value is the pdf of the gamma distribution evaluated at the weighted mean of the ECS for a given set of weights. This works if the gamma describes the "mean ECS", but reading the paper I had the impression the gamma represented the ECS distribution, not the distribution of the "mean ECS of a random sample". I don't see how the synthetic ECS were generated in the code, so I'm a bit lost.

Also, the paper states

Therefore, the likelihood function becomes proportional to the difference between the BMA generated ECS value and the target ECS distribution, i.e., ...

If the likelihood is proportional to the difference, then the likelihood is null when there is no difference ? Seems the other way around. Also, the likelihood doesn't seem to be computed from a difference between two likelihoods in the code.

Finally, I'm not sure using the word "model independence" is the right terminology here. Model independence for me relates to how two models implement climate processes. Do they share the same physics, ocean dynamics, radiation code, etc? If I understood correctly, the independence computed in the paper is essentially related to how close their ECS values are. The fact that two independent models yield the same ECS should be interpreted differently than the fact that two closely related models yield the same ECS, and this nuance is lost here.

Thanks,

David

EliasMassoud1 commented 11 months ago

Hi David,

Thanks for your comments. Explanations are provided below to hopefully provide some clarification.

The likelihood in this work is constructed to portray the 'distance' between the assessed ECS distribution (the 'true ECS') and the ECS generated by the BMA model average sample. There is a gamma distribution that is used to define the assessed 'true' ECS distribution (2.5-4 degrees C with a peak at 3 degrees). This gamma distribution is then used to estimate the probability of each model average-estimated ECS value. The farther the value of this model average-estimated ECS is from the 'peak' of the gamma distribution, the lower its probability. This explains how the likelihood function becomes 'proportional' to the difference between the BMA generated ECS value and the target ECS distribution. Therefore, to answer your question about the likelihood being null where there is no difference, it is in fact the opposite in this implementation. That is, the likelihood function provides a peak probability value where there is no difference rather than a null value. I understand how this could be misleading though.

As for the model independence calculation, there are many ways in which model independence can be defined and calculated. For instance, one definition of model dependence refers to the way a certain model is constructed and how its formulation relates to other models (e.g. 2 models that are constructed using the same parent code can be perceived as dependent on each other). Another definition of model dependence is to measure the distance in the outputs between 2 models (e.g. the RMSE or difference between 2 models). In that case, 2 models with low RMSE between each other are considered to be dependent on each other because their outputs are similar, and conversely 2 models that have high RMSE between each other are independent because their outputs are different. This is similar to what I believe you mentioned in your comments above, and I would like to clarify this is not what we are doing in this work.

For this work, the definition of independence is to quantify the level of dependence seen in the POSTERIOR BMA samples. In other words, we use information from the posterior weights to quantify the correlation between the BMA weights that any 2 models receive. Therefore, if there is any trade-offs between 2 models (meaning if Model A gets a higher weight when Model B gets a lower weight, or if they both get higher weights simultaneously), our measure of independence can capture this information. In essence, this approach is different than using model genealogy (e.g. parent code or similar forcings) and is different than using information from the model outputs space (e.g. using a metric like inter-model RMSE).

I hope this clarifies things. Let me know if you have any other questions.

Best regards, Elias

huard commented 11 months ago

Thanks Elias for your answers, and please allow me to follow up.

On line 54,
b_BMA(:,i) = A*x_pars(i,:)'; b_BMA is the weighted average of the ECS values, correct ?

Then the likelihood of the weighted average is computed using ECS_lik_BMA(i) = pdf('gam', b_BMA(:,i) , pd.a,pd.b);

So are pd.a and pd.b parameters describing the probability of ECS values, or the probability of the mean ECS value from a sample ?

EliasMassoud1 commented 11 months ago

No problem David.

The pd.a and pd.b are the parameters to define the gamma distribution that represents the true ECS value. The b_BMA(:,i) is the i'th sample of the BMA generated model average for ECS. This value (b_BMA(:,i)) is used as an input to the pdf function to estimate the probability of that BMA sample being true, given the pre-defined gamma distribution that is created using the pd.a and pd.b parameters.

Hope that makes sense.

-Elias

huard commented 11 months ago

Got it, so my claim is that the "joint probability of the individual ECS values" is not equal to the "probability of the mean ECS".

If you had a sample of 1000 ECS values drawn from your gamma distribution, you would not expect the weighted mean of that sample to follow the same gamma pdf.

EliasMassoud1 / BMA_ECS

Questions about the methodology #1