Assignign weights (weighted least squares) and taking into account heteroscedasticity

ThisIsLorenzo / PFAS_Trophic_Magnification

0 stars 0 forks source link

Assignign weights (weighted least squares) and taking into account heteroscedasticity #2

Open ThisIsLorenzo opened 4 months ago

ThisIsLorenzo commented 4 months ago

https://github.com/ThisIsLorenzo/PFAS_Trophic_Magnification/blob/a8aeec0edf5d7e5ece631c192ab868798dfe510d/R/code.Rmd#L188

If I have standard deviations, I am estimating weights using formula 1/sd^2.

Cara_2022_loc4_PFOA_weights <- 1 / (Cara_2022_loc4$PFOA_sd^2) Cara_2022_loc4_lm_model_PFOA <- lm(PFOA_mean ~ TL_mean, data = Cara_2022_loc4, weights = Cara_2022_loc4_PFOA_weights)

Then, I am checking the residuals through visual inspection. If I see there might be heteroscedasticity like in this case:

I'm using the coeftest() function with a heteroscedasticity-consistent covariance matrix (vcov. = vcovHC) on top of the weights. coeftest(Cara_2022_loc4_lm_model_PFOA, vcov. = vcovHC)

If SDs are not provided, I only use the coeftest() function. This means that if I have means and SEs, I am not using the SEs.

@itchyshin

ThisIsLorenzo commented 4 months ago

Update:

If we have SE values, we can use the inverse of SE as weight (1/SE).
If we have SD values, we can use the inverse of squared SD (1/SD^2).
Checking the distribution of residual to visually assess heteroscedasticity makes sense, and it is good practice. It is good practice also account for heteroscedasticity if present using robust() or coeftest(). However, in our case, we should not do that if we want to be consistent with what the other studies did to measure the slope and its error.

itchyshin commented 4 months ago

@ThisIsLorenzo

Use SE^2 or 1/SE^2. Both SE^2 and SD^2 are variance

ThisIsLorenzo commented 4 months ago

@ThisIsLorenzo

Use SE^2 or 1/SE^2. Both SE^2 and SD^2 are variance

I was just talking about this with Yefeng. I understand now. Thank for the reply.

ThisIsLorenzo commented 4 months ago

It seems like using standard errors as weights in one of our linear models is causing problems (I checked and it seems working fine in models for other studies) @itchyshin .

I noticed that the model provides a high slope value for data that should give an almost flat slope. This is the plot:

And this is the code (weights are assigned as 1/SE^2):

As you can see from the plot the slope is almost flat. However, the model gives a -3 slope, which does not make sense. Also the p-value (0.037) does not make sense. Also look at the R-squared value! That's definitely wrong.

This is the model result if I remove weights from the model:

This model result makes much more sense.

Even more strange, if I use weights = SE^2 instead of 1/SE^2 this is the model result:

A positive slope! What's going on! @_@ this is really not clear to me.

The only slope value that makes sense here is that one without weights. @itchyshin, what do you think?

itchyshin commented 4 months ago

-3 looks fine to me, which I will explain to you when I see you @ThisIsLorenzo - for now, we can leave this and move on (it is good that you are creating an issue for this).