Error in plotSimulatedResiduals for very few unique predictions

florianhartig commented 6 years ago

From a user: It's working fine most of the time but sometimes I get this error using: plotSimulatedResiduals

Error in qrnn::qrnn.fit(x = as.matrix(pred), y = as.matrix(res), n.hidden = 4, : zero variance column(s) in "x"

florianhartig commented 6 years ago

This error message comes from the quantile regression that draws the lines on the residual plot; in some cases this function apparently fails. I will add some code to catch the error in the next version of the package.

In the meantime, you can suppress the quantile regression by setting quantreg = F in the arguments of plotSimulatedResiduals.

mcauchoix commented 6 years ago

Thank you so much for your super quick reply. I tried that argument, and the new error I get is: Error in smooth.spline(pred, res, df = 10) : need at least four unique 'x' values

florianhartig commented 6 years ago

Hmm ... that seems weird, and may be more a problem of the models you are trying to plot. How many unique values do you have? Can you run

unique(predict(fittedModel))

If you really have <4 unique predictions (which could happen if you have only one categorical predictor), the plotSimulatedResiduals function won't work at the moment, but you can use

plotResiduals

specifying the x values by hand with as.factor(predict(fittedModel)) - if you specify the x as factor, the function will do a boxplot instead of the normal plot.

At the moment, I have no option implemented to do the same in the main function. If you want to do the quantile plot alone, just use

gap::qqunif(simulationOutput$scaledResiduals,pch=2,bty="n", logscale = F, col = "black", cex = 0.6, main = "QQ plot residuals", cex.main = 1)

If that is indeed the problem, I guess I could implement something that automatically detects when you have very few unique predictions and, in that case, switches the res vs. pred plot to a categorical plot, or at least suppresses the lines.
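The automatic switch described here could be sketched in base R roughly as follows. The helper name `prepare_predictions` and its `min_unique` argument are hypothetical and not part of DHARMa; the sketch only decides whether the predictions should be converted to a factor before they are passed on to a plotting function.

```r
# Hypothetical helper (not part of DHARMa): treat predictions as categorical
# when there are too few unique values for a spline or quantile regression.
prepare_predictions <- function(pred, min_unique = 4) {
  n_unique <- length(unique(pred))
  if (n_unique < min_unique) {
    as.factor(pred)   # a factor on the x axis leads to a boxplot
  } else {
    pred              # enough unique values: keep the numeric predictions
  }
}

few  <- prepare_predictions(rep(c(-0.2, 0.3), each = 10))  # two unique values
many <- prepare_predictions(seq(-1, 1, length.out = 20))   # twenty unique values

is.factor(few)
is.factor(many)
```

The threshold of 4 is taken from the smooth.spline error message above; a different cut-off may be more appropriate for the quantile regression.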

florianhartig commented 6 years ago

If you don't mind and if that's possible, could you send me a fitted model (use the save function) so that I can have a try myself? Note that the data will be attached to this object, not sure if that's an issue.

mcauchoix commented 6 years ago

Thanks! I will try that. Attached the fitted model.

Browse[2]> unique(predict(Rn_Mod))
 [1]  0.395069351 -0.274755256 -0.129444404  0.135819345
 [5] -0.471134318 -0.136551554 -0.482839849 -0.244911294
 [9] -0.004249219  0.394139831  0.424494156 -0.297700412
[13] -0.127675511  0.223951042  0.292125137 -0.065351270
[17]  0.425362814 -0.200097983 -0.191881019 -0.034607072
[21]  0.315443759 -0.025441888 -0.111515551 -0.412508223
[25]  0.049440135  0.554819254  0.395069351 -0.274755256
[29] -0.129444404  0.135819345 -0.471134318 -0.136551554
[33] -0.482839849 -0.244911294 -0.004249219  0.394139831
[37]  0.424494156 -0.297700412 -0.127675511  0.223951042
[41]  0.292125137 -0.065351270  0.425362814 -0.200097983
[45] -0.191881019 -0.034607072  0.315443759 -0.025441888
[49] -0.111515551 -0.412508223  0.049440135  0.554819254

Maxime Cauchoix PhD, Station d’écologie experimentale du CNRS à Moulis 07 85 23 51 43


florianhartig commented 6 years ago

Hi Maxime - OK, I have looked at this. As you see above, in principle, you have enough variation in the response. So, a naive residual plot

res = simulateResiduals(Rn_Mod)
plotResiduals(predict(Rn_Mod), res$scaledResiduals)

would work. However, if you try this, you will see that this plot shows a pattern. This is not because of a problem in your model, but because your variation in x stems mostly from the random effect in the model, which you could view as a kind of residual as well, so you are plotting res against residual. Therefore, the DHARMa default plot plots only the fixed effect predictions on x. You can emulate this via

plotResiduals(predict(Rn_Mod, re.form = ~0), res$scaledResiduals)

Now, we have only two values for the predictions, and this is what seems to have caused your problems. Curiously, for me the fitted splines / quantile regressions never produced an error, so I don't know if this is a problem specific to your R / package version - I'll have to write a test to see if this is somehow platform-dependent.

Anyway, this is the reason for the problems. The quick fix for you would be to do the plots by hand and convert the predictions to a factor; DHARMa will then produce boxplots instead of scatter plots, so you can do

plotResiduals(as.factor(predict(Rn_Mod, re.form = ~0)), res$scaledResiduals)

and the qq plot via

gap::qqunif(res$scaledResiduals,pch=2,bty="n", logscale = F, col = "black", cex = 0.6, main = "QQ plot residuals", cex.main = 1)

I will try to implement some kind of fix in the next package version, I'm just not sure yet if I should catch the errors or rather switch the plots if there are only a few unique values for pred.

Will leave this ticket open until this is done

mcauchoix commented 6 years ago

Thank you Florian. I had a look into smooth.spline: The x vector should contain at least four distinct values. ‘Distinct’ here is controlled by tol: values which are regarded as the same are replaced by the first of their values and the corresponding y and w are pooled accordingly.

defaults to 1e-4 (formerly 1e-3).

Maybe it should be adapted to x range?

Thanks again!
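The smooth.spline error quoted above can be reproduced in base R with simulated numbers. In this sketch, `x` and `y` are made-up stand-ins for the fixed-effect predictions and the scaled residuals, not data from the attached model. As far as I can tell from the R documentation, the `tol` argument of `smooth.spline` itself defaults to `1e-6 * IQR(x)`, so it already scales with the spread of x, while the `1e-4` figure seems to belong to the `tol` entry of `control.spar`; treat that reading as an assumption and check `?smooth.spline` for your R version.

```r
set.seed(1)
x <- rep(c(-0.2, 0.3), each = 50)   # only two unique "predictions"
y <- runif(length(x))               # stand-in for scaled residuals

length(unique(x))                   # number of distinct x values

# try() keeps the script running and stores the error instead of stopping
fit <- try(smooth.spline(x, y, df = 10), silent = TRUE)
failed <- inherits(fit, "try-error")
msg <- if (failed) conditionMessage(attr(fit, "condition")) else "no error"

failed
msg
```

With only two distinct x values the fit fails regardless of `tol`, because no tolerance can turn two values into four.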


florianhartig commented 6 years ago

Sorry, I do get the smooth.spline error, but the default plot with the quantile regression works fine. In any case, something has to be done; I'm just not sure yet whether to suppress the lines or switch to the boxplot if there are fewer than 4 or so unique values.

mcauchoix commented 6 years ago

Switching to boxplot would be great! I can test it for you if you want to update that ;-)

mcauchoix commented 6 years ago

I'm running nearly 300 models on different datasets with all kinds of distributions for a meta-analysis, so it might be useful for testing the generality of the code.

florianhartig commented 6 years ago

OK, I have introduced error catching in the plot function, so at least this suppresses the error so that your script doesn't stop. It will take a bit until this is pushed to CRAN, but you can get this feature already now by installing the development version of DHARMa from GitHub, see https://github.com/florianhartig/DHARMa.

Of course, there are also some other changes in the development version - if you just want to get the plot function, you can load DHARMa as usual, and then overwrite the plot functions with the current development version via source("https://raw.githubusercontent.com/florianhartig/DHARMa/master/DHARMa/R/plotResiduals.R")
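Error catching of this kind can be sketched in base R as below. This is not the actual DHARMa code; `safe_spline` is a hypothetical wrapper name, and the simulated `res` vector only stands in for scaled residuals. The idea is that a failed fit returns NULL, so the calling plot function can skip drawing the line instead of stopping the script.

```r
# Hypothetical wrapper (not the DHARMa implementation): fit a smoothing spline
# and return NULL instead of stopping when the fit fails.
safe_spline <- function(pred, res, df = 10) {
  tryCatch(
    smooth.spline(pred, res, df = df),
    error = function(e) {
      message("Spline could not be fitted: ", conditionMessage(e))
      NULL
    }
  )
}

set.seed(1)
res <- runif(100)

fit_few  <- safe_spline(rep(c(-0.2, 0.3), each = 50), res)  # too few unique x
fit_many <- safe_spline(seq(-1, 1, length.out = 100), res)  # enough unique x

is.null(fit_few)
inherits(fit_many, "smooth.spline")
```

In a loop over many models, a caller would check `is.null()` on the result and only add the line to the plot when a fit is available.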

florianhartig commented 6 years ago

Todo:

Write unit test to test this problem
Decide on final solution for the plot

mcauchoix commented 6 years ago

Dear Florian, I am having difficulty finding information on what uniformity of residuals is, and therefore on what your uniformity test is actually testing. Would that be like a test of normality, or more of homoscedasticity of residuals? Do you have a statistical test for the normality of residuals, or is it just visual inspection of QQ plots?

Also, for the parametric test of overdispersion, I'm not certain about H0. p > 0.05 would mean that there is no overdispersion, right?

Many thanks Maxime


florianhartig commented 6 years ago

testUniformity runs a KS test for uniformity, see the help. You can think of this as the equivalent of a Shapiro test in a linear regression, where you test residuals for normality. But in DHARMa, we expect residuals to be uniform (see the vignette for explanations), therefore we test for uniformity.

I have no formal test for heteroskedasticity yet, but you should of course look out for it in the res vs. predicted and res vs. variable plots.

overdispersion = basically yes, p > 0.05 = no overdispersion, although strictly speaking, as for any null hypothesis, it just means you can't show that H0 = no overdispersion is wrong; it doesn't mean that H0 is right.
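As a rough illustration of what a KS test for uniformity does, the base R sketch below applies `ks.test` to two simulated vectors. The vectors are made up for this example and do not come from a fitted model, and the sketch is not meant to reproduce DHARMa's internals.

```r
set.seed(42)
uniform_res <- runif(200)         # what scaled residuals should look like under a correct model
skewed_res  <- rbeta(200, 2, 5)   # residuals piled up toward 0, as a contrast

# One-sample Kolmogorov-Smirnov test against the uniform distribution on [0, 1]
ks_ok  <- ks.test(uniform_res, "punif")
ks_bad <- ks.test(skewed_res, "punif")

ks_ok$p.value    # a large p-value: no evidence against uniformity
ks_bad$p.value   # a very small p-value: uniformity is rejected
```

As with the overdispersion test, a large p-value here only means that a deviation from uniformity could not be shown, not that the residuals are proven to be uniform.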

mcauchoix commented 6 years ago

Excellent! thank you so much.


florianhartig commented 6 years ago

OK, I think this is now working, will be included in the 0.1.6 release

florianhartig / DHARMa

Error in plotSimulatedResiduals for very few unique predictions #42