Open tim-lauer opened 1 week ago
Hi,
your questions sound as if you are coming from an ML background. They don't really make sense for a statistical analysis. I'm not sure what you are trying to achieve:
If you are trying to build a predictive model, forget about the residual checks and just do whatever you would do in an ML study.
However, if you are trying to run a statistical analysis (where you want to get correct p-values etc.), then the residual checks become important, but class imbalance is not something people usually worry about, and you definitely don't want to weight your data in this kind of analysis. If this is your goal, I would recommend reading a stats book on GLMs to understand the assumptions of logistic regression etc. I'm not sure if they are helpful on their own, as they don't cover the theory, but my lecture notes on practical regression analysis are here: https://theoreticalecology.github.io/AdvancedRegressionModels/
Thanks! Yes, I have some ML background, but here my goal is to do statistical analysis (good to know that one would refrain from using weights for this).
Thanks a lot for the lecture notes! If I may: judging from the plots, would you say that the first (unweighted) model provides more or less correct p-values? It's hard for me to tell how severe the problems are (considering that I would not want to add more variables - I have specific hypotheses about the included ones - or remove outliers unless really necessary).
Best, Tim
Hi Tim,
in statistical models, you cannot just re-weight your data. That doesn't mean that your unweighted model and its p-values are correct, though. You have to ensure that you have the right model using other strategies, which is what is discussed in courses on regression modelling.
Best, F
Hi,
thanks for the great library. I am a beginner, so I hope it is OK to ask this question here.
I am working with a binomial GAM with an imbalanced DV (0: 254410, 1: 3999).
I don't have any experience interpreting the DHARMa plots (any help would be appreciated), but I figured that the predictive performance of the model - though not my main focus - was too low for it to be a good model (Precision: 0.35, Recall: 0.003, F1: 0.005).
I tried to account for the imbalance in the DV by using weights inversely proportional to class frequencies (which BTW only slightly improved predictive performance, Precision: 0.03, Recall: 0.75, F1: 0.07), but now, I cannot create the DHARMa plots anymore. I tried using different weights, but each time I get the error:
Surprisingly, when I tried to downsample the majority class of the DV (to the same count as the minority class) as a quick test without weighting, the predictive performance was much better (F1=0.7), and the DHARMa plots worked just fine (and looked better from what I can tell):
So my questions would be:
Any help, even a one-liner, would be greatly appreciated. Thanks.
Best, Tim
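
For readers following along, here is roughly what "weights inversely proportional to class frequencies" works out to for the class counts given above. This is only a sketch of the arithmetic in Python, not the R code used in this thread. The function name `inverse_frequency_weights` is made up for illustration, and the scaling `n_total / (n_classes * n_class)` is an assumption, since the post does not say how the weights were normalized.

```python
def inverse_frequency_weights(counts):
    """Return one weight per class, inversely proportional to its frequency.

    Assumed scaling (not stated in the post): n_total / (n_classes * n_class),
    which makes every class carry the same total weight.
    """
    n_total = sum(counts.values())
    n_classes = len(counts)
    return {label: n_total / (n_classes * n) for label, n in counts.items()}


# Class counts of the DV as reported in the post.
counts = {0: 254410, 1: 3999}
weights = inverse_frequency_weights(counts)

# Total weight carried by each class after weighting (equal by construction).
weighted_totals = {label: weights[label] * n for label, n in counts.items()}

print(weights)
print(weighted_totals)
```

With these counts, each observation in the minority class ends up weighted roughly 64 times as heavily as one in the majority class (254410 / 3999), and the weights are non-integer.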