fitzLab-AL / gdm

R package for Generalized Dissimilarity Modeling
GNU General Public License v3.0

negative deviance explained value with cross validation #43

Open Basquill opened 1 month ago

Basquill commented 1 month ago

Hello @fitzLab-AL,

Any insight why one might get a negative 'deviance explained' value when running a cross validation? E.g.,

gdm.crossvalidation(gdmTab, train.proportion=0.8, n.crossvalid.tests=10, geo=TRUE, splines=NULL, knots=NULL)

Returns: $Deviance.Explained [1] -24.81551

The deviance explained without cross validation is 36.9, i.e., via

gdm <- gdm(data=gdmTab,geo=TRUE)

This doesn't happen with all my models. When it does, the unvalidated and cross-validated values can be quite similar in magnitude (aside from the latter being negative).

Thanks very much. (I checked Stack Exchange and there isn't anything pertaining to this issue.)

fitzLab-AL commented 1 month ago

@Basquill Another user reported this problem recently and I have not had time to look into it.

fitzLab-AL commented 1 month ago

@Basquill As far as I can tell, after testing the functions involved, this problem can arise during cross validation for several reasons, including fitting a model with a large number of predictors but relatively few site pairs, or when the relationship between the predictors and the response is very weak. If your data set does not have many sites, the testing partition will likely contain too few site pairs for proper cross validation, especially when a large number of covariates are used.

I have updated the gdm.crossvalidation function so it now returns zeroes instead of negative deviance explained values, or NAs when the model fails to fit. If the cross validation procedure returns zeroes or NAs for your data, I would suggest increasing the proportion of the data used for testing and/or reducing the number of predictors.
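For intuition on how a negative value can arise in the first place: deviance explained is conventionally computed as 100 * (null deviance - model deviance) / null deviance, so it goes negative whenever the model predicts the held-out site pairs worse than the intercept-only (null) model does. The sketch below is illustrative only — it uses a sum-of-squares stand-in for GDM's actual deviance function and is not the package's internal code:

```r
# Toy illustration: deviance explained on held-out data goes negative
# when predictions are worse than the null (mean-only) model.
set.seed(1)
obs  <- runif(20)                 # held-out observed dissimilarities
null <- sum((obs - mean(obs))^2)  # "null deviance" (mean-only model)

dev_explained <- function(pred) 100 * (null - sum((obs - pred)^2)) / null

good_pred <- obs + rnorm(20, sd = 0.05)  # predictions close to observed
bad_pred  <- rep(0.9, 20)                # constant prediction off the mean

dev_explained(good_pred)  # positive: model beats the null
dev_explained(bad_pred)   # negative: any constant other than the mean
                          # has higher sum-of-squares than the null model
```

Because the mean minimizes the sum of squared deviations, any predictor that is systematically off (here a constant away from the mean) is guaranteed a non-positive deviance explained, which mirrors what a poorly generalizing GDM does on a small test partition.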

The new version is now on GitHub and should be up on CRAN in a few days.

fitzLab-AL commented 1 month ago

@Basquill Actually, there still seems to be an issue with the cross validation function. I will continue to work on it; the current version on GitHub still needs attention.

Basquill commented 1 month ago

@fitzLab-AL . Thank you.

I ran a few more tests. Here's a brief summary of my set-up.

I have three models. They all show relatively decent deviance explained (30-50%) with a conventional fitting. Site pairs vary from 0.5 to 1.5 million, and I've got 6 or 7 predictors.

I ran cross-validation tests with different training/testing proportions and numbers of iterations (e.g., 10, 100, 100). These tests don't appear to change the cross-validation outcomes.

I also encountered a new error with one of the models when I exclude geography: it runs fine with geo=TRUE, but not with geo=FALSE.

gdm.crossvalidation(gdmTab,train.proportion=0.9, n.crossvalid.tests=10, geo=FALSE, splines=NULL, knots=NULL)

If we interpret a negative deviance explained as equivalent to zero (i.e., nothing explained), why might this result be so different from that obtained with either a conventional gdm() fit or a permuted fit via gdm.varImp()?

Is it possible that gdm models don't cross-validate well? I notice that some (many?) recently published examples do not include this procedure.

Thanks

fitzLab-AL commented 4 days ago

@Basquill The issues with the cross validation function should be fixed now; it turns out the problems were related to the predict function. Please confirm things work OK for you now.