fitzLab-AL / gdm

R package for Generalized Dissimilarity Modeling
GNU General Public License v3.0

Increasing the number of splines and knots reduces probability of model fitting #33

Closed: marcelglueck closed this issue 7 months ago

marcelglueck commented 10 months ago

Hello all, first of all, thank you for providing this powerful piece of software to the scientific community.

I have quite complex models to fit for multiple species and sub-setting approaches. With the default settings (3 splines per predictor), at least two models can be fit. Because more splines and knots can accommodate more complex response shapes, I hoped that increasing these parameters would let more models fit. The manual does not make clear whether the number of knots is adjusted automatically to the number of splines requested, so I calculated the knot quantiles myself using the following approach:

    # Number of predictors: drop the 6 leading site-pair columns, then halve
    # because each predictor appears once per site in the pair
    EXPLO_npred_splines <- ((ncol(EXPLO_spt) - 6) / 2)
    EXPLO_number_splines <- GLOBAL_nsplines
    EXPLO_splines_vector <-
      rep(EXPLO_number_splines, EXPLO_npred_splines)

    # Adjust knot vector
    EXPLO_knots_values <- c()
    for (o in 2:ncol(EXPLO_env_data)) {
      EXPLO_pred_col <- EXPLO_env_data[o]
      # Evenly spaced quantile probabilities from 0 to 1, one per spline
      EXPLO_quant_vec <-
        c(0:(EXPLO_number_splines - 1)) / (EXPLO_number_splines - 1)
      EXPLO_pred_vals <-
        quantile(EXPLO_pred_col,
                 EXPLO_quant_vec,
                 na.rm = TRUE,
                 names = FALSE)
      # Append the knot values to the vector holding the values for all predictors
      EXPLO_knots_values <-
        c(EXPLO_knots_values, EXPLO_pred_vals)
    }

However, with this approach, even the two models that fit with the default values fail. I tried several values for the number of splines (4, 5, 10, 50, 100), but no model can be fit with any of them. I am aware that the impact of these parameters is not fully understood yet (https://github.com/fitzLab-AL/gdm#gdm-fitting). Still, I am surprised that even changing the number of splines to 4 results in model fit failures, and I don't understand why.

Thank you for your assistance.
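
For reference, the knot calculation described in this post can be reduced to a small self-contained sketch in base R. The data frame `env` and its column names are invented for illustration, and the final check reflects my understanding of the gdm documentation (manually supplied knots should total the sum of the `splines` vector), so treat it as an assumption rather than a statement of how the package validates its inputs.

```r
# Toy environmental table; the column names are made up for illustration.
set.seed(1)
env <- data.frame(
  site   = 1:20,
  temp   = rnorm(20, mean = 10, sd = 3),
  precip = runif(20, min = 200, max = 900)
)

n_splines <- 4
pred_cols <- setdiff(names(env), "site")

# Evenly spaced probabilities from 0 to 1, one per spline.
probs <- seq(0, 1, length.out = n_splines)

# [[ ]] extracts each column as a plain numeric vector before taking quantiles.
knots_by_pred <- lapply(pred_cols, function(p) {
  quantile(env[[p]], probs = probs, na.rm = TRUE, names = FALSE)
})

knots   <- unlist(knots_by_pred)
splines <- rep(n_splines, length(pred_cols))

# Assumed requirement: one knot per spline, per predictor.
stopifnot(length(knots) == sum(splines))
```

If the model is fitted with geo=TRUE, geographic distance appears to count as an additional predictor (the help example uses splines=rep(4,11) for ten environmental variables plus distance), so both vectors would presumably need an entry for it as well.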

fitzLab-AL commented 7 months ago

Here too, a fully reproducible example would help us debug the problem.

marcelglueck commented 7 months ago

Here is the requested reprex. If you change EXPLO_number_splines from 3 to another value, model fitting fails. gdm_reprex.zip

fitzLab-AL commented 7 months ago

I used EXPLO_number_splines=2 and the code worked fine. I also get a fitted model when running the example in the help docs: gdmTabMod <- gdm(sitePairTab, geo=TRUE, splines=rep(4,11))

When I used EXPLO_number_splines=4, the models did not fit. This is not a problem with the code; rather, the model cannot converge using 4 splines with your data. This is not surprising given the size of your dataset and the fact that most of your distance values are 0.
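
A quick way to check how much information the response carries is to count the zero dissimilarities in the site-pair table before fitting. The sketch below uses a made-up table with only a `distance` column (the name gdm site-pair tables use for the response, as far as I know); the values are invented and do not come from the reprex.

```r
# Hypothetical site-pair table; only the response column is filled in.
site_pair <- data.frame(
  distance = c(0, 0, 0, 0.2, 0, 0.5, 0, 0, 1, 0)
)

n_pairs       <- nrow(site_pair)
prop_zero     <- mean(site_pair$distance == 0)  # share of pairs with no dissimilarity
n_informative <- sum(site_pair$distance > 0)    # pairs that vary in the response

cat(sprintf("%d pairs, %.0f%% zero distances, %d non-zero pairs\n",
            n_pairs, 100 * prop_zero, n_informative))
```

When only a handful of pairs have non-zero distances, each extra spline adds coefficients that those few pairs have to support, which is consistent with the convergence failures described above.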