easystats / insight

:crystal_ball: Easy access to model information for various model objects
https://easystats.github.io/insight/
GNU General Public License v3.0

find_ (get_?) algorithm #38

Open DominiqueMakowski opened 5 years ago

DominiqueMakowski commented 5 years ago

Although the fitting algorithm plays an important role, it often goes unreported or overlooked. Surprisingly, accessing it programmatically is not straightforward.

What do you think about a function that does that?

Here's a draft:

#' @export
find_algorithm <- function(model, ...) {
  UseMethod("find_algorithm")
}

#' @export
find_algorithm.merMod <- function(model, ...) {
  # the REML slot is 0 when the model was fitted by maximum likelihood
  if (model@resp$REML == 0) {
    algorithm <- "ML"
  } else {
    algorithm <- "REML"
  }

  out <- list(
    "algorithm" = algorithm,
    "optimizer" = as.character(model@optinfo$optimizer)
  )

  return(out)
}

#' @export
find_algorithm.stanreg <- function(model, ...) {

  info <- model$stanfit@sim

  out <- list(
    "algorithm" = model$algorithm,
    "chains" = info$chains,
    "iterations" = info$iter,
    "warmup" = info$warmup
  )

  return(out)
}
strengejacke commented 5 years ago

Yes, that would be a function that fits into insight. I bet it's less straightforward for brms models... For which models would this make sense?

DominiqueMakowski commented 5 years ago

Especially Bayesian models (distinguishing between MCMC, fullrank, and meanfield), and frequentist models where the algorithm has customizable parameters (lme4). For models with a fixed algorithm, we could hard-code the algorithm used (for instance, "OLS" for lm).
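The hard-coding idea could look roughly like this (a minimal sketch, not actual insight code; the generic mirrors the draft above and the lm method is hypothetical):

```r
# Sketch: hard-code the algorithm for models that are always fitted the
# same way. The generic mirrors the draft above; find_algorithm.lm is a
# hypothetical method, not part of insight.
find_algorithm <- function(model, ...) {
  UseMethod("find_algorithm")
}

find_algorithm.lm <- function(model, ...) {
  # lm() always fits by ordinary least squares, so the label can simply
  # be hard-coded
  list("algorithm" = "OLS")
}

fit <- lm(mpg ~ wt, data = mtcars)
find_algorithm(fit)
#> $algorithm
#> [1] "OLS"
```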

However, my view of the different packages and models is much narrower than yours, so I am not sure about the other cases of application.

But I still think it's worth starting with a few supported models, and then eventually expanding support depending on time, demand and so on...

strengejacke commented 5 years ago

I had especially mixed models in mind, so functions like glmmTMB, lmer, glmer, lme, mixed_model, glmmPQL?!?

DominiqueMakowski commented 5 years ago

Well, for lme4, from what I understood, lmer uses either ML or REML, while glmer uses ML.

For the others, I don't know...

And there are apparently additional differences: [screenshot attached in the original issue]

strengejacke commented 5 years ago

Especially the optimizers differ, I guess, not much the algorithm.

strengejacke commented 5 years ago

Ok, I implemented a basic draft, but I have the feeling we should ask some mixed-models experts about what might be important to return.

DominiqueMakowski commented 5 years ago

That's super cool, great work! Maybe we could post an issue on lme4 and glmmTMB to ask for confirmation and thoughts?

strengejacke commented 5 years ago

I think we can take the current implementation for now, and then later check https://github.com/easystats/insight/issues/38#issuecomment-472476777 more in detail.

alexpghayes commented 5 years ago

I believe the solution here is to differentiate between estimands, estimators, and estimation algorithms.

beta_mle = argmax (log-likelihood of the normal linear model)

Here the estimand is the coefficient vector beta, beta_mle is the (ML) estimator, and the estimation algorithm is whatever numerical procedure actually computes that argmax.

Now, the situation with mixed models is somewhat more complex because we start approximating things. We start by picking either a REML or an ML estimator. But calculating these things out exactly isn't feasible or desirable for some reason, so instead we come up with a new estimator that approximates the original estimator. Whether you want to think about these approximations as the same as the original estimator or a new separate thing is sort of hazy. The approximations have different properties than the original estimator, but morally they're trying to be the same thing.
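To make the estimator-vs-algorithm distinction concrete, here is a small illustration (mine, not from the thread): the same least-squares estimator for a normal linear model, computed by two different estimation algorithms.

```r
# Same estimator, two estimation algorithms: the estimator is the
# least-squares (= ML under normality) estimator of beta; the normal
# equations and numerical optimization are two ways to compute it.
y <- mtcars$mpg
X <- cbind(1, mtcars$wt)  # design matrix with intercept

# Algorithm 1: closed-form solution of the normal equations
beta_closed <- solve(t(X) %*% X, t(X) %*% y)

# Algorithm 2: numerical minimization of the residual sum of squares
rss <- function(b) sum((y - X %*% b)^2)
beta_optim <- optim(c(0, 0), rss, method = "BFGS")$par

# Both algorithms recover (numerically) the same estimate
all.equal(as.numeric(beta_closed), beta_optim, tolerance = 1e-4)
#> [1] TRUE
```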

Anyway, if someone told me they fit a mixed model, I would want to know:

I have a paper draft that goes into much more detail that I would be happy to share if you'd like.

DominiqueMakowski commented 5 years ago

@alexpghayes Thanks for the clarification!

From that, it seems that our find_algorithm function currently returns the estimator rather than the estimation algorithm. Concretely, with regard to insight, I wonder whether changes such as the following could address this terminological discrepancy:

1) find_algorithm as the master function:

- Renaming the current `find_algorithm` -> `find_estimator`
- Adding `find_estimation` to attempt to retrieve it when possible
- `find_algorithm` would become a "general" function that would return a list containing the estimator and the estimation algorithm.

2) find_estimation as the master function: same as above, but find_estimation is the general function and find_algorithm the method specific to the estimation algorithm.

However, these are breaking changes, so they must be carefully considered and thoroughly documented.
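Option (1) might look roughly like this (hypothetical sketch; none of these functions exist in insight yet, and the lm methods are placeholders for illustration):

```r
# Hypothetical sketch of option (1): find_estimator and find_estimation
# as separate generics, with find_algorithm combining both into a list.
find_estimator <- function(model, ...) UseMethod("find_estimator")
find_estimation <- function(model, ...) UseMethod("find_estimation")

# Placeholder lm methods, for illustration only
find_estimator.lm <- function(model, ...) "OLS"
find_estimation.lm <- function(model, ...) "QR decomposition"

find_algorithm <- function(model, ...) {
  list(
    "estimator" = find_estimator(model, ...),
    "estimation" = find_estimation(model, ...)
  )
}

find_algorithm(lm(mpg ~ wt, data = mtcars))
#> $estimator
#> [1] "OLS"
#>
#> $estimation
#> [1] "QR decomposition"
```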

> I have a paper draft that goes into much more detail that I would be happy to share if you'd like.

That's great, please do so :)

alexpghayes commented 5 years ago

> From that, it seems that our find_algorithm function currently returns the estimator rather than the estimation algorithm. Concretely, with regard to insight, I wonder whether changes such as the following could address this terminological discrepancy: ...

I think there are lots of reasonable ways to split the functions, but I think in the end users will want to know both the estimator and the estimation algorithm. I would probably return both of these pieces of information in a list from a function estimation_details() if I were to implement this myself.

> For mixed models, from what I understand, it comes down to a design decision on our side as to how (or whether) we want to classify the approximation aspect. In order to maintain some continuity between the models (contributing to the unified vision proposed by the package), I would personally tend to "omit" it, and classify things based on the desired/philosophical/"moral" resemblance. In other words, I would classify the approximated ML estimator for a mixed model as ML, as the fact that it is approximated is implied by the nature of the model itself (the fact that it is a mixed model). Nevertheless, we could also make it explicit by adding a variable in the list returned by the master function, e.g. approximated = TRUE or approximation = "approx-method".

I think going by moral resemblance is very reasonable for mixed models. I like the idea of explicitly telling the user the approximation method as well.
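Combining both suggestions (a single `estimation_details()` plus an explicit approximation field) could be sketched like this; all function and field names are illustrative, not part of insight:

```r
# Hypothetical estimation_details(): returns estimator, estimation
# algorithm, and the approximation (if any) in one list.
estimation_details <- function(model, ...) UseMethod("estimation_details")

estimation_details.default <- function(model, ...) {
  list(
    "estimator" = NA_character_,     # e.g. "ML", "REML", "OLS"
    "algorithm" = NA_character_,     # e.g. "QR decomposition"
    "approximation" = NA_character_  # e.g. "Laplace", or NA if exact
  )
}

# For an exactly-computed lm fit, no approximation is involved
estimation_details.lm <- function(model, ...) {
  list("estimator" = "OLS",
       "algorithm" = "QR decomposition",
       "approximation" = NA_character_)
}

estimation_details(lm(mpg ~ wt, data = mtcars))
```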

> Importantly, we must also account for the case of Bayesian models. @alexpghayes, how in your opinion does the Bayesian sampling "algorithm" fit into this categorization? Is MCMC the estimator? The estimation algorithm? Or a separate category of "sampling algorithm" that does not overlap with the previous ones?

I don't know enough about Bayes to distinguish between estimators and estimation algorithms in the MCMC world. I imagine someone from the Stan crew could clarify pretty quickly, though.

> I have a paper draft that goes into much more detail that I would be happy to share if you'd like.

Will you shoot me an email at alexpghayes@gmail.com and I'll send the draft.