datalorax / equatiomatic

Convert models to LaTeX equations
https://datalorax.github.io/equatiomatic/
Creative Commons Attribution 4.0 International
615 stars 43 forks source link

Would it make sense to work directly with Tidymodel objects of class `model_fit` #236

Open davidkane9 opened 3 weeks ago

davidkane9 commented 3 weeks ago

Consider:

library(tidymodels)
library(equatiomatic)
#> 
#> Attaching package: 'equatiomatic'
#> The following object is masked from 'package:modeldata':
#> 
#>     penguins

mod <- linear_reg() |> 
  fit(mpg ~ hp + drat, data = mtcars)

extract_eq(mod$fit) # works

$$ \operatorname{mpg} = \alpha + \beta{1}(\operatorname{hp}) + \beta{2}(\operatorname{drat}) + \epsilon $$


extract_eq(mod) # fails
#> Error in UseMethod("extract_lhs", model): no applicable method for 'extract_lhs' applied to an object of class "c('_lm', 'model_fit')"

Created on 2024-08-22 with reprex v2.1.0

There is no bug here. But things would be easier for my students if extract_eq() were to accept mod, or any object resulting from fitting a tidymodel. This seems like it should be an easy fix: just check the class at the start, and pull out the fitted object if it is class model_fit.

I can submit a PR if this seems sensible.

phgrosjean commented 3 weeks ago

Yes, it make sense and you can propose a pull request... but you must take care to avoid increasing the list of (indirect) package dependencies. We try to keep this list as low as possible for obvious reasons. I am not in favor of even a suggests to {tidymodels} in DESCRIPTION, due to the very large number of indirect hard dependencies (95 packages for {tidymodels}), even if some are already dependencies of {broom} obviously.

nrow(pak::pkg_deps("tidymodels"))
## [1] 95                                                                      
nrow(pak::pkg_deps("broom"))
## [1] 23

So you should basically provide a method for the generic extract_eq() for class model_fit, which is easy to do, but without referencing {tidymodels}, even in the examples ! Not easy...

davidkane9 commented 3 weeks ago

Thanks for the feedback! My testing suggests that all that is needed is to add these lines at the top of extract_eq.default.

  if (inherits(model, c("model_fit", "workflow"))) {
    if ("fit" %in% names(model)) {
      model <- model$fit
    }
  }

At the least, this solves my problem, keeps all the current tests passing, and does not require any new packages. Would something like this be OK for the PR, or do we really need a new method?

phgrosjean commented 3 weeks ago

No, not really. You have to write a proper method model_fit for extract_eq().

davidkane9 commented 3 weeks ago

237 submitted. Thanks for the guidance!

datalorax commented 3 weeks ago

Yes, it make sense and you can propose a pull request... but you must take care to avoid increasing the list of (indirect) package dependencies. We try to keep this list as low as possible for obvious reasons. I am not in favor of even a suggests to {tidymodels} in DESCRIPTION, due to the very large number of indirect hard dependencies (95 packages for {tidymodels}), even if some are already dependencies of {broom} obviously.

nrow(pak::pkg_deps("tidymodels"))
## [1] 95                                                                      
nrow(pak::pkg_deps("broom"))
## [1] 23

So you should basically provide a method for the generic extract_eq() for class model_fit, which is easy to do, but without referencing {tidymodels}, even in the examples ! Not easy...

Related.... #152. Not sure if you want to take that on at any point @phgrosjean. I haven't had too many problems with the dependencies from broom or broom.mixed, but that could help.

phgrosjean commented 3 weeks ago

@datalorax Yes, but #152 needs a lot more work and testing that I can do for now. This one is easier.

datalorax commented 3 weeks ago

Yes, 100% agree.

phgrosjean commented 3 weeks ago

I have made a workflow method too. I did a couple of tests. When {broom.mixed} is required, there is an error if it is not loaded:

> library(parsnip)
> library(equatiomatic)
> mod <- linear_reg() |>
+   set_engine("stan") |>
+   fit(mpg ~ hp + drat, data = mtcars)
> extract_eq(mod) # Error
Error in warn_on_stanreg(x) : 
  The supplied model object seems to be outputted from the rstanarm package. Tidiers for mixed model output now live in the broom.mixed package.
> library(broom.mixed) # Required for the model used
> extract_eq(mod) # Works
$$
E( \operatorname{mpg} ) = \alpha + \beta_{1}(\operatorname{hp}) + \beta_{2}(\operatorname{drat})
$$

The example on the {workflows} main page does not work with the stan engine, but it works with base lm() model when there is a preprocessing step (here using step_ns()):

> library(parsnip)
> library(workflows)
> library(broom.mixed) # Required for the model used, or extract_eq() will choke!
> library(equatiomatic)
> # This is the example in {workflows} main page:
> spline_cars <- recipe(mpg ~ ., data = mtcars) |>
+   step_ns(disp, deg_free = 10)
> bayes_lm <- linear_reg() |>
+   set_engine("stan")
> # To use these, you would generally run:
> spline_cars_prepped <- prep(spline_cars, mtcars)
> bayes_lm_fit <- fit(bayes_lm, mpg ~ ., data = juice(spline_cars_prepped))
> # This one does not works with {equatiomatic}:
> extract_eq(bayes_lm_fit)
Error in terms.formula(formula(model)) : 
  '.' in formula and no 'data' argument

Now, if I use lm() instead, it works:

> linear_lm <- linear_reg()
> spline_cars_prepped <- prep(spline_cars, mtcars)
> linear_lm_fit <- fit(linear_lm, mpg ~ ., data = juice(spline_cars_prepped))
> # This one works with {equatiomatic}:
> extract_eq(linear_lm_fit)
$$
\operatorname{mpg} = \alpha + \beta_{1}(\operatorname{cyl}) + \beta_{2}(\operatorname{hp}) + \beta_{3}(\operatorname{drat}) + \beta_{4}(\operatorname{wt}) + \beta_{5}(\operatorname{qsec}) + \beta_{6}(\operatorname{vs}) + \beta_{7}(\operatorname{am}) + \beta_{8}(\operatorname{gear}) + \beta_{9}(\operatorname{carb}) + \beta_{10}(\operatorname{disp\_ns\_01}) + \beta_{11}(\operatorname{disp\_ns\_02}) + \beta_{12}(\operatorname{disp\_ns\_03}) + \beta_{13}(\operatorname{disp\_ns\_04}) + \beta_{14}(\operatorname{disp\_ns\_05}) + \beta_{15}(\operatorname{disp\_ns\_06}) + \beta_{16}(\operatorname{disp\_ns\_07}) + \beta_{17}(\operatorname{disp\_ns\_08}) + \beta_{18}(\operatorname{disp\_ns\_09}) + \beta_{19}(\operatorname{disp\_ns\_10}) + \epsilon
$$

Now, the workflow version also works:

> car_wflow <- workflow() |>
+     add_recipe(spline_cars) |>
+     add_model(linear_lm)
> # Now you can prepare the recipe and estimate the model via a single call to fit():
> car_wflow_fit <- fit(car_wflow, data = mtcars)
> # and extract_eq() works too:
> extract_eq(car_wflow_fit)
$$
\operatorname{..y} = \alpha + \beta_{1}(\operatorname{cyl}) + \beta_{2}(\operatorname{hp}) + \beta_{3}(\operatorname{drat}) + \beta_{4}(\operatorname{wt}) + \beta_{5}(\operatorname{qsec}) + \beta_{6}(\operatorname{vs}) + \beta_{7}(\operatorname{am}) + \beta_{8}(\operatorname{gear}) + \beta_{9}(\operatorname{carb}) + \beta_{10}(\operatorname{disp\_ns\_01}) + \beta_{11}(\operatorname{disp\_ns\_02}) + \beta_{12}(\operatorname{disp\_ns\_03}) + \beta_{13}(\operatorname{disp\_ns\_04}) + \beta_{14}(\operatorname{disp\_ns\_05}) + \beta_{15}(\operatorname{disp\_ns\_06}) + \beta_{16}(\operatorname{disp\_ns\_07}) + \beta_{17}(\operatorname{disp\_ns\_08}) + \beta_{18}(\operatorname{disp\_ns\_09}) + \beta_{19}(\operatorname{disp\_ns\_10}) + \epsilon
$$

I am pretty sure there are many other {tidymodels} model_fit, workflow objects, or other that do not work properly with {equatiomatic}... you mileage will vary!

phgrosjean commented 3 weeks ago

@datalorax @davidkane9 I am waiting for your possible comments or suggestions before I submit the new version of {equationmatic} to CRAN.

datalorax commented 3 weeks ago

Thanks @phgrosjean, I am confused why explicit calls to broom.mixed is needed when it's already an import. Is there a way to avoid that?

It does not surprise me that many of the models won't work. We could go about trying to make most of them work but that also feels like a significant undertaking. So basically I'm fine with whatever you want to do.

phgrosjean commented 3 weeks ago

Yes, it is not clear why library(broom.mixed) solved the error in the example. I don't know either.