easystats / insight

:crystal_ball: Easy access to model information for various model objects
https://easystats.github.io/insight/
GNU General Public License v3.0

Detect the package that is needed to create or predict a model object #849

Open tripartio opened 4 months ago

tripartio commented 4 months ago

Hello. First, thanks for your fantastic package, which I only recently discovered. It has greatly simplified some tricky parts of my ale package.

One thing I would like to be able to do is to detect the package that is needed to create or predict a model object. For example, if I give the {insight} package a gam model object, I would like it to tell me that this object was created using the {mgcv} package. Is this possible with {insight}?

bwiernik commented 4 months ago

If the object is an S3 object, this should work --

getS3method("predict", class(object)[1]) |> rlang::ns_env_name()

If the object is S4 (eg, from lme4) --

attr(findClass('lmerMod')[[1]], "name")

tripartio commented 4 months ago

@bwiernik Thanks for your suggestions. I will try to test them soon when I have a free moment.

Could such functionality be integrated as a function in the package?

bwiernik commented 4 months ago

Can you say more about the use case? If you have the package installed to fit the model, then that package should certainly be available to supply its methods.

tripartio commented 4 months ago

@bwiernik, the context is parallel processing with my ale package. Most of the package functions receive a model as input and then analyze the model for various interpretable machine learning (IML) tasks.

So, for example, when my ale() function runs sequentially (without parallel processing), there is no problem. As you indicated, as long as the package with which the model was created is installed on the system, the ale() function runs fine. But since I added parallel processing, the process has started choking, even though I use furrr, an advanced parallel processing package that takes care of almost everything automagically. The problem is that furrr has to send the necessary package environments to each parallel worker so that each worker can independently run the code. It cannot automatically detect that my code needs further packages, so it chokes. So, my code needs to tell furrr which extra packages are needed for the parallel workers to do their tasks.

The current version of my code requires users to specify a model_packages argument just for the sake of parallel processing. I would like to modify my code so that it automatically detects the model's package so that users would not need to supply this argument. This is what I would like insight to do for me.
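As a rough illustration of the mechanism described here, the sketch below passes a character vector of package names to the workers through the `packages` argument of `furrr::furrr_options()`. It assumes furrr and future are installed; the `lm` fit and the hard-coded `model_packages` value are stand-ins for a real model and for whatever the user or a detection step would supply, not code from the ale package.

```r
# Sketch only: assumes the furrr and future packages are installed.
library(furrr)

# Two background R sessions act as the parallel workers.
future::plan(future::multisession, workers = 2)

# Stand-in model; in practice this would be the user's model object.
fit <- lm(mpg ~ wt, data = mtcars)

# Stand-in for the user-supplied (or auto-detected) package names.
model_packages <- "stats"

# Predict on each chunk in parallel; the listed packages are attached
# on every worker before the function is evaluated.
chunks <- split(mtcars, mtcars$cyl)
preds <- future_map(
  chunks,
  function(d) predict(fit, newdata = d),
  .options = furrr_options(packages = model_packages)
)

# Return to sequential processing and shut the workers down.
future::plan(future::sequential)
```

For a model from a package that is not attached by default (for example a gam from {mgcv}), `model_packages` would need to name that package instead.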

So, I think that my use case could be generalized to parallel processing when the source package of certain complex objects (in my case, models) needs to be detected.

tripartio commented 4 months ago

> If the object is an S3 object, this should work --
>
> getS3method("predict", class(object)[1]) |> rlang::ns_env_name()

@bwiernik Thanks; I have now implemented this check for S3 objects in my package and it now automatically detects and loads the appropriate packages.

I don't understand the S4 check code you gave (probably because I rarely work with S4 objects), so my package is now configured to automatically check for the S3 package and then give a graceful error message if it cannot be detected. Then users can explicitly specify the packages with my existing manual mechanism. That is acceptable, since it should work automatically for most users and only require manual intervention for a few complicated cases.
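The flow described above (try S3 detection, otherwise fail with a clear message and fall back to the manual argument) might look roughly like the following. This is a sketch, not the actual ale code: both function names are hypothetical, it walks the whole class vector rather than only `class(object)[1]`, and it uses base R's `environmentName()` in place of `rlang::ns_env_name()` to avoid the extra dependency.

```r
# Hypothetical helper: walk the S3 class vector and return the name of the
# namespace that owns the first predict() method found, or NULL if none.
detect_s3_model_package <- function(object) {
  for (cls in class(object)) {
    method <- utils::getS3method("predict", cls, optional = TRUE)
    if (!is.null(method)) {
      return(environmentName(environment(method)))
    }
  }
  NULL
}

# Hypothetical wrapper: prefer packages supplied by the user, then try
# detection, then stop with a message that says what to do next.
resolve_model_packages <- function(object, model_packages = NULL) {
  if (!is.null(model_packages)) {
    return(model_packages)
  }
  pkg <- detect_s3_model_package(object)
  if (is.null(pkg) || !nzchar(pkg)) {
    stop(
      "Could not detect the package for class '", class(object)[1],
      "'; please supply `model_packages`.",
      call. = FALSE
    )
  }
  pkg
}

fit <- lm(mpg ~ wt, data = mtcars)
resolve_model_packages(fit)  # "stats"
```

Looping over every class matters for objects whose first class has no `predict()` method of its own but whose parent class does.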

Would it be feasible to incorporate such a check into the {insight} package, extended with checks for S4 objects as well?

Regardless, I appreciate your help. Your little S3 code has let me simplify my function usage for most users.

bwiernik commented 4 months ago

R has two widely used class systems—S3 and S4 (and several less-used ones). S3 is most widely used, but some major modeling packages do use S4 (lme4 and OpenMx are the first ones that come to mind).

The function isS4() can detect if an object is S4 or not. The code above will return the namespace associated with the class of the S4 model.

tripartio commented 4 months ago

> If the object is S4 (eg, from lme4) --
>
> attr(findClass('lmerMod')[[1]], "name")

But you specified the name of the S4 class in the code. I am not sufficiently familiar with S4 to convert this to code where I have an object of undetermined type and then probe its namespace (as with your S3 snippet above).

bwiernik commented 4 months ago

Oh sorry

attr(findClass(class(object))[[1]], "name")  # findClass() expects the class name, not the object itself
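Putting the S3 and S4 suggestions from this thread together, a combined check could be sketched as below. Note that this is not an {insight} function, and the name `detect_model_package()` is hypothetical. For the S4 branch it swaps in a different technique from the `findClass()` call above: it reads the `"package"` attribute that R attaches to the class of an S4 instance, falling back to the `package` slot of the class definition. The `stats4::mle()` fit is only a stand-in for an S4 model such as one from lme4, chosen because stats4 ships with R.

```r
# Hypothetical helper combining the S3 and S4 checks discussed above.
detect_model_package <- function(object) {
  if (isS4(object)) {
    # S4 instances normally carry their defining package on the class name.
    pkg <- attr(class(object), "package")
    if (is.null(pkg)) {
      # Fall back to the package slot of the class definition.
      pkg <- methods::getClass(class(object))@package
    }
    return(pkg)
  }
  # S3: return the namespace of the first predict() method in the class chain.
  for (cls in class(object)) {
    method <- utils::getS3method("predict", cls, optional = TRUE)
    if (!is.null(method)) {
      return(environmentName(environment(method)))
    }
  }
  NULL
}

# S3 example: a linear model from the stats package.
fit_s3 <- lm(mpg ~ wt, data = mtcars)
detect_model_package(fit_s3)

# S4 example: a maximum-likelihood fit from stats4 (stand-in for lme4 etc.).
set.seed(1)
x <- rnorm(20)
neg_loglik <- function(mu = 0) sum((x - mu)^2) / 2
fit_s4 <- stats4::mle(neg_loglik)
detect_model_package(fit_s4)
```

A `NULL` result means no package could be identified, so the caller can fall back to asking the user for the package name.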