bdwilliamson / vimpy

Perform inference on algorithm-agnostic variable importance in Python
https://pypi.org/project/vimpy/
MIT License

Installation of version 2.1 and errors in spvim() #6

Open Tim-Re opened 2 years ago

Tim-Re commented 2 years ago

Hi, I'm currently trying to use spvim() from vimpy for its ability to accommodate arbitrary prediction functions, as opposed to sp_vim() in R, where, as far as I can see, only learners from the SL library can be used. When trying to install version 2.1, however, I encountered the following error:

```
ERROR: Could not find a version that satisfies the requirement scipy.stats (from vimpy) (from versions: none)
ERROR: No matching distribution found for scipy.stats
```

This seems to be due to the 'scipy.stats' entry on line 20 of setup.py. Maybe it is there for a reason, but after removing it the installation worked fine.
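For reference, a runnable sketch of the suspected problem. The surrounding contents of `install_requires` below are hypothetical; only the `"scipy.stats"` entry is taken from the error message. `scipy.stats` is a submodule of SciPy, not a standalone distribution on PyPI, so pip cannot resolve it as a requirement:

```python
# Hypothetical reconstruction of the install_requires list from setup.py;
# only the "scipy.stats" entry is taken from the error message above.
install_requires_broken = ["numpy", "scipy", "scipy.stats", "scikit-learn"]

# "scipy.stats" is a submodule that ships inside the scipy wheel, not a
# package on PyPI, so pip fails to find a matching distribution for it.
# Dropping the entry lets pip resolve everything; scipy.stats remains
# importable because it is bundled with scipy itself.
install_requires_fixed = [r for r in install_requires_broken if r != "scipy.stats"]
print(install_requires_fixed)  # ['numpy', 'scipy', 'scikit-learn']
```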

Additionally, when using the function spvim() I encountered a few errors. It could also be that I'm using it the wrong way; however, a few potential fixes (which unfortunately did not entirely resolve the problems) in vimpy/vimpy/spvim.py are:

For the method get_influence_function():

After incorporating these changes, get_influence_function() worked; however, the methods get_ses() and get_cis() seem to have further issues, i.e., problems with the indices in the shapley_se() function, etc.

While I'm not sure about all of the above suggestions, they might still be of some use.

Kind regards.

bdwilliamson commented 2 years ago

Thanks for using the package!

I'm unclear what you mean by "arbitrary prediction functions" -- the Super Learner does allow you to use a wide variety of candidate prediction functions (you can see all of them using `SuperLearner::listWrappers()`), and you can use single algorithms in `vimp::sp_vim` by specifying a single character value (e.g., `SL.library = "SL.glm"`). You can also specify tuning parameters, etc., using the Super Learner; see `SuperLearner::create.Learner` for more details.

Thanks also for looking at the python package. I'm not sure why you're having issues with setup.py (which works fine on my machine; this is really for sending to PyPI, not for installing the package locally), but I'm glad that you found a workaround.

Can you provide a minimum working example so that I can see what's going on in spvim()? I fixed the indexing and zcounts issue (thanks!).

Tim-Re commented 2 years ago

Hi thanks for the answer.

By arbitrary prediction functions, I really just meant that in Python spvim() only seems to require a learner with fit and predict methods, while in R one is bound to the SuperLearner framework. However, as it turns out, I also underestimated the capabilities of SuperLearner.

Regarding the setup.py issue, it might be worth noting that the error did not just occur on my local computer but also on Google Colab. However, when installing version 2.0.2.2, which does not include 'scipy.stats' in the install_requires list, the installation worked fine. Pip also seems to check the dependencies against the provided install_requires list and installs missing packages if necessary. I would then imagine that it tries to install scipy.stats, which throws the same error as trying to install vimpy 2.1, even though scipy.stats should already be available through scipy. Right now, when not specifying a version, pip automatically installs vimpy 2.0.2.2 instead of 2.1.

In Colab:

```shell
!pip install vimpy==2.1                                 # error occurs
!pip install git+https://github.com/bdwilliamson/vimpy  # error occurs
!pip install vimpy==2.0.2.2                             # error does not occur
```

A small example for the issue with spvim() is given below. Attached is also a screenshot of the error message triggered by vimpy_obj.get_ses().

```python
import numpy as np
import vimpy
import pkg_resources
from sklearn.linear_model import LinearRegression

pckg = pkg_resources.get_distribution("vimpy")
print(pckg.version)  # 2.1

def lm(n):
    mean = np.zeros(3)
    cov = np.eye(3)
    X = np.random.default_rng().multivariate_normal(mean, cov, n)
    x1 = X[:, 0]
    x2 = X[:, 1]
    x3 = X[:, 2]
    f = x1 + 2 * x2 - x3
    y = f + np.random.normal(0, 1, n)
    return y, X

y, x = lm(1000)
model = LinearRegression()
vimpy_obj = vimpy.spvim(y = y, x = x, V = 5, pred_func = model, measure_type = "r_squared")

vimpy_obj.get_point_est()
vimpy_obj.get_influence_functions()
vimpy_obj.get_ses()
vimpy_obj.get_cis()
```

[screenshot of the get_ses() error message]

bdwilliamson commented 2 years ago

Thanks for the MWE. I've just completed a patch that should solve your issue (on GitHub, not PyPI yet). Please let me know if you're still seeing problems!

Tim-Re commented 2 years ago

Thanks for the amendments. Unfortunately, using the MWE from above (or the example on the vimpy GitHub page), there seems to be another error in get_ses(), at `var_s = np.nanvar(shapley_ics['contrib_s'][idx, :])` in vimpy/spvim_ic.py, line 50.

[screenshot of the error traceback]

bdwilliamson commented 2 years ago

I'm not getting that error when I use the latest version of the package on GitHub. Can you try updating vimpy using `python -m pip install git+https://github.com/bdwilliamson/vimpy.git@aef6b90dbaa77d9a9dce9a45b4786b37a294c36c` and re-running the MWE?

Tim-Re commented 2 years ago

I've reinstalled the version at the most recent commit hash, on both my local machine and on Colab. The error unfortunately persists. What seems odd to me is that the .dtype attribute differs between contrib_s and contrib_v (see image below). I've also tried different versions of numpy, which did not help.
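To illustrate why the dtype difference could matter, here is a minimal sketch (whether this matches the internals of spvim_ic.py is an assumption, and the values are made up): `np.nanvar` handles a float64 array fine, but an object-dtype array, e.g. one holding a `None`, raises a TypeError; casting to float64 restores the expected behavior.

```python
import numpy as np

# Two arrays with the same values but different dtypes, mimicking the
# contrib_v / contrib_s mismatch seen in the screenshot (values made up).
contrib_v = np.array([0.1, 0.2, np.nan, 0.4])              # dtype float64
contrib_s = np.array([0.1, 0.2, None, 0.4], dtype=object)  # dtype object

print(np.nanvar(contrib_v))  # works: the nan entry is skipped

try:
    np.nanvar(contrib_s)     # object dtype breaks the nan handling
    nanvar_failed = False
except TypeError:
    nanvar_failed = True

# Casting to float64 turns None into nan and restores the expected result:
print(np.nanvar(contrib_s.astype("float64")))
```

If this is indeed the failure mode, casting the contributions with `astype("float64")` where they are assembled would be one possible fix.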

[screenshot showing the differing .dtype attributes of contrib_s and contrib_v]

bdwilliamson commented 2 years ago

OK, I've made the dtypes of the two the same (both should be float64). I've also bumped the version number to 2.1.1, so you should be able to confirm that this version is installed. Other than that, I'm not sure how to help, since I'm not seeing any errors when running the MWE on my machine (Python 3.8).

Tim-Re commented 2 years ago

Thanks a lot for the effort and the quick responses! After the dtype change, it now works.

(As a small and hopefully not annoying final side note: in get_cis() the interval is assigned to self.ci; however, under __init__ it is self.cis_, so the confidence intervals are never actually returned.)
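For concreteness, a hypothetical sketch of that mismatch (the class and attribute names follow the issue text, not the actual vimpy source): writing to `self.ci` leaves the `self.cis_` attribute created in `__init__` untouched, so callers reading `cis_` see nothing.

```python
# Hypothetical sketch; attribute names follow the issue text, not the
# actual vimpy source.
class SpvimSketch:
    def __init__(self):
        self.cis_ = None  # the attribute callers read for the intervals

    def get_cis_buggy(self):
        self.ci = [[0.1, 0.3]]    # typo: writes a *different* attribute
        return self

    def get_cis_fixed(self):
        self.cis_ = [[0.1, 0.3]]  # writes the attribute set up in __init__
        return self

obj = SpvimSketch()
obj.get_cis_buggy()
print(obj.cis_)   # still None: the intervals were never stored
obj.get_cis_fixed()
print(obj.cis_)   # [[0.1, 0.3]]
```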

bdwilliamson commented 2 years ago

Thank you for your help finding these bugs! I really appreciate your patience. Just fixed that last bug as well, I'll try to get a release to PyPI soon.