facebookresearch / balance

The balance python package offers a simple workflow and methods for dealing with biased data samples when looking to infer from them to some target population of interest.
https://import-balance.org
GNU General Public License v2.0

[BUG] libgfortran.so.3: cannot open shared object file when running sample_with_target.adjust(max_de=None) #33

Closed: geniusjenny closed this issue 1 year ago

geniusjenny commented 1 year ago

Describe the bug

I got OSError: libgfortran.so.3: cannot open shared object file: No such file or directory when I ran sample_with_target.adjust(max_de=None)

Session information

I have already satisfied all the requirements listed on the overview page and installed glmnet_python and balance using the sample code.

OSError                                   Traceback (most recent call last)
OSError: libgfortran.so.3: cannot open shared object file: No such file or directory
# Session info: Python 3.8.16

Screenshots

(screenshot: the OSError traceback shown above)

Reproducible example

Please provide us with (any that apply):

  1. Code: I ran the code from the tutorial section "Using ipw to fit survey weights" (see the sketch after this list):
    adjusted = sample_with_target.adjust(max_de=None)
  2. Reference: https://import-balance.org/docs/docs/overview/#code-example-of-using-balance
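
For context, here is a minimal sketch of the steps leading up to the failing call, following the linked overview example (load_data and the "happiness" outcome column come from that page and may differ in other setups):

    from balance import load_data, Sample

    # load the simulated example data used in the balance overview/tutorials
    target_df, sample_df = load_data()

    # wrap the dataframes as balance Sample objects
    sample = Sample.from_frame(sample_df, outcome_columns=["happiness"])
    target = Sample.from_frame(target_df)

    # attach the target and fit survey weights with the default ipw method
    sample_with_target = sample.set_target(target)
    adjusted = sample_with_target.adjust(max_de=None)  # the call that raises the OSError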

talgalili commented 1 year ago

Hey @geniusjenny Thanks for the bug report! Could you please:

  1. Paste the output of running session_info:

    import session_info
    session_info.show(html=False)

  2. Try running the adjustment using .adjust(method="cbps") (full call sketched after this list) and let us know if this works for you.
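
For example, on the object from the tutorial (the object name is taken from the overview example you linked):

    adjusted = sample_with_target.adjust(method="cbps")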

Thanks!

geniusjenny commented 1 year ago

Thank you so much for your reply!

  1. session_info.show(html=False)
    -----
    balance             0.7.0
    pandas              1.4.3
    session_info        1.0.0
    -----
    IPython             7.16.3
    jupyter_client      6.1.5
    jupyter_core        4.9.2
    -----
    Python 3.8.16 | packaged by conda-forge | (default, Feb  1 2023, 16:01:55) [GCC 11.3.0]
    Linux-4.14.309-231.529.amzn2.x86_64-x86_64-with-glibc2.10
    -----
    Session information updated at 2023-04-13 20:34
  2. .adjust(method="cbps") works! What is the difference between .adjust(method="cbps") and .adjust(max_de=None)?

talgalili commented 1 year ago

Hey @geniusjenny I'm glad cbps worked!

Regarding cbps

As for the differences: the default method is ipw (a GLM with LASSO regularization), so .adjust(max_de=None) still uses ipw; the max_de argument only caps the design effect of the resulting weights.
You can read about ipw here: https://import-balance.org/docs/docs/statistical_methods/ipw/
You can read about cbps here: https://import-balance.org/docs/docs/statistical_methods/cbps/
You're also welcome to go over the tutorials for more examples and usage details: https://import-balance.org/docs/tutorials/
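
To make the difference concrete, a short sketch of the two calls (the object name follows the overview example; max_de=None is the value used in this thread):

    # default method: ipw (GLM with LASSO), with the design-effect cap disabled
    adjusted_ipw = sample_with_target.adjust(max_de=None)

    # same adjustment, but using the CBPS method instead of ipw
    adjusted_cbps = sample_with_target.adjust(method="cbps")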

Regarding the ipw bug

As for the bug with ipw: it looks like you're missing the libgfortran.so.3 shared library, which is required by the glmnet library. You can install it using your Linux distribution's package manager. Since you're probably using Amazon Linux 2, you can use yum to install the required library.

First, open a terminal and update your package manager repositories:

sudo yum update

Then, install the libgfortran package:

sudo yum install libgfortran

After the installation is complete, try running your Python code again. The error should be resolved.

If you still face any issues, you might need to create a symlink for the required libgfortran.so.3 file.
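
For reference, a hedged sketch of that workaround. The paths below are assumptions (check what is actually installed on your system), and symlinking a newer libgfortran to the .so.3 name is a common workaround rather than a guaranteed ABI-compatible fix:

    # see which libgfortran versions are present (paths vary by system)
    find /usr -name "libgfortran.so*" 2>/dev/null

    # if a newer copy exists (e.g. libgfortran.so.5), point the missing .so.3 name at it
    sudo ln -s /usr/lib64/libgfortran.so.5 /usr/lib64/libgfortran.so.3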

Could you please test the above and see if it solves it for you?

talgalili commented 1 year ago

Hey @geniusjenny

Did my last comment help resolve the issue for you?

I'm closing the issue in the meantime. If it didn't help, please feel free to reopen it with more details on the current status.

Thanks!

geniusjenny commented 1 year ago

Hi @talgalili,

Thank you so much for your help! I have figured out what the problem probably is: I am not able to install OS packages on the instance I am connecting to, even with those commands. So I switched to another platform that serves Jupyter Notebook, and the sample code now runs smoothly with sample_with_target.adjust(max_de=None).

One follow-up problem I am still facing: when I switched to my own dataset (a sample of 578k rows and a population of 6 million), the function emits several warnings for .adjust(method="cbps"):

WARNING (2023-04-17 18:39:29,331) [cbps/cbps (line 579)]: Convergence of bal_loss function has failed due to 'Maximum number of function evaluations has been exceeded.'
INFO (2023-04-17 18:39:29,332) [cbps/cbps (line 597)]: Running GMM optimization
WARNING (2023-04-17 19:01:00,310) [cbps/cbps (line 612)]: Convergence of gmm_loss function with gmm_init start point has failed due to 'Maximum number of function evaluations has been exceeded.'
WARNING (2023-04-17 19:22:22,563) [cbps/cbps (line 630)]: Convergence of gmm_loss function with beta_balance start point has failed due to 'Maximum number of function evaluations has been exceeded.'
INFO (2023-04-17 19:22:23,116) [cbps/cbps (line 726)]: Done cbps function

Is this something I should worry about? Thank you so much!

talgalili commented 1 year ago

Hey @geniusjenny Sorry for the late reply.

I think it means that the function wasn't able to fully correct the bias it has seen. I suggest you run x.covars().plot() on the adjusted object (say it's called x) and take a look at how much adjustment you got and whether any features still show a big imbalance.
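
For example, a small sketch of that check plus a numeric summary, using the names from earlier in this thread (the asmd() call is an assumption about balance's diagnostics API; adjust the names to your own objects):

    # x is the adjusted object, e.g. the result of the cbps run above
    x = sample_with_target.adjust(method="cbps")

    # visual check of covariate distributions before/after weighting
    x.covars().plot()

    # numeric check (assumed API): absolute standardized mean difference per covariate
    print(x.covars().asmd())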

Good luck :)