colesbury / nogil

Multithreaded Python without the GIL

Problems using Pandas and Numpy #125

Closed marchostau closed 1 year ago

marchostau commented 1 year ago

Hello, I'm trying to use your Python build without the Global Interpreter Lock (GIL) to implement a MapReduce for data processing that takes advantage of multithreading. For this I'm using the pandas and numpy libraries, which don't work with this Python interpreter. I installed them with pip, but this Python version can't import them. The error I get is:

    import pandas as pd
  File "/home/x/.local/lib/python3.9/site-packages/pandas/__init__.py", line 16, in <module>
    raise ImportError(
ImportError: Unable to import required dependencies:
numpy:

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.

We have compiled some common reasons and troubleshooting tips at:

    https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:

  * The Python version is: Python3.9 from "/home/marchostau/.pyenv/versions/nogil-3.9.10-1/bin/python"
  * The NumPy version is: "1.24.1"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: No module named 'numpy.core._multiarray_umath'

I've also tried to install your numpy fork (https://github.com/colesbury/numpy/commits/v1.21.0-nogil), but it fails (I'm using Ubuntu 20.04) with this error:

Looking in indexes: https://d1yxz45j0ypngg.cloudfront.net/, https://pypi.org/simple
Collecting git+https://github.com/colesbury/numpy.git
  Cloning https://github.com/colesbury/numpy.git to /tmp/pip-req-build-ga_vfi0v
  Running command git clone --filter=blob:none --quiet https://github.com/colesbury/numpy.git /tmp/pip-req-build-ga_vfi0v
  Resolved https://github.com/colesbury/numpy.git to commit 7cbd822ffcc12808a35757da1289c7a7ee2284d8
  Running command git submodule update --init --recursive -q
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [14 lines of output]
      error: Multiple top-level packages discovered in a flat-layout: ['numpy', 'branding'].

      To avoid accidental inclusion of unwanted files or directories,
      setuptools will not proceed with this build.

      If you are trying to create a single distribution with multiple packages
      on purpose, you should not rely on automatic discovery.
      Instead, consider the following options:

      1. set up custom discovery (find directive with include or exclude)
      2. use a src-layout
      3. explicitly set py_modules or packages with a list of names

      To find more information, look for "package discovery" on setuptools docs.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

colesbury commented 1 year ago

You should be able to just run pip install pandas. That will get you pandas 1.5.2 and numpy 1.24.0 on Linux.

You should not need to build them from source.

marchostau commented 1 year ago

I did that and pandas installed fine. The problem appears when I run code that imports pandas with the GIL-free Python interpreter. The error I get is:

    import pandas as pd
  File "/home/x/.local/lib/python3.9/site-packages/pandas/__init__.py", line 16, in <module>
    raise ImportError(
ImportError: Unable to import required dependencies:
numpy:

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.

We have compiled some common reasons and troubleshooting tips at:

    https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:

  * The Python version is: Python3.9 from "/home/marchostau/.pyenv/versions/nogil-3.9.10-1/bin/python"
  * The NumPy version is: "1.24.1"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: No module named 'numpy.core._multiarray_umath'

colesbury commented 1 year ago

It looks like you might be mixing packages across Python installations.

This works for me (as does pyenv):

docker run -it nogil/python /bin/bash
pip install pandas
python -c "import pandas as pd; import numpy as np; print(pd.Series([1, 3, 5, np.nan, 6, 8]))"
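
One way to check for this kind of mixing is to ask the interpreter where it would load a package from, without importing it. The following is a minimal sketch using only the standard library; the helper name `describe_import` is hypothetical, and `numpy` is used simply because it is the package discussed in this thread.

```python
import importlib.util
import sys


def describe_import(package: str) -> dict:
    """Report the running interpreter and where `package` would be loaded from."""
    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec(package)
    origin = spec.origin if spec is not None else None
    return {
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        "package": package,
        "origin": origin,
        # False suggests the package lives outside this installation,
        # for example in a directory shared with another Python.
        "under_prefix": bool(origin) and origin.startswith(sys.prefix),
    }


if __name__ == "__main__":
    for key, value in describe_import("numpy").items():
        print(f"{key}: {value}")
```

If `origin` points outside the installation given by `prefix`, the package was probably installed by or for a different interpreter.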
marchostau commented 1 year ago

Okay, I tried this and now it's working, thank you! I have a question: are there libraries that are not available with this Python interpreter, or can you use all of them as if you were using Python 3.9 with the GIL?

colesbury commented 1 year ago

> I have a question: are there libraries that are not available with this Python interpreter, or can you use all of them as if you were using Python 3.9 with the GIL?

I'm not sure I fully understand the question. Libraries that use the C-API (like pandas and numpy) need to be compiled specifically for this interpreter. I've pre-built a number of them (see this list) and they are available when you run pip install. Some other simple packages may be built automatically by pip.
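
Why compiled libraries are interpreter-specific can be seen in the filename suffixes an interpreter accepts for extension modules. The sketch below is a minimal illustration using the standard library; `loadable_as` is a hypothetical helper, and the thread does not state which suffixes the nogil build actually uses, so print them on your own interpreter rather than assuming.

```python
import importlib.machinery
import sysconfig


def loadable_as(module_name: str, filename: str) -> bool:
    """Return True if `filename` is a name this interpreter would accept
    as the compiled extension module `module_name`."""
    # The import system looks for module_name + suffix for each known suffix.
    return any(
        filename == module_name + suffix
        for suffix in importlib.machinery.EXTENSION_SUFFIXES
    )


if __name__ == "__main__":
    print("accepted suffixes:", importlib.machinery.EXTENSION_SUFFIXES)
    print("preferred suffix: ", sysconfig.get_config_var("EXT_SUFFIX"))
```

An extension file built for another interpreter carries a different suffix, so the import system does not find it. That would be consistent with a "No module named" error for a compiled submodule, although the thread does not confirm this as the exact cause.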

I think the problem here arises when you run pip install --user <package>. That puts the package in a directory (~/.local/lib) shared between python3.9 and the nogil interpreter, which seems to cause problems.
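
To see whether the shared per-user directory is active for a given interpreter, something like the following can be run under each one. This is a minimal sketch using the standard `site` module; the function name `user_site_status` is hypothetical.

```python
import site
import sys


def user_site_status() -> dict:
    """Describe whether the per-user site-packages directory is in use."""
    user_site = site.getusersitepackages()
    return {
        "user_site": user_site,
        # ENABLE_USER_SITE is True, False, or None depending on how Python was started.
        "enabled": bool(site.ENABLE_USER_SITE),
        "on_sys_path": user_site in sys.path,
    }


if __name__ == "__main__":
    for key, value in user_site_status().items():
        print(f"{key}: {value}")
```

Running the interpreter with `-s`, or setting the `PYTHONNOUSERSITE` environment variable, keeps the per-user directory off `sys.path`. Installing into a virtual environment or a Docker container avoids sharing the directory in the first place.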

marchostau commented 1 year ago

For example, if I try to install pybind11 I get version 2.6.2; I can't get pybind11>=2.10.4, which I need.

colesbury commented 1 year ago

Pybind requires patches to work with nogil. I've put up a pre-built version of pybind 2.6.2, but not newer versions. I can apply the patches to a newer release. What do you need pybind11>=2.10.4 for? I'd like to test to make sure things work, if possible.

marchostau commented 1 year ago

I'm trying to install Lithops, and one of its requirements is contourpy, which needs pybind11>=2.10.4. I'm using Lithops to store the partitions processed by the mappers in a MapReduce model. I implemented this MapReduce model using multiprocessing: every process executes a mapper and a reducer, and the partitions are exchanged through shared memory with pyarrow serialization, which allows zero-copy reads. I want to compare it with a multithreaded implementation using your Python interpreter, which fits my project perfectly (Python with the GIL would not give me better results).

The error that I get is this:

~$ pip install contourpy
Looking in indexes: https://d1yxz45j0ypngg.cloudfront.net/, https://pypi.org/simple
Collecting contourpy
  Using cached contourpy-1.1.1.tar.gz (13.4 MB)
  Installing build dependencies ... error
  error: subprocess-exited-with-error

  × pip subprocess to install build dependencies did not run successfully.
  │ exit code: 1
  ╰─> [7 lines of output]
      Looking in indexes: https://d1yxz45j0ypngg.cloudfront.net/, https://d1yxz45j0ypngg.cloudfront.net/, https://pypi.org/simple
      Collecting meson>=1.2.0
        Using cached meson-1.2.1-py3-none-any.whl (962 kB)
      Collecting meson-python>=0.13.1
        Using cached meson_python-0.14.0-py3-none-any.whl (76 kB)
      ERROR: Could not find a version that satisfies the requirement pybind11>=2.10.4 (from versions: 2.6.2)
      ERROR: No matching distribution found for pybind11>=2.10.4
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

I want to tell you that you did a great job. If it works correctly, it can give me significant advantages in terms of execution and memory use.
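
For reference, the multithreaded MapReduce variant described earlier in this comment might look like the following minimal sketch. It is a stand-in, not the actual project code: it counts items with `collections.Counter` instead of using pandas or pyarrow, and the names `map_partition`, `reduce_counts`, and `threaded_map_reduce` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_partition(partition):
    """Mapper: count the items in one partition."""
    return Counter(partition)


def reduce_counts(left, right):
    """Reducer: merge two partial counts."""
    return left + right


def threaded_map_reduce(partitions, max_workers=4):
    """Run the mappers in a thread pool, then fold the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Threads share memory, so partial results need no serialization.
        partials = list(pool.map(map_partition, partitions))
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    data = [["a", "b", "a"], ["b", "c"], ["a"]]
    print(threaded_map_reduce(data))
```

With the standard interpreter the mappers take turns holding the GIL when they run pure-Python code. Without the GIL they can run in parallel, which is the comparison of interest here.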

marchostau commented 1 year ago

Today I checked which of the libraries required by Lithops might be a problem with this Python interpreter. The requirements that I tried to install and had problems with are the following:

  • contourpy>=1.0.1 -> meson -> meson-python -> Couldn't find a version that satisfies the requirement pybind11>=2.10.4 (from versions: 2.6.2)
  • bcrypt>=3.2 -> This package requires Rust >= 1.56.0. pip -> meson -> meson-python -> Couldn't find a version that satisfies the requirement pybind11>=2.10.4 (from versions: 2.6.2)
  • pywin32 -> Couldn't find a version that satisfies the requirement pywin32 (from versions: none).

Two of them, contourpy and bcrypt, can be installed once an updated version of pybind11 is available.

colesbury commented 1 year ago

> pywin32 -> Couldn't find a version that satisfies the requirement pywin32 (from versions: none).

@marchostau which operating system are you using?

marchostau commented 1 year ago

I'm using Ubuntu 20.04 in VirtualBox.

colesbury commented 1 year ago

@marchostau I've put up a few pre-built wheels:

You should be able to pip install lithops now (on Linux).

marchostau commented 1 year ago

Perfect! It's working now. Thank you very much!