Closed samuelcolvin closed 1 month ago
Note for reproducing: make sure to remove the .*so
file that maturin develop
places next to ./datafusion/__init__.py
@samuelcolvin, I assume this is an instance of this issue that the maturin docs warns about:
This structure is recommended to avoid a common ImportError pitfall
The docs recommend nesting the python source as python/datafusion/
, and the only references I find about lib.name
recommend keeping it the same.
To create a mixed Rust/Python project, add a directory with your package name (i.e. matching lib.name in your Cargo.toml) to contain the Python source:
Most importantly, my first attempt at using your fix did not appear to fix it (see below)
Do you have any thoughts on this solution vs the recommended one? (I'm still learning about maturin).
❯ maturin build
⚠ Warning: You specified maturin >=0.15, <0.16 in pyproject.toml under `build-system.requires`, but the current maturin version is 1.4.0
🍹 Building a mixed python/rust project
🔗 Found pyo3 bindings with abi3 support for Python ≥ 3.8
🐍 Not using a specific python interpreter
📡 Using build options features, locked from pyproject.toml
Compiling pyo3-build-config v0.20.2
Compiling pyo3-ffi v0.20.2
Compiling pyo3 v0.20.2
Compiling datafusion-python v37.1.0 (/home/mike/workspace/datafusion-python/flake)
Compiling arrow v51.0.0
Compiling datafusion-common v37.1.0
Compiling datafusion-expr v37.1.0
Compiling datafusion-execution v37.1.0
Compiling datafusion-sql v37.1.0
Compiling datafusion-physical-expr v37.1.0
Compiling datafusion-functions v37.1.0
Compiling datafusion-optimizer v37.1.0
Compiling datafusion-physical-plan v37.1.0
Compiling datafusion-functions-array v37.1.0
Compiling datafusion v37.1.0
Compiling datafusion-substrait v37.1.0
Finished dev [unoptimized + debuginfo] target(s) in 1m 02s
⚠ Warning: No compatible platform tag found, using the linux tag instead. You won't be able to upload those wheels to PyPI.
📦 Built wheel for abi3 Python ≥ 3.8 to /home/mike/workspace/datafusion-python/flake/target/wheels/datafusion-37.1.0-cp38-abi3-linux_x86_64.whl
flake on flake [$!?⇕] is 📦 v37.1.0 via 🐍 v3.11.9 (venv) via 🦀 v1.77.1 via ❄ impure (nix-shell-env) on ☁ (us-east-1) took 1m27s
❯ pip install .
Processing /home/mike/workspace/datafusion-python/flake
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: pyarrow>=11.0.0 in /nix/store/llp7z339ix0lm7nlrndn1yvdmhskcsyk-rust-toolchain/lib/python3.11/site-packages (from datafusion==37.1.0) (15.0.0)
Requirement already satisfied: numpy<2,>=1.16.6 in /nix/store/llp7z339ix0lm7nlrndn1yvdmhskcsyk-rust-toolchain/lib/python3.11/site-packages (from pyarrow>=11.0.0->datafusion==37.1.0) (1.26.4)
Building wheels for collected packages: datafusion
Building wheel for datafusion (pyproject.toml) ... done
Created wheel for datafusion: filename=datafusion-37.1.0-cp38-abi3-linux_x86_64.whl size=17875320 sha256=b417c0d9b4662a0f687359a2e751580d4a2fc0714a02b447cfe6deff11138413
Stored in directory: /tmp/nix-shell.EXRUlg/pip-ephem-wheel-cache-tbyz75qv/wheels/8e/1b/87/2a5f750e961aa47403189d62f5e1236e0819f25e0a9068babf
Successfully built datafusion
Installing collected packages: datafusion
Successfully installed datafusion-37.1.0
flake on flake [$!?⇕] is 📦 v37.1.0 via 🐍 v3.11.9 (venv) via 🦀 v1.77.1 via ❄ impure (nix-shell-env) on ☁ (us-east-1) took 9m50s
❯ python
Python 3.11.9 (main, Apr 2 2024, 08:25:04) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import datafusion
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/mike/workspace/datafusion-python/flake/datafusion/__init__.py", line 28, in <module>
from ._internal import (
ModuleNotFoundError: No module named 'datafusion._internal'
w/ a git diff of
❯ git diff
diff --git a/Cargo.toml b/Cargo.toml
index 9da36d7..5bb1f83 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -60,7 +60,7 @@ url = "2.2"
pyo3-build-config = "0.20.0"
[lib]
-name = "datafusion_python"
+name = "_internal"
crate-type = ["cdylib", "rlib"]
[profile.release]
diff --git a/src/lib.rs b/src/lib.rs
index a696ebf..8ce1b16 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -72,6 +72,7 @@ pub(crate) struct TokioRuntime(tokio::runtime::Runtime);
/// The higher-level public API is defined in pure python files under the
/// datafusion directory.
#[pymodule]
+#[pyo3(name="_internal")]
fn _internal(py: Python, m: &PyModule) -> PyResult<()> {
// Register the Tokio Runtime as a module attribute so we can reuse it
m.add(
Following the recommended solution of nesting python/datafusion
and adding tool.maturin.python-source = "python"
does seem to work for me.
❯ git diff pyproject.toml
diff --git a/pyproject.toml b/pyproject.toml
index d353605..24bd296 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -55,6 +55,7 @@ repository = "https://github.com/apache/arrow-datafusion-python"
profile = "black"
[tool.maturin]
+python-source = "python"
module-name = "datafusion._internal"
include = [
{ path = "Cargo.lock", format = "sdist" }
flake on flake [$✘!?⇕] is 📦 v37.1.0 via 🐍 v3.11.9 (venv) via 🦀 v1.77.1 via ❄ impure (nix-shell-env) on ☁ (us-east-1)
❯ ls -l python
total 4
drwxr-xr-x 5 mike users 4096 May 14 08:07 datafusion
flake on flake [$✘!?⇕] is 📦 v37.1.0 via 🐍 v3.11.9 (venv) via 🦀 v1.77.1 via ❄ impure (nix-shell-env) on ☁ (us-east-1)
❯ maturin build
⚠ Warning: You specified maturin >=0.15, <0.16 in pyproject.toml under `build-system.requires`, but the current maturin version is 1.4.0
🍹 Building a mixed python/rust project
🔗 Found pyo3 bindings with abi3 support for Python ≥ 3.8
🐍 Not using a specific python interpreter
📡 Using build options features, locked from pyproject.toml
Finished dev [unoptimized + debuginfo] target(s) in 0.17s
⚠ Warning: No compatible platform tag found, using the linux tag instead. You won't be able to upload those wheels to PyPI.
📦 Built wheel for abi3 Python ≥ 3.8 to /home/mike/workspace/datafusion-python/flake/target/wheels/datafusion-37.1.0-cp38-abi3-linux_x86_64.whl
flake on flake [$✘!?⇕] is 📦 v37.1.0 via 🐍 v3.11.9 (venv) via 🦀 v1.77.1 via ❄ impure (nix-shell-env) on ☁ (us-east-1) took 20s
❯ pip install .
Processing /home/mike/workspace/datafusion-python/flake
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: pyarrow>=11.0.0 in /nix/store/llp7z339ix0lm7nlrndn1yvdmhskcsyk-rust-toolchain/lib/python3.11/site-packages (from datafusion==37.1.0) (15.0.0)
Requirement already satisfied: numpy<2,>=1.16.6 in /nix/store/llp7z339ix0lm7nlrndn1yvdmhskcsyk-rust-toolchain/lib/python3.11/site-packages (from pyarrow>=11.0.0->datafusion==37.1.0) (1.26.4)
Building wheels for collected packages: datafusion
Building wheel for datafusion (pyproject.toml) ... done
Created wheel for datafusion: filename=datafusion-37.1.0-cp38-abi3-linux_x86_64.whl size=17875125 sha256=921aca268be398c639695a5b851e14b3a19185ce39f2336d736a5f5705841b01
Stored in directory: /tmp/nix-shell.e0IZLi/pip-ephem-wheel-cache-lzlmoc66/wheels/8e/1b/87/2a5f750e961aa47403189d62f5e1236e0819f25e0a9068babf
Successfully built datafusion
Installing collected packages: datafusion
Attempting uninstall: datafusion
Found existing installation: datafusion 37.1.0
Uninstalling datafusion-37.1.0:
Successfully uninstalled datafusion-37.1.0
Successfully installed datafusion-37.1.0
flake on flake [$✘!?⇕] is 📦 v37.1.0 via 🐍 v3.11.9 (venv) via 🦀 v1.77.1 via ❄ impure (nix-shell-env) on ☁ (us-east-1) took 4s
❯ python
Python 3.11.9 (main, Apr 2 2024, 08:25:04) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import datafusion
>>> datafusion.__version__
'37.1.0'
@davidhewitt, you're the maturin expert - can you tell us what to do.
Yes I suggest moving python sources into a subdirectory as @Michael-J-Ward suggests - the ImportError
pitfall described in the maturin docs is a common one and the subdirectory is my preferred workaround (we did the same in pydantic-core
).
makes sense. I think this started working because I had run maturin develop
before making this change.
In particular the problem is that maturin develop
(or pip install -e .
) will use the checkout's Python sources, but pip install .
will put a copy into site-packages
, which then conflicts with the checkout sources if you're running Python from the repository root. (You'll end up importing from the local sources rather than the installed contents in site-packages
.)
Gracias for the report @samuelcolvin and the knowledge @davidhewitt.
Superseding PR in #695
With the current settings
maturin build
andpip install .
gave me an error when I tried to importdatafusion
: