apache / datafusion-python

Apache DataFusion Python Bindings
https://datafusion.apache.org/python
Apache License 2.0
321 stars 64 forks source link

chore: update to maturin's recommended project layout for rust/python… #695

Closed Michael-J-Ward closed 1 month ago

Michael-J-Ward commented 1 month ago

Supersedes https://github.com/apache/datafusion-python/pull/694

Rationale for this change

The previous layout leads to an import error when installing with maturin build and pip install ..

This error was common enough that maturin changed the recommended project layout to what this commit does.

A prior PR attempted to solve this by altering lib.name in Cargo.toml, but that did not work for me.


Maintainer of pyo3 explains the issue on the prior PR

In particular the problem is that maturin develop (or pip install -e .) will use the checkout's Python sources, but pip install . will put a copy into site-packages, which then conflicts with the checkout sources if you're running Python from the repository root. (You'll end up importing from the local sources rather than the installed contents in site-packages.)

Michael-J-Ward commented 1 month ago

I'm sure @andygrove would have the same Q as on previous PR.

@davidhewitt, could you spare a little more knowledge about the effects of changing lib.name?

Does this have any impact on downstream maturin projects that use this crate as a dependency? If they also use the name _internal, will there be a conflict?

Should we replace all references to _internal with datafusion_internal instead?

Michael-J-Ward commented 1 month ago

@andygrove , I have another commit queued up if you want to switch the lib.name and python module from _internal to _datafusion_internal

andygrove commented 1 month ago

@andygrove , I have another commit queued up if you want to switch the lib.name and python module from _internal to _datafusion_internal

I think we can undo this last commit and ignore my question/concern. I missed that the maturin project definition was already qualifying _internal as datafusion._internal, so I think there should be no conflict with downstream maturin projects.

Sorry for the noise.

Michael-J-Ward commented 1 month ago

@andygrove, the commit that changes lib.name is not in this PR. It's in a different branch of my fork.

I created my own noise be referencing the suggestion in this PR in that commit message.

davidhewitt commented 1 month ago

Looks like you figured this out already, but yes I agree no need to change lib.name here 👍