apache / datafusion-python

Apache DataFusion Python Bindings
https://datafusion.apache.org/python
Apache License 2.0

Versions >32.0.0 on PyPI have broken substrait support #646

Closed ingomueller-net closed 1 month ago

ingomueller-net commented 3 months ago

Describe the bug

The substrait subpackage cannot be imported from any version >32.0.0 on PyPI.

To Reproduce

pip install datafusion==33.0.0 && python -c "from datafusion import substrait as ss"

produces this error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "$USER/.venv/mlirdev/lib/python3.11/site-packages/datafusion/substrait.py", line 19, in <module>
    from ._internal import substrait
ImportError: cannot import name 'substrait' from 'datafusion._internal' ($USER/.venv/mlirdev/lib/python3.11/site-packages/datafusion/_internal.abi3.so)

The same is true for versions 34 and 35, but 32.0.0 works.

Expected behavior

Import works and package is usable.

Additional context

The exact wheel downloaded by the above command is datafusion-33.0.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
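
To check several environments quickly, the reproduction can be wrapped in a small script that reports whether the compiled extension exposes the substrait submodule. This is only a minimal sketch: the helper name has_substrait is hypothetical, and the only thing taken from the report is the module path datafusion._internal shown in the traceback.

```python
import importlib


def has_substrait(internal_module="datafusion._internal"):
    """Report whether the given module exposes a `substrait` attribute.

    Returns True or False when the module imports, and None when it
    cannot be imported at all (for example, datafusion is not installed).
    """
    try:
        internal = importlib.import_module(internal_module)
    except ImportError:
        return None
    return hasattr(internal, "substrait")


if __name__ == "__main__":
    # Assumption: the installed wheel uses the module layout from the traceback.
    print("substrait available:", has_substrait())
```

A result of False would match the ImportError above, since the extension module loads but was built without the substrait submodule.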

l1t1 commented 3 months ago

Version 35 on win64 works:

>>> from datafusion import substrait as ss

>>> dir(ss)
['__builtins__', '__cached__', '__doc__', '__file__', '__getattr__', '__loader__', '__name__', '__package__', '__spec__', 'substrait']
>>> import datafusion
>>> datafusion.__version__
'35.0.0'

EpsilonPrime commented 2 months ago

I'm also seeing this behavior. The possibilities I can come up with are that substrait is not being included as a feature in recent builds (which seems unlikely, since it sometimes works) or that dependencies needed only at runtime are missing.

EpsilonPrime commented 2 months ago

I had one working environment, so I started removing packages. When I downgraded libabseil, I started getting this result. I don't think that's the culprit, but the downgrade also took pyarrow from 15 to 9. Perhaps there's a minimum required version of pyarrow (or of another dependency).
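
To compare a working and a failing environment on this point, the installed versions can be printed side by side. This is a rough sketch using only the standard library; the helper name installed_versions is hypothetical, and the package list is just the two packages mentioned in this thread.

```python
from importlib import metadata


def installed_versions(packages=("datafusion", "pyarrow")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```

Running it in both environments and diffing the output would show whether the pyarrow version actually differs between them. It does not by itself establish that pyarrow is the cause.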

EpsilonPrime commented 2 months ago

I tried installing every conda/pip package from my working environment into the failing CI environment, but that wasn't enough to get things working. I also looked at the shared library and found it was looking for a brotli decoder, so I made sure that libbrotlidec was included as well.

alamb commented 2 months ago

Note: I transferred this issue into the datafusion-python repo, as I think we should begin triage there.

mbwhite commented 1 month ago

FYI, I tried v38.0.0 from pypi-test and the problem remains. Rebuilding the code locally and using the resulting wheel works fine: maturin build --features substrait