apache / iceberg-python

Apache PyIceberg
https://py.iceberg.apache.org/
Apache License 2.0

Fast Avro Decoder not included in Conda Deployment of pyiceberg #1093

Open aschreiber1 opened 3 weeks ago

aschreiber1 commented 3 weeks ago

Feature Request / Improvement

When you install pyiceberg via conda, you get warnings like:

/home/coder/.conda/envs/coder/lib/python3.10/site-packages/pyiceberg/avro/decoder.py:185: UserWarning: Falling back to pure Python Avro decoder, missing Cython implementation
  warnings.warn("Falling back to pure Python Avro decoder, missing Cython implementation")

This means the conda package is missing the fast Avro decoder. It would be great to have it included to speed up our queries!
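A quick way to confirm whether the compiled decoder is present in a given environment is to ask Python whether the extension module can be located. This is a minimal sketch; the module name `pyiceberg.avro.decoder_fast` is an assumption based on the warning text and may differ between versions, and `has_module` is a hypothetical helper, not part of pyiceberg.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the module `name` can be located in this environment."""
    try:
        # For a dotted name, find_spec imports the parent packages first.
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. `pyiceberg`) is itself missing.
        return False


# Assumed name of the compiled decoder module; adjust if it differs.
FAST_DECODER = "pyiceberg.avro.decoder_fast"

if __name__ == "__main__":
    status = "available" if has_module(FAST_DECODER) else "missing"
    print(f"{FAST_DECODER}: {status}")
```

If this reports the module as missing, the installed package was most likely built without the Cython extension, which would match the warning above.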

kevinjqliu commented 2 weeks ago

I'm not sure how conda deals with Cython extensions, but here's the relevant code:

https://github.com/apache/iceberg-python/blob/e4c1748fee220076f04e35ab2f182dd51ca20705/pyiceberg/avro/decoder.py#L185

https://github.com/apache/iceberg-python/blob/e4c1748fee220076f04e35ab2f182dd51ca20705/build-module.py#L44-L52

JanKrl commented 5 days ago

I see the same issue when installing from PyPI (Windows):

D:\code\...\.venv\Lib\site-packages\pyiceberg\avro\decoder.py:185: UserWarning: Falling back to pure Python Avro decoder, missing Cython implementation

https://github.com/apache/iceberg-python/blob/9b9ed534b2022cb9a687f4ed876fadcc2457b31b/pyiceberg/avro/decoder.py#L177-L187
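For readers who don't want to follow the link, the warning comes from an import fallback: try the compiled decoder, and if it cannot be imported, warn and use a pure Python one. The sketch below illustrates that general pattern only; it is not the pyiceberg source. `PurePythonDecoder` is a hypothetical stand-in, and the default module and class names are assumptions.

```python
import importlib
import warnings


class PurePythonDecoder:
    """Hypothetical stand-in for a slower decoder that is always available."""

    def __init__(self, data: bytes) -> None:
        self.data = data


def new_decoder(
    data: bytes,
    fast_module: str = "pyiceberg.avro.decoder_fast",  # assumed module name
    fast_class: str = "CythonBinaryDecoder",  # assumed class name
):
    """Return the compiled decoder if importable, else warn and fall back."""
    try:
        module = importlib.import_module(fast_module)
    except ImportError:
        # The compiled extension was not built or not shipped in this install.
        warnings.warn(
            "Falling back to pure Python Avro decoder, missing Cython implementation"
        )
        return PurePythonDecoder(data)
    return getattr(module, fast_class)(data)
```

Because the fallback is silent apart from the warning, reads still work when the extension is absent; they just run on the slower path.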

kevinjqliu commented 5 days ago

Adding some context to the Avro decoder build process.

We use Poetry to build the Avro decoder via this script:

https://github.com/apache/iceberg-python/blob/b5933756b5b488ec51cd56d5984731b6cc347f2b/pyproject.toml#L583

https://github.com/apache/iceberg-python/blob/b5933756b5b488ec51cd56d5984731b6cc347f2b/build-module.py

You can trigger the build manually with:

poetry build

Depending on the platform, a missing piece may be preventing the build from succeeding.

@JanKrl can you try the above command and paste the output here for debugging?

JanKrl commented 4 days ago

Deleting the Python environment (env) solved the problem for me, so (un)fortunately I can't reproduce it anymore.

kevinjqliu commented 4 days ago

Great to hear. Did you clean the env manually or use make clean?

JanKrl commented 4 days ago

I use venv, so I removed the .venv directory and created it again.
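The same reset can be scripted with the standard library. This is a minimal sketch assuming the environment lives in `.venv` in the current directory; `recreate_venv` is a hypothetical helper name, and pyiceberg still has to be reinstalled into the fresh environment afterwards.

```python
import shutil
import venv
from pathlib import Path


def recreate_venv(env_dir: str = ".venv", with_pip: bool = True) -> Path:
    """Delete `env_dir` if it exists and create a fresh virtual environment."""
    path = Path(env_dir)
    if path.exists():
        # Removes every installed package, including stale or partial builds.
        shutil.rmtree(path)
    venv.create(path, with_pip=with_pip)
    return path


if __name__ == "__main__":
    print(f"Recreated {recreate_venv()}")
```

Starting from an empty environment forces the package to be installed again from scratch, which is presumably why the warning disappeared here.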