apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[Python] Armv7 orc and flight not supported for build. Compat error on using with spark #26271

Closed asfimport closed 3 years ago

asfimport commented 4 years ago

I'm using an Arm Cortex-A9 processor on the Xilinx Pynq Z2 board. In previous posts, people tried to build it for the Raspberry Pi 3 without luck.

I figured out how to build it successfully for armv7 using the script below, but I cannot enable the orc and flight flags. People looked into this in ARROW-8420, but I don't know if they faced these issues.

I tried converting a Spark dataframe to pandas using pyarrow, but now it complains about a compat feature. I have attached images below.

Any help would be appreciated. Thanks.

Spark Version: 2.4.5.

The code is as follows:

import pandas as pd

df_pd = df.toPandas()
npArr = df_pd.to_numpy()

The error is as follows:

/opt/spark/python/pyspark/sql/dataframe.py:2110: UserWarning: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true; however, failed by the reason below:
 module 'pyarrow' has no attribute 'compat'
 Attempting non-optimization as 'spark.sql.execution.arrow.fallback.enabled' is set to true.
 warnings.warn(msg)

Reporter: utsav

Note: This issue was originally created as ARROW-10276. Please see the migration documentation for further details.

asfimport commented 4 years ago

Uwe Korn / @xhochy: [~utri092] Please don't upload screenshots as error reports but copy the text in here.

Is your board using 32bit or 64bit?
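One way to answer the 32-bit versus 64-bit question from inside Python is sketched below. It only uses the standard library; the function name `describe_interpreter` is made up for this illustration. Note that it reports the bitness of the running Python interpreter, which can differ from what the CPU itself supports (a 32-bit userland on a 64-bit core, for example).

```python
import platform
import struct
import sys


def describe_interpreter():
    """Return basic facts about the running interpreter and machine.

    `describe_interpreter` is a hypothetical helper name, not part of any library.
    """
    pointer_bits = struct.calcsize("P") * 8  # size of a C pointer, in bits
    return {
        "machine": platform.machine(),        # architecture string reported by the OS
        "pointer_bits": pointer_bits,         # 32 or 64 on common platforms
        "is_64bit": sys.maxsize > 2**32,      # True on a 64-bit interpreter
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running this on the board and pasting the output as text gives the same information as a screenshot, in a form that can be searched and quoted.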

asfimport commented 4 years ago

utsav: @xhochy My apologies. I have uploaded the build scripts and the notebook, along with the code I used, to the issue. The board is 32-bit. I shall outline the steps to build the wheel. The notebook cannot be used directly because it reads from HDFS, but the error can be reproduced with any Spark dataframe.

mkdir arrow_build

cd arrow_build

Copy the three bash scripts to the arrow_build folder

./get_arrow_and_create_venv.sh

./run_build.sh

./build_pip_wheel.sh

 

asfimport commented 4 years ago

Uwe Korn / @xhochy: According to the Spark documentation (http://spark.apache.org/docs/2.4.5/sql-pyspark-pandas-with-arrow.html#ensure-pyarrow-installed), you need pyarrow==0.8.0. So this seems to be a mismatch in the installed pyarrow version rather than something actually related to Armv7.
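A quick way to check for this kind of version mismatch is to print the installed versions and compare them against whatever range the Spark documentation states. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The helper names and the bounds passed to `in_range` are placeholders for illustration, not values taken from Spark or Arrow; substitute the range from the documentation for your Spark release.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(text):
    """Turn '0.17.1' into (0, 17, 1); stops at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def in_range(version, low, high):
    """True if low <= version < high, comparing numeric components only."""
    return version_tuple(low) <= version_tuple(version) < version_tuple(high)


if __name__ == "__main__":
    for name in ("pyspark", "pyarrow", "pandas", "numpy"):
        print(f"{name}: {installed_version(name)}")

    pyarrow_version = installed_version("pyarrow")
    if pyarrow_version is not None:
        # ASSUMPTION: these bounds are placeholders; take the real range
        # from the documentation of the Spark release in use.
        print("within assumed range:", in_range(pyarrow_version, "0.8.0", "0.16.0"))
```

This is a simplified comparison: it ignores pre-release suffixes such as `rc1`, which is usually good enough for a first sanity check.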

asfimport commented 4 years ago

utsav: @xhochy I will try and let you know. I guess the orc and flight flags are a separate issue. At the moment it cannot build with them set to On.

asfimport commented 4 years ago

utsav: @xhochy According to ARROW-8420, which I mentioned earlier in my issue, support for armv7 was added only in 0.17.0, so I cannot use 0.8.0. I tried to build it and it failed. I even set

export ARROW_PRE_0_15_IPC_FORMAT=1 in conf/spark-env.sh according to the link you sent me, but no luck.

asfimport commented 4 years ago

utsav: An update: I upgraded to Spark 3.0.1 and received the same error.

asfimport commented 4 years ago

Uwe Korn / @xhochy: Yes, Spark 3.0.1 is still not compatible with pyarrow=0.17. You can use 0.14 and 0.15 with the latest Spark release, but not newer, AFAIK. So there is currently no combination that will work for you.

asfimport commented 4 years ago

utsav: @xhochy I can use it on my desktop, though. Does this issue arise if the dependencies it needs are of a specific version, despite what the requirements file says? I recall it needing NumPy and pandas. I used numpy==1.19.2, pandas==1.1.2, six==1.15.0, pytz==2020.1 and Cython==0.29.2. My doubt arises from https://github.com/apache/arrow/issues/2468 and ARROW-3141.

asfimport commented 4 years ago

Uwe Korn / @xhochy: You have to look at the differences between the pip list outputs on these two machines if it works on your desktop. The error might be coming from differing pandas versions.
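Comparing the two environments by eye is error-prone, so a small script can help. The sketch below assumes the package lists were saved in `pip freeze` format (`name==version` per line) on each machine; the helper names are made up for this illustration, and the desktop versions in the example are hypothetical, not taken from this thread.

```python
def parse_freeze(text):
    """Parse `pip freeze` style lines ('name==version') into a dict."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and editable or URL installs
        name, version = line.split("==", 1)
        packages[name.lower()] = version
    return packages


def diff_environments(left, right):
    """Return {name: (left_version, right_version)} for every mismatch.

    A version of None means the package is missing on that side.
    """
    names = sorted(set(left) | set(right))
    return {
        name: (left.get(name), right.get(name))
        for name in names
        if left.get(name) != right.get(name)
    }


if __name__ == "__main__":
    # Board versions as quoted in the thread.
    board = parse_freeze("numpy==1.19.2\npandas==1.1.2\nsix==1.15.0\npytz==2020.1\n")
    # HYPOTHETICAL desktop versions, for illustration only.
    desktop = parse_freeze("numpy==1.19.2\npandas==1.0.5\nsix==1.15.0\npytz==2020.1\n")
    for name, (on_board, on_desktop) in diff_environments(board, desktop).items():
        print(f"{name}: board={on_board} desktop={on_desktop}")
```

In practice the two input strings would come from files written with `pip freeze > board.txt` and `pip freeze > desktop.txt`.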

asfimport commented 3 years ago

Antoine Pitrou / @pitrou: PyArrow should work on 64-bit ARM (there are wheels for it), but I don't think we have any plans to support 32-bit ARM. I'm going to close this issue for now. Feel free to open a new one if you make progress on this and can suggest concrete improvements.