Open zhanweiw opened 1 month ago
Could you show the full build log for PyArrow?
FYI: You don't need the second patch if you set PYARROW_CMAKE_GENERATOR=Visual Studio 17 2022.
Thanks @kou! I've attached the PyArrow build log together with the x64 and arm64 'arrow.lib' dump logs. It seems many functions haven't been compiled into the arm64 version of 'arrow.lib'.
It seems that Arrow C++ uses clang-cl but PyArrow doesn't use clang-cl.
Can we use clang-cl for PyArrow too?
Thanks for your suggestion.
After modifying the code as below and compiling again:
diff --git a/python/setup.py b/python/setup.py
index 60b9a696d..b75adb0fa 100755
--- a/python/setup.py
+++ b/python/setup.py
@@ -165,7 +165,7 @@ class build_ext(_build_ext):
_build_ext.initialize_options(self)
self.cmake_generator = os.environ.get('PYARROW_CMAKE_GENERATOR')
if not self.cmake_generator and sys.platform == 'win32':
- self.cmake_generator = 'Visual Studio 15 2017 Win64'
+ self.cmake_generator = 'Ninja'
I got the link error below. After removing the body of the function 'arrow::gdb::TestSession', I can compile PyArrow successfully. I'll test whether the basic functions work.
[53/72] Linking CXX shared library arrow_python.dll
FAILED: arrow_python.dll arrow_python.lib
C:\windows\system32\cmd.exe /C "cd . && "C:\Program Files\CMake\bin\cmake.exe" -E vs_link_dll --intdir=CMakeFiles\arrow_python.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\arm64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\arm64\mt.exe --manifests -- C:\PROGRA~1\LLVM\bin\lld-link.exe /nologo CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\arrow_to_pandas.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\benchmark.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\common.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\datetime.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\decimal.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\deserialize.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\extension_type.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\gdb.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\helpers.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\inference.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\io.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\ipc.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\numpy_convert.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\numpy_init.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\numpy_to_arrow.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\python_test.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\python_to_arrow.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\pyarrow.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\serialize.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\udf.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\csv.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\filesystem.cc.obj /out:arrow_python.dll /implib:arrow_python.lib /pdb:arrow_python.pdb /dll /version:0.0 /machine:ARM64 /NODEFAULTLIB:LIBCMT /INCREMENTAL:NO 
C:\source\arrow\Install\lib\arrow_dataset.lib C:\source\arrow\Install\lib\arrow_acero.lib C:\source\arrow\Install\lib\parquet.lib C:\source\arrow\Install\lib\arrow.lib ws2_32.lib C:\Programs\Python\Python312-arm64\libs\python312.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
LINK: command "C:\PROGRA~1\LLVM\bin\lld-link.exe /nologo CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\arrow_to_pandas.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\benchmark.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\common.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\datetime.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\decimal.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\deserialize.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\extension_type.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\gdb.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\helpers.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\inference.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\io.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\ipc.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\numpy_convert.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\numpy_init.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\numpy_to_arrow.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\python_test.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\python_to_arrow.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\pyarrow.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\serialize.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\udf.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\csv.cc.obj CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\filesystem.cc.obj /out:arrow_python.dll /implib:arrow_python.lib /pdb:arrow_python.pdb /dll /version:0.0 /machine:ARM64 /NODEFAULTLIB:LIBCMT /INCREMENTAL:NO C:\source\arrow\Install\lib\arrow_dataset.lib C:\source\arrow\Install\lib\arrow_acero.lib C:\source\arrow\Install\lib\parquet.lib C:\source\arrow\Install\lib\arrow.lib ws2_32.lib C:\Programs\Python\Python312-arm64\libs\python312.lib kernel32.lib user32.lib gdi32.lib 
winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST:EMBED,ID=2" failed (exit code 1) with the following output:
lld-link: error: undefined symbol: __declspec(dllimport) public: __cdecl arrow::TimeScalar<class arrow::Time32Type>::TimeScalar<class arrow::Time32Type>(int, enum arrow::TimeUnit::type)
>>> referenced by CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\gdb.cc.obj:(void __cdecl arrow::gdb::TestSession(void))
>>> referenced by CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\gdb.cc.obj:(void __cdecl arrow::gdb::TestSession(void))
lld-link: error: undefined symbol: __declspec(dllimport) public: __cdecl arrow::TimeScalar<class arrow::Time64Type>::TimeScalar<class arrow::Time64Type>(__int64, enum arrow::TimeUnit::type)
>>> referenced by CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\gdb.cc.obj:(void __cdecl arrow::gdb::TestSession(void))
>>> referenced by CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\gdb.cc.obj:(void __cdecl arrow::gdb::TestSession(void))
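To see at a glance which symbols are missing and where they are referenced, the linker output above can be summarized with a short script. This is only a minimal sketch: it assumes the log uses the two line prefixes seen above, and the function name `summarize_undefined_symbols` is hypothetical.

```python
# Sketch: group lld-link "undefined symbol" errors by symbol.
# Assumption: the log uses exactly the two prefixes seen in the output above.
ERROR_PREFIX = "lld-link: error: undefined symbol: "
REF_PREFIX = ">>> referenced by "

# Lines copied from the link error above.
SAMPLE_LOG = [
    "lld-link: error: undefined symbol: __declspec(dllimport) public: __cdecl arrow::TimeScalar<class arrow::Time32Type>::TimeScalar<class arrow::Time32Type>(int, enum arrow::TimeUnit::type)",
    r">>> referenced by CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\gdb.cc.obj:(void __cdecl arrow::gdb::TestSession(void))",
    r">>> referenced by CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\gdb.cc.obj:(void __cdecl arrow::gdb::TestSession(void))",
    "lld-link: error: undefined symbol: __declspec(dllimport) public: __cdecl arrow::TimeScalar<class arrow::Time64Type>::TimeScalar<class arrow::Time64Type>(__int64, enum arrow::TimeUnit::type)",
    r">>> referenced by CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\gdb.cc.obj:(void __cdecl arrow::gdb::TestSession(void))",
    r">>> referenced by CMakeFiles\arrow_python.dir\pyarrow\src\arrow\python\gdb.cc.obj:(void __cdecl arrow::gdb::TestSession(void))",
]


def summarize_undefined_symbols(log_lines):
    """Map each undefined symbol to the distinct locations that reference it."""
    summary = {}
    current = None
    for raw in log_lines:
        line = raw.strip()
        if line.startswith(ERROR_PREFIX):
            current = line[len(ERROR_PREFIX):]
            summary.setdefault(current, [])
        elif line.startswith(REF_PREFIX) and current is not None:
            ref = line[len(REF_PREFIX):]
            if ref not in summary[current]:  # drop repeated references
                summary[current].append(ref)
    return summary


if __name__ == "__main__":
    for symbol, refs in summarize_undefined_symbols(SAMPLE_LOG).items():
        print(symbol)
        for ref in refs:
            print("    referenced by", ref)
```

For the log above, both missing symbols are TimeScalar constructors and both are referenced only from gdb.cc.obj, which matches the observation that emptying 'arrow::gdb::TestSession' makes the link succeed.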
OK. Could you open a PR that updates https://arrow.apache.org/docs/developers/python.html ? We need to add documentation there similar to https://arrow.apache.org/docs/developers/cpp/windows.html#building-on-windows-arm64-using-ninja-and-clang .
For the cmake_generator change: We can use set PYARROW_CMAKE_GENERATOR=Ninja as I mentioned at https://github.com/apache/arrow/issues/44310#issuecomment-2395321324 .
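The reason the environment variable makes the setup.py patch unnecessary can be seen from the lookup order in the diff above. The following is a minimal sketch of that logic pulled out into a standalone function; `resolve_cmake_generator` is a hypothetical name, not part of setup.py, and the default string is the one shown on the removed line of the diff.

```python
import os
import sys


def resolve_cmake_generator(environ=None, platform=None):
    """Return the CMake generator, preferring PYARROW_CMAKE_GENERATOR.

    Hypothetical helper mirroring the lines quoted from setup.py above.
    """
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    generator = environ.get("PYARROW_CMAKE_GENERATOR")
    if not generator and platform == "win32":
        # Windows default from the unpatched setup.py in the diff above.
        generator = "Visual Studio 15 2017 Win64"
    return generator


if __name__ == "__main__":
    # With the variable set, the hard-coded default is never reached.
    print(resolve_cmake_generator({"PYARROW_CMAKE_GENERATOR": "Ninja"}, "win32"))
    print(resolve_cmake_generator({}, "win32"))
```

Because the environment variable is checked first, setting it to Ninja has the same effect as editing the default.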
For the link error: Could you try the following?
diff --git a/cpp/src/arrow/scalar.h b/cpp/src/arrow/scalar.h
index 7a273c46c1..a4fd9453ef 100644
--- a/cpp/src/arrow/scalar.h
+++ b/cpp/src/arrow/scalar.h
@@ -464,7 +464,7 @@ struct ARROW_EXPORT Date64Scalar : public DateScalar<Date64Type> {
};
template <typename T>
-struct ARROW_EXPORT TimeScalar : public TemporalScalar<T> {
+struct TimeScalar : public TemporalScalar<T> {
using TemporalScalar<T>::TemporalScalar;
TimeScalar(typename TemporalScalar<T>::ValueType value, TimeUnit::type unit)
@kou I can compile it successfully with the steps below. I need to modify the code to disable 'ARROW_BUILD_BUNDLED_DEPENDENCIES', and I also need to add the PyArrow version information 'version="17.0.0"' in 'setup.py'. If 'ARROW_BUILD_BUNDLED_DEPENDENCIES' is not disabled, I get this error ('arrow_bundled_dependencies.lib' can't be found because it hasn't been compiled):
[190/191] Install the project...-- Install configuration: "RELEASE"
-- Installing: C:/zhanweiw/source/Python/Src/arrow/Install/include/arrow/util/config.h
CMake Error at src/arrow/cmake_install.cmake:40 (file):
file INSTALL cannot find
"C:/zhanweiw/source/Python/Src/arrow/cpp/build/release/arrow_bundled_dependencies.lib":
File exists.
Call Stack (most recent call first):
cmake_install.cmake:37 (include)
Install the compile environment:
a. Install ARM64 Python 3.12.6 and the necessary Python extensions.
b. Install LLVM (https://github.com/llvm/llvm-project/releases/download/llvmorg-18.1.8/LLVM-18.1.8-woa64.exe).
c. Install Visual Studio (enable ARM64 support).
d. Install CMake.
Open the "ARM64 Native Tools Command Prompt for VS 2022" and run the commands below:
cd C:\source
git clone https://github.com/apache/arrow.git
Apply the patch below:
diff --git a/cpp/src/arrow/CMakeLists.txt b/cpp/src/arrow/CMakeLists.txt
index c911f0f4e..ddd4dc0bb 100644
--- a/cpp/src/arrow/CMakeLists.txt
+++ b/cpp/src/arrow/CMakeLists.txt
@@ -955,7 +955,7 @@ if(CXX_LINKER_SUPPORTS_VERSION_SCRIPT)
endif()
if(ARROW_BUILD_STATIC AND ARROW_BUNDLED_STATIC_LIBS)
- set(ARROW_BUILD_BUNDLED_DEPENDENCIES TRUE)
+ set(ARROW_BUILD_BUNDLED_DEPENDENCIES FALSE)
else()
set(ARROW_BUILD_BUNDLED_DEPENDENCIES FALSE)
endif()
diff --git a/cpp/src/arrow/scalar.h b/cpp/src/arrow/scalar.h
index 7a273c46c..a4fd9453e 100644
--- a/cpp/src/arrow/scalar.h
+++ b/cpp/src/arrow/scalar.h
@@ -464,7 +464,7 @@ struct ARROW_EXPORT Date64Scalar : public DateScalar<Date64Type> {
};
template <typename T>
-struct ARROW_EXPORT TimeScalar : public TemporalScalar<T> {
+struct TimeScalar : public TemporalScalar<T> {
using TemporalScalar<T>::TemporalScalar;
TimeScalar(typename TemporalScalar<T>::ValueType value, TimeUnit::type unit)
diff --git a/python/setup.py b/python/setup.py
index 60b9a696d..36c3afa9f 100755
--- a/python/setup.py
+++ b/python/setup.py
@@ -400,6 +400,7 @@ setup(
distclass=BinaryDistribution,
# Dummy extension to trigger build_ext
ext_modules=[Extension('__dummy__', sources=[])],
+ version="17.0.0",
cmdclass={
'build_ext': build_ext
},
set ARROW_HOME=C:\source\arrow\Install
set CMAKE_PREFIX_PATH=C:\source\arrow\Install
set PYARROW_CMAKE_GENERATOR=Ninja
mkdir arrow\cpp\build
pushd arrow\cpp\build
set CC=clang-cl
set CXX=clang-cl
cmake -G "Ninja" -DCMAKE_INSTALL_PREFIX=%ARROW_HOME% -DCMAKE_UNITY_BUILD=ON -DARROW_COMPUTE=ON -DARROW_CSV=ON -DARROW_DATASET=ON -DARROW_FILESYSTEM=ON -DARROW_HDFS=ON -DARROW_JSON=ON -DARROW_PARQUET=ON -DARROW_WITH_LZ4=ON -DARROW_WITH_SNAPPY=ON -DARROW_WITH_ZLIB=ON -DARROW_WITH_ZSTD=ON ..
cmake --build . --target install --config Release
popd
4. Run commands below to compile PyArrow:
pushd arrow\python
set PYARROW_BUNDLE_ARROW_CPP=1
python setup.py build_ext --inplace
python setup.py bdist_wheel
popd
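If the Arrow C++ configure step above needs to be repeated often, the long cmake command line could be generated from a table of options. This is a rough sketch, not an official build script: `cmake_configure_command` and `ARROW_OPTIONS` are hypothetical names, and the option list simply copies the flags from the command above.

```python
import os

# Flags copied from the cmake command above; all of them are switched ON there.
ARROW_OPTIONS = {
    "CMAKE_UNITY_BUILD": True,
    "ARROW_COMPUTE": True,
    "ARROW_CSV": True,
    "ARROW_DATASET": True,
    "ARROW_FILESYSTEM": True,
    "ARROW_HDFS": True,
    "ARROW_JSON": True,
    "ARROW_PARQUET": True,
    "ARROW_WITH_LZ4": True,
    "ARROW_WITH_SNAPPY": True,
    "ARROW_WITH_ZLIB": True,
    "ARROW_WITH_ZSTD": True,
}


def cmake_configure_command(install_prefix, options, source_dir=".."):
    """Build the argument list for the Ninja configure step."""
    cmd = ["cmake", "-G", "Ninja", f"-DCMAKE_INSTALL_PREFIX={install_prefix}"]
    for name, enabled in options.items():
        cmd.append(f"-D{name}={'ON' if enabled else 'OFF'}")
    cmd.append(source_dir)
    return cmd


def clang_cl_environment(base=None):
    """Copy the environment and select clang-cl, like 'set CC' / 'set CXX' above."""
    env = dict(os.environ if base is None else base)
    env["CC"] = "clang-cl"
    env["CXX"] = "clang-cl"
    return env


if __name__ == "__main__":
    # Only prints the command; running it is left to the caller.
    print(" ".join(cmake_configure_command(r"C:\source\arrow\Install", ARROW_OPTIONS)))
```

The resulting list and environment could then be passed to `subprocess.run(cmd, env=env, check=True)` from inside arrow\cpp\build.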
We can work on the second diff (cpp/src/arrow/scalar.h) in #44364.
Could you open a new issue for the first diff (cpp/src/arrow/CMakeLists.txt)? Let's work on it as a separate task.
Why do we need the third diff (python/setup.py)?
Without the third diff, I'll get 'pyarrow-0-cp312-cp312-win_arm64.whl'. With the change, I'll get 'pyarrow-17.0.0-cp312-cp312-win_arm64.whl'.
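The difference between the two wheel names is entirely in the version field. As a small illustration, the sketch below splits a wheel file name into its dash-separated fields (name, version, optional build tag, Python tag, ABI tag, platform tag); `parse_wheel_filename` is a hypothetical helper written for this thread, not a PyArrow or setuptools API.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated fields."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return {
        "name": name,
        "version": version,
        "build_tag": build_tag,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


if __name__ == "__main__":
    for wheel in (
        "pyarrow-0-cp312-cp312-win_arm64.whl",
        "pyarrow-17.0.0-cp312-cp312-win_arm64.whl",
    ):
        print(wheel, "->", parse_wheel_filename(wheel)["version"])
```

A version field of 0 suggests that no version was detected at build time, which is why the explicit 'version="17.0.0"' changes the file name.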
I've created a new ticket for the 'arrow_bundled_dependencies.lib' build issue: https://github.com/apache/arrow/issues/44368
Without the third diff, I'll get 'pyarrow-0-cp312-cp312-win_arm64.whl'. With the change, I'll get 'pyarrow-17.0.0-cp312-cp312-win_arm64.whl'.
Hmm. It's strange.
should be used for version information.
@jorisvandenbossche Do you have any idea why this happens? ("The third diff" is a diff in https://github.com/apache/arrow/issues/44310#issuecomment-2402288125 .)
@kou
I can't find the file 'pyarrow/_generated_version.py' in the build:
version_file = 'pyarrow/_generated_version.py'
And it seems no code uses "fallback_version" on ARM64 Windows: https://github.com/search?q=repo%3Aapache%2Farrow+fallback_version&type=code
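One quick way to confirm whether the version file was produced anywhere in a checkout is to search for it by name. This is a minimal sketch under the assumption that the file, if generated, keeps the name '_generated_version.py' mentioned above; `find_generated_version_files` is a hypothetical helper.

```python
from pathlib import Path


def find_generated_version_files(root, filename="_generated_version.py"):
    """Return every file called `filename` below `root`, sorted by path."""
    return sorted(p for p in Path(root).rglob(filename) if p.is_file())


if __name__ == "__main__":
    # Assumption: run from the directory that contains setup.py (arrow\python).
    matches = find_generated_version_files(".")
    if matches:
        for path in matches:
            print("found:", path)
    else:
        print("no _generated_version.py found below the current directory")
```

An empty result after a build would be consistent with the wheel being named with version 0.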
I think that pyarrow/_generated_version.py is generated automatically.
I think that fallback_version is used by "setuptools" not PyArrow.
Could you try python -m pip install . instead of python setup.py build_ext --inplace and python setup.py bdist_wheel ?
I got the issue below with the above command:
[114/334] Compiling C object numpy/_core/_multiarray_tests.cp312-win_arm64.pyd.p/meson-generated__multiarray_tests.c.obj
FAILED: numpy/_core/_multiarray_tests.cp312-win_arm64.pyd.p/meson-generated__multiarray_tests.c.obj
"clang-cl" "-Inumpy\_core\_multiarray_tests.cp312-win_arm64.pyd.p" "-Inumpy\_core" "-I..\numpy\_core" "-I..\numpy\_core\src\multiarray" "-I..\numpy\_core\src\npymath" "-Inumpy\_core\include" "-I..\numpy\_core\include" "-I..\numpy\_core\src\common" "-IC:\Programs\Python\Python312-arm64\Include" "-IC:\Users\zhanw\AppData\Local\Temp\pip-install-l3cp_q1s\numpy_e774c5d2837a4788b968dbaa52f05202\.mesonpy-8gh30u70\meson_cpu" "-DNDEBUG" "/MD" "/nologo" "/showIncludes" "/utf-8" "/W2" "/clang:-std=c11" "/O2" "/Gw" "-fno-strict-aliasing" "/clang:-ftrapping-math" "-DNPY_HAVE_CLANG_FPSTRICT" "-DNPY_HAVE_NEON_VFPV4" "-DNPY_HAVE_NEON_FP16" "-DNPY_HAVE_NEON" "-DNPY_HAVE_ASIMD" "-DNPY_INTERNAL_BUILD" "-DHAVE_NPY_CONFIG_H" "-D_FILE_OFFSET_BITS=64" "-D_LARGEFILE_SOURCE=1" "-D_LARGEFILE64_SOURCE=1" "/Fdnumpy\_core\_multiarray_tests.cp312-win_arm64.pyd.p\meson-generated__multiarray_tests.c.pdb" /Fonumpy/_core/_multiarray_tests.cp312-win_arm64.pyd.p/meson-generated__multiarray_tests.c.obj "/c" numpy/_core/_multiarray_tests.cp312-win_arm64.pyd.p/_multiarray_tests.c
..\numpy\_core\src\multiarray\_multiarray_tests.c.src(1883,17): error: invalid operand in inline asm: 'fstcw ${0:w}'
1883 | __asm__("fstcw %w0" : "=m" (cw));
| ^
..\numpy\_core\src\multiarray\_multiarray_tests.c.src(1883,17): error: unrecognized instruction mnemonic
<inline asm>(1,2): note: instantiated into assembly here
1 | fstcw
| ^
2 errors generated.
May I get your support on compiling PyArrow for ARM64? I'm trying to compile it with the steps below on a Windows on ARM device. When I run the last command, 'python setup.py build_ext --inplace', in step 5 to compile PyArrow (the Python extension), I get many error messages like the ones below:
By comparing the 'DUMPBIN' results for 'arrow.lib' between x64 and arm64, I found that the ARM64 lib is missing the function with the parameters 'class std::shared_ptr const &,class arrow::MemoryPool *' in 'arrow.lib':
LargeBinaryBuilder is a template class. Do you have any idea why the function is missing when compiling for ARM64, and how to fix it? Thanks in advance!
For the reason mentioned in the link below, I compiled the C++ library with clang-cl for ARM64 but compiled the x64 version with MSVC cl: https://arrow.apache.org/docs/developers/cpp/windows.html#building-on-windows-arm64-using-ninja-and-clang
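The x64 versus arm64 comparison described above can be automated once each dump has been reduced to one symbol name per line. The sketch below makes that assumption explicitly: it does not parse raw DUMPBIN output (which contains additional columns and headers), and `missing_in_arm64` as well as the sample symbol names are hypothetical placeholders.

```python
def missing_in_arm64(x64_symbols, arm64_symbols, needle=""):
    """Return symbols present in the x64 list but absent from the arm64 list.

    Assumption: each input item is a single symbol name, already extracted
    from the dump. `needle` optionally restricts the result to symbols that
    contain a given substring (for example a class name).
    """
    arm64 = {s.strip() for s in arm64_symbols if s.strip()}
    seen = set()
    missing = []
    for raw in x64_symbols:
        symbol = raw.strip()
        if not symbol or symbol in seen:
            continue
        seen.add(symbol)
        if needle in symbol and symbol not in arm64:
            missing.append(symbol)
    return missing


if __name__ == "__main__":
    # Placeholder names only; real input would come from the two dump logs.
    x64 = ["symbol_a", "symbol_b", "builder_symbol_c", "symbol_b"]
    arm64 = ["symbol_a", "symbol_b"]
    print(missing_in_arm64(x64, arm64))
```

Filtering with `needle` makes it easier to check whether the gaps cluster around one class, such as the LargeBinaryBuilder template mentioned above.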
The detailed steps:
Component(s)
C++, Python