conda-forge / pytorch-cpu-feedstock

A conda-smithy repository for pytorch-cpu.
BSD 3-Clause "New" or "Revised" License

Fix aarch detection #256

Closed hmaarrfk closed 1 month ago

hmaarrfk commented 2 months ago

Closes #266

Documenting a few things I learned:

find_package(Python REQUIRED COMPONENTS ...
find_package(Python 3 REQUIRED COMPONENTS ...

These two seem to behave differently.

They seem to be using some custom CMake shims that define the Python_EXECUTABLE variable:

        CMake.defines(
            args,
            Python_EXECUTABLE=sys.executable,
            TORCH_BUILD_VERSION=version,
            **build_options,
        )

https://github.com/pytorch/pytorch/blob/main/tools/setup_helpers/cmake.py#L310
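
For context, here is a minimal sketch of how keyword arguments like the ones above could end up as `-D` cache definitions on the CMake command line. The helper name `defines_to_flags` is hypothetical and this is not PyTorch's implementation; the linked `cmake.py` is the authoritative source.

```python
import sys


def defines_to_flags(**defines):
    """Turn keyword arguments into CMake -D definitions (illustrative helper)."""
    flags = []
    for key, value in sorted(defines.items()):
        if value is None:
            continue  # leave unset options out of the command line
        flags.append(f"-D{key}={value}")
    return flags


# The version string here is only a sample value.
print(defines_to_flags(Python_EXECUTABLE=sys.executable, TORCH_BUILD_VERSION="2.4.0"))
```

The point is only that `Python_EXECUTABLE` arrives in CMake as a predefined variable, so the FindPython logic starts from that interpreter instead of searching for one.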

conda-forge-webservices[bot] commented 2 months ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

hmaarrfk commented 2 months ago

I don't understand why it isn't searching for Python.h in ${PREFIX}/include/python3.12, where it is located.
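
As a quick cross-check of where the header is expected, the interpreter itself can be asked for its include directory. This is only a sketch using the standard `sysconfig` module; the function name `python_header_path` is made up for illustration, and when cross-compiling the interpreter that runs this reports its own prefix, not necessarily the target's.

```python
import sysconfig
from pathlib import Path


def python_header_path():
    """Return where the running interpreter expects its Python.h to be."""
    # sysconfig reports the C header directory, typically
    # <prefix>/include/python<major>.<minor> on Linux.
    return Path(sysconfig.get_path("include")) / "Python.h"


header = python_header_path()
print(header)
print("exists:", header.exists())
```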

h-vetinari commented 2 months ago

Copying out the logs and reinstating some variables for better legibility:

    find_path considered the following locations:

      $PREFIX/lib/python3.12/site-packages/include/include/python3.12.5/Python.h
      $PREFIX/lib/python3.12/site-packages/include/include/Python.h
      $PREFIX/lib/python3.12/site-packages/include/Python.h
      $PREFIX/lib/python3.12/site-packages/include/python3.12.5/Python.h
      $PREFIX/lib/python3.12/site-packages/include/Python.h
      $PREFIX/lib/python3.12/site-packages/Python.h
      $PREFIX/include/include/python3.12.5/Python.h
      $PREFIX/include/include/Python.h
      $PREFIX/include/Python.h
      $PREFIX/include/python3.12.5/Python.h
      $PREFIX/include/Python.h
      $PREFIX/Python.h
      $PREFIX/$BUILD_PREFIX/include/python3.12.5/Python.h
      $PREFIX/$BUILD_PREFIX/include/Python.h
      $PREFIX/$BUILD_PREFIX/Python.h
      $BUILD_PREFIX/aarch64-conda-linux-gnu/sysroot/$PREFIX/lib/python3.12/site-packages/include/include/python3.12.5/Python.h
      $BUILD_PREFIX/aarch64-conda-linux-gnu/sysroot/$PREFIX/lib/python3.12/site-packages/include/include/Python.h
      $BUILD_PREFIX/aarch64-conda-linux-gnu/sysroot/$PREFIX/lib/python3.12/site-packages/include/Python.h
      $BUILD_PREFIX/aarch64-conda-linux-gnu/sysroot/$PREFIX/lib/python3.12/site-packages/include/python3.12.5/Python.h
      $BUILD_PREFIX/aarch64-conda-linux-gnu/sysroot/$PREFIX/lib/python3.12/site-packages/include/Python.h
      $BUILD_PREFIX/aarch64-conda-linux-gnu/sysroot/$PREFIX/lib/python3.12/site-packages/Python.h
      $BUILD_PREFIX/aarch64-conda-linux-gnu/sysroot/$PREFIX/include/include/python3.12.5/Python.h
      $BUILD_PREFIX/aarch64-conda-linux-gnu/sysroot/$PREFIX/include/include/Python.h
      $BUILD_PREFIX/aarch64-conda-linux-gnu/sysroot/$PREFIX/include/Python.h
      $BUILD_PREFIX/aarch64-conda-linux-gnu/sysroot/$PREFIX/include/python3.12.5/Python.h
      $BUILD_PREFIX/aarch64-conda-linux-gnu/sysroot/$PREFIX/include/Python.h
      $BUILD_PREFIX/aarch64-conda-linux-gnu/sysroot/$PREFIX/Python.h
      $BUILD_PREFIX/aarch64-conda-linux-gnu/sysroot/$BUILD_PREFIX/include/python3.12.5/Python.h
      $BUILD_PREFIX/aarch64-conda-linux-gnu/sysroot/$BUILD_PREFIX/include/Python.h
      $BUILD_PREFIX/aarch64-conda-linux-gnu/sysroot/$BUILD_PREFIX/Python.h
      $SRC_DIR/$PREFIX/lib/python3.12/site-packages/include/include/python3.12.5/Python.h
      $SRC_DIR/$PREFIX/lib/python3.12/site-packages/include/include/Python.h
      $SRC_DIR/$PREFIX/lib/python3.12/site-packages/include/Python.h
      $SRC_DIR/$PREFIX/lib/python3.12/site-packages/include/python3.12.5/Python.h
      $SRC_DIR/$PREFIX/lib/python3.12/site-packages/include/Python.h
      $SRC_DIR/$PREFIX/lib/python3.12/site-packages/Python.h
      $SRC_DIR/$PREFIX/include/include/python3.12.5/Python.h
      $SRC_DIR/$PREFIX/include/include/Python.h
      $SRC_DIR/$PREFIX/include/Python.h
      $SRC_DIR/$PREFIX/include/python3.12.5/Python.h
      $SRC_DIR/$PREFIX/include/Python.h
      $SRC_DIR/$PREFIX/Python.h
      $SRC_DIR/$BUILD_PREFIX/include/python3.12.5/Python.h
      $SRC_DIR/$BUILD_PREFIX/include/Python.h
      $SRC_DIR/$BUILD_PREFIX/Python.h

    The item was not found.

The reason is that the variation we need is missing from the pattern:

[...]/include/python3.12.5/Python.h     ✅
[...]/include/python3.12/Python.h       ❌
[...]/include/Python.h                  ✅
[...]/Python.h                          ✅
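
The mismatch can be reproduced with a small sketch that mimics the pattern from the log. The function `candidate_headers` is a hypothetical stand-in for the search, not CMake's actual logic; it only assumes that the versioned directory name is built from whatever version string is supplied.

```python
from pathlib import PurePosixPath


def candidate_headers(root, version):
    """List header locations following the pattern seen in the log."""
    root = PurePosixPath(root)
    return [
        root / "include" / f"python{version}" / "Python.h",
        root / "include" / "Python.h",
        root / "Python.h",
    ]


actual = PurePosixPath("$PREFIX/include/python3.12/Python.h")

# Full patch version in the directory name: the real location is never probed.
print(actual in candidate_headers("$PREFIX", "3.12.5"))  # False
# major.minor only: the real location is among the candidates.
print(actual in candidate_headers("$PREFIX", "3.12"))  # True
```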

hmaarrfk commented 2 months ago

The host environment differences are: [image]

The build environment differences are: [image]

h-vetinari commented 2 months ago

Ah, the old python2 vs python3 thing in CMake - great catch @isuruf!

BTW, while checking something else, I just saw that this is using a really old conda-build due to: https://github.com/conda-forge/pytorch-cpu-feedstock/blob/6dd85b3f85a72371b2d3ccf6a386de67e61d667e/conda-forge.yml#L28-L30

conda-forge-webservices[bot] commented 2 months ago

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.

Here's what I've got...

For recipe/meta.yaml:

conda-forge-webservices[bot] commented 2 months ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

hmaarrfk commented 2 months ago

@conda-forge-admin please rerender

hmaarrfk commented 2 months ago

Now this is strange.

The build passes, but the tests are failing.

hmaarrfk commented 2 months ago

I made the cmake tests stricter than they used to be, so things should fail on the CIs now. I tried to fix it locally, with no luck. Even downgrading to cmake 3.29 didn't have the desired effect.

hmaarrfk commented 2 months ago

Was something recently changed on the CIs? I'm having trouble detecting cross-compiled Python on https://github.com/conda-forge/vigra-feedstock/pull/139 as well...

hmaarrfk commented 2 months ago

I'm going to try to run the CUDNN9 migration locally with emulation. It will take forever, but maybe it's faster than trying to solve this problem.

However, even build 0 does not detect numpy correctly.

--   USE_NUMPY             : OFF

can be seen in the build logs.
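
To check this across several build logs without scrolling, a small parser for the `--   KEY : VALUE` summary lines can help. This is a sketch, assuming the summary lines keep the shape shown above; `parse_summary` is a hypothetical helper name.

```python
import re

# Matches lines such as "--   USE_NUMPY             : OFF",
# optionally preceded by a CI timestamp.
SUMMARY_LINE = re.compile(r"--\s+(?P<key>\w+)\s*:\s*(?P<value>\S.*)$")


def parse_summary(log_text):
    """Collect KEY -> VALUE pairs from CMake summary lines in a build log."""
    options = {}
    for line in log_text.splitlines():
        match = SUMMARY_LINE.search(line)
        if match:
            options[match.group("key")] = match.group("value").strip()
    return options


sample = "--   USE_NUMPY             : OFF"
print(parse_summary(sample).get("USE_NUMPY"))  # OFF
```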

hmaarrfk commented 2 months ago

Locally I can find numpy when I use emulation. Looking forward to the 60 hour build...

hmaarrfk commented 2 months ago

@conda-forge-admin please rerender

hmaarrfk commented 1 month ago

Should we move to Azure? The failure happens pretty early on.

isuruf commented 1 month ago

sure

hmaarrfk commented 1 month ago

@conda-forge-admin please rerender

hmaarrfk commented 1 month ago

@conda-forge-admin please rerender

hmaarrfk commented 1 month ago

:(

2024-09-21T20:45:42.4880641Z /home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/_build_env/bin/aarch64-conda-linux-gnu-cc -shared -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--allow-shlib-undefined -Wl,-rpath,/home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/lib -Wl,-rpath-link,/home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/lib -L/home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/lib -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--allow-shlib-undefined -Wl,-rpath,/home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/lib -Wl,-rpath-link,/home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/lib -L/home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/lib -Wl,-O2 -Wl,--sort-common -Wl,-z,relro 
-Wl,-z,lazy -Wl,--allow-shlib-undefined -Wl,-rpath,/home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/lib -Wl,-rpath-link,/home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/lib -L/home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/lib -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O3 -pipe -isystem /home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/include -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/work=/usr/local/src/conda/libtorch-2.4.0 -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl=/usr/local/src/conda-prefix -Wno-deprecated-declarations -Wno-error=maybe-uninitialized -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/include 
build/temp.linux-aarch64-cpython-312/torch/csrc/stub.o -L/home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/work/torch/lib -ltorch_python -o build/lib.linux-aarch64-cpython-312/torch/_C.cpython-312-aarch64-linux-gnu.so -Wl,-rpath,$ORIGIN/lib
2024-09-21T20:45:42.4953479Z /home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/_build_env/bin/../lib/gcc/aarch64-conda-linux-gnu/13.3.0/../../../../aarch64-conda-linux-gnu/bin/ld: cannot find -ltorch_python: No such file or directory

Tobias-Fischer commented 1 month ago

Do we need to manually move the libtorch_python.so to the correct directory? It lives in _build_env/venv currently which doesn't seem correct?

2024-09-21T20:45:36.2059664Z   -- Set non-toolchain portion of runtime path of "/home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/_build_env/venv/lib/libc10.so" to "$ORIGIN"
2024-09-21T20:45:38.2142982Z   -- Set non-toolchain portion of runtime path of "/home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/_build_env/venv/lib/libshm.so" to "$ORIGIN"
2024-09-21T20:45:38.2173476Z   -- Set non-toolchain portion of runtime path of "/home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/_build_env/venv/bin/torch_shm_manager" to "$ORIGIN/../lib"
2024-09-21T20:45:38.2504164Z   -- Set non-toolchain portion of runtime path of "/home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/_build_env/venv/lib/libtorch_python.so" to "$ORIGIN"
2024-09-21T20:45:38.5043675Z   -- Set non-toolchain portion of runtime path of "/home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/_build_env/venv/lib/libtorch_cpu.so" to "$ORIGIN"
2024-09-21T20:45:38.5054215Z   -- Set non-toolchain portion of runtime path of "/home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/_build_env/venv/lib/libtorch.so" to "$ORIGIN"
2024-09-21T20:45:38.5079700Z   -- Set non-toolchain portion of runtime path of "/home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/_build_env/venv/lib/libtorch_global_deps.so" to "$ORIGIN"
2024-09-21T20:45:38.5617385Z   -- Set non-toolchain portion of runtime path of "/home/conda/feedstock_root/build_artifacts/libtorch_1726938996574/work/functorch/functorch.so" to "$ORIGIN/../torch/lib"
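
A sketch of how the install locations could be pulled out of such a log automatically, assuming the "Set non-toolchain portion of runtime path" messages keep the format above. The helper `installed_files` and the shortened sample path are made up for illustration.

```python
import re
from pathlib import PurePosixPath

RPATH_LINE = re.compile(
    r'Set non-toolchain portion of runtime path of "(?P<path>[^"]+)" to "(?P<rpath>[^"]*)"'
)


def installed_files(log_text):
    """Map each file name in the rpath messages to the directory it was installed in."""
    found = {}
    for match in RPATH_LINE.finditer(log_text):
        path = PurePosixPath(match.group("path"))
        found[path.name] = str(path.parent)
    return found


# Shortened sample line; real logs carry the full build_artifacts path.
sample = '-- Set non-toolchain portion of runtime path of "/tmp/_build_env/venv/lib/libtorch_python.so" to "$ORIGIN"'
print(installed_files(sample))
```

Comparing the reported directory with the `-L` directory on the failing link line shows directly whether the linker is looking in the place the library was installed.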

hmaarrfk commented 1 month ago

Yeah. That would do it. Thanks for reading through the logs so carefully.

Now, how to do it tastefully is the question.

hmaarrfk commented 1 month ago

I can't find anything crazy that pytorch is doing....

I think setting CMAKE_STAGING_PREFIX is overriding the CMAKE_INSTALL_PREFIX that pytorch sets: https://cmake.org/cmake/help/latest/variable/CMAKE_STAGING_PREFIX.html

From my logs

  --   CMAKE_INSTALL_PREFIX  : /home/conda/feedstock_root/build_artifacts/debug_1727047313939/work/torch

but it still goes to the CMAKE_STAGING_PREFIX
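
To see which of the two prefixes is in play for a given configure call, the `-D` definitions can be extracted from the argument string. This is a rough sketch: the helper names and the sample paths are hypothetical, and `install_destination` only models the behavior described above (files land in the staging prefix when one is set), it does not query CMake.

```python
import shlex


def cache_definitions(cmake_args):
    """Parse -DKEY=VALUE (or -DKEY:TYPE=VALUE) tokens from a CMake argument string."""
    definitions = {}
    for token in shlex.split(cmake_args):
        if token.startswith("-D") and "=" in token:
            key, value = token[2:].split("=", 1)
            key = key.split(":", 1)[0]  # drop an optional :TYPE suffix
            definitions[key] = value
    return definitions


def install_destination(definitions):
    """Model: a staging prefix, if set, is where install rules put files."""
    return definitions.get("CMAKE_STAGING_PREFIX") or definitions.get("CMAKE_INSTALL_PREFIX")


# Sample arguments with made-up paths.
args = "-DCMAKE_INSTALL_PREFIX=/work/torch -DCMAKE_STAGING_PREFIX:PATH=/build_env/venv"
print(install_destination(cache_definitions(args)))  # /build_env/venv
```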

github-actions[bot] commented 1 month ago

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.

Here's what I've got...

For recipe/meta.yaml:

hmaarrfk commented 1 month ago

So I did the following:

mkdir simple
cd simple
# Quote the heredoc delimiter so the shell leaves ${...} for CMake to expand
cat >CMakeLists.txt <<'EOF'
cmake_minimum_required(VERSION 3.15)
project(MyCProject C)
find_package(Python3 REQUIRED COMPONENTS Interpreter Development NumPy)
set(CMAKE_C_STANDARD 99)
include_directories(${Python3_INCLUDE_DIRS})
include_directories(${Python3_NumPy_INCLUDE_DIRS})
add_executable(my_executable main.c)
target_link_libraries(my_executable Python3::Python)
EOF

mkdir build
cd build
cmake ${CMAKE_ARGS} ..

And I still get the same error:

-- The C compiler identification is GNU 13.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /home/conda/feedstock_root/build_artifacts/debug_1727492314322/_build_env/bin/aarch64-conda-linux-gnu-cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
CMake Error at /home/conda/feedstock_root/build_artifacts/debug_1727492314322/_build_env/share/cmake-3.30/Modules/FindPackageHandleStandardArgs.cmake:233 (message):
  Could NOT find Python3 (missing: Python3_NumPy_INCLUDE_DIRS NumPy) (found
  version "3.12.6")
Call Stack (most recent call first):
  /home/conda/feedstock_root/build_artifacts/debug_1727492314322/_build_env/share/cmake-3.30/Modules/FindPackageHandleStandardArgs.cmake:603 (_FPHSA_FAILURE_MESSAGE)
  /home/conda/feedstock_root/build_artifacts/debug_1727492314322/_build_env/share/cmake-3.30/Modules/FindPython/Support.cmake:4001 (find_package_handle_standard_args)
  /home/conda/feedstock_root/build_artifacts/debug_1727492314322/_build_env/share/cmake-3.30/Modules/FindPython3.cmake:602 (include)
  CMakeLists.txt:5 (find_package)

hmaarrfk commented 1 month ago

To rule out that this is caused by the pytorch build system, I've reproduced it with a simple standalone CMakeLists.txt file.

Tobias-Fischer commented 1 month ago

Wow seems like you finally got there @hmaarrfk, thanks and well done :)!

hmaarrfk commented 1 month ago

OK, let's take it one step at a time to stop the bleeding. https://github.com/conda-forge/pytorch-cpu-feedstock/pull/267