Closed: regro-cf-autotick-bot closed this PR 1 month ago.
Hi! This is the friendly automated conda-forge-linting service.
I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.
~if anybody could rebuild OSX that would be great!~
Edit: never mind, I forgot: cudnn 9 would make this package incompatible with cupy. https://github.com/conda-forge/cupy-feedstock/pull/286
Would it be appropriate to build two versions, one for cudnn 9 and the other for cudnn 8?
Hi! This is the friendly automated conda-forge-linting service.
I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.
Here's what I've got...
For recipe/meta.yaml:
Hi! This is the friendly automated conda-forge-linting service.
I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.
I thought I could get things to work by downgrading to 2.4.0 instead of 2.4.1... but it still doesn't work.
It looks like it's going ok so far? :)
No, I ran it locally in Docker on a linux-64 host and ran into the same problem as https://github.com/conda-forge/pytorch-cpu-feedstock/pull/256
Alright fingers crossed.
Ok fingers crossed.
@leofang I believe I got the export variable correct here. I'll have to check after this build is complete but we can try to ensure we close https://github.com/conda-forge/pytorch-cpu-feedstock/issues/264 as part of this.
Thank you @h-vetinari for the fixes to the rerendering. I'm trying to get this done before attempting to look at the protobuf issues in https://github.com/conda-forge/pytorch-cpu-feedstock/pull/265. It seems there are some things to fix up and clean up.
Added a few "fixes" notes to the OP based on the discussion here/diff. Please double check that it looks ok
/home/conda/feedstock_root/build_artifacts/libtorch_1727958353053/_build_env/bin/../lib/gcc/aarch64-conda-linux-gnu/12.4.0/../../../../aarch64-conda-linux-gnu/bin/ld: error: linker script file '/home/conda/feedstock_root/build_artifacts/libtorch_1727958353053/work/cmake/linker_script.ld' appears multiple times
Any ideas, @leofang?
I disabled it for now to ensure that the rest of the build passes.
For reference, this is the CI build (also attached log for posterity)
Searching did not turn up many results, but it did find this thread. I wasn't seeing a clear resolution there, though it does indicate someone else ran into this issue (presumably building with their system toolchain). This makes me think there is a bug in PyTorch's build scripts.
Different library, but same error ( https://github.com/m01/rock64-arch-linux-build/issues/17 ). In that case, the upstream fix linked is commit ( https://github.com/u-boot/u-boot/commit/d6507e6fd9e0a6f1a8dd28c18cd320c1f861269e ), which removes a duplicated target from a Makefile
So is there some target that is being rebuilt multiple times?
Is this running in emulation mode? The build is taking a really long time...
Think the configuration needs a tweak. Noted above: https://github.com/conda-forge/pytorch-cpu-feedstock/pull/261#discussion_r1786911294
Edit: Also a re-render afterwards (which can now be done with the bot: https://github.com/conda-forge/admin-requests/pull/1098! 🥳)
So is there some target that is being rebuilt multiple times?
It seems to be the linker script being passed to the linker twice
-- Shared LD flags : -Wl,-O2 -Wl,--sort-common -Wl,-z,relro -Wl,-z,lazy -Wl,--allow-shlib-undefined -Wl,-rpath,/home/conda/feedstock_root/build_artifacts/libtorch_1727958353053/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/lib -Wl,-rpath-link,/home/conda/feedstock_root/build_artifacts/libtorch_1727958353053/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/lib -L/home/conda/feedstock_root/build_artifacts/libtorch_1727958353053/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/lib -L/home/conda/feedstock_root/build_artifacts/libtorch_1727958353053/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/targets/sbsa-linux/lib -L/home/conda/feedstock_root/build_artifacts/libtorch_1727958353053/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/targets/sbsa-linux/lib/stubs -L/home/conda/feedstock_root/build_artifacts/libtorch_1727958353053/_build_env/targets/sbsa-linux/lib -L/home/conda/feedstock_root/build_artifacts/libtorch_1727958353053/_build_env/targets/sbsa-linux/lib/stubs -T/home/conda/feedstock_root/build_artifacts/libtorch_1727958353053/work/cmake/linker_script.ld -Wl,--no-as-needed -Wl,-O2 -Wl,--sort-common -Wl,-z,relro -Wl,-z,lazy -Wl,--allow-shlib-undefined -Wl,-rpath,/home/conda/feedstock_root/build_artifacts/libtorch_1727958353053/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/lib -Wl,-rpath-link,/home/conda/feedstock_root/build_artifacts/libtorch_1727958353053/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/lib -L/home/conda/feedstock_root/build_artifacts/libtorch_1727958353053/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/lib -L/home/conda/feedstock_root/build_artifacts/libtorch_1727958353053/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/targets/sbsa-linux/lib -L/home/conda/feedstock_root/build_artifacts/libtorch_1727958353053/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/targets/sbsa-linux/lib/stubs -L/home/conda/feedstock_root/build_artifacts/libtorch_1727958353053/_build_env/targets/sbsa-linux/lib -L/home/conda/feedstock_root/build_artifacts/libtorch_1727958353053/_build_env/targets/sbsa-linux/lib/stubs 
-T/home/conda/feedstock_root/build_artifacts/libtorch_1727958353053/work/cmake/linker_script.ld -rdynamic
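A quick, throwaway way to confirm the duplication from the dump above (the file name is hypothetical; paste the "Shared LD flags" line into it first):

import re

with open("shared_ld_flags.txt") as f:  # hypothetical file holding the flag dump above
    flags = f.read()

# Count every -T<path>linker_script.ld occurrence in the flag string
scripts = re.findall(r"-T\s*\S*linker_script\.ld", flags)
print(f"linker script passed {len(scripts)} time(s)")  # prints 2 for the dump above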
I can dig around. There are too many changes in flux so may not be easily isolated.
There are too many changes in flux so may not be easily isolated.
Ok let me try to get this new image built out on alma, and we can try in a separate PR for the linker optimization.
Ok let me try to get this new image built out on alma, and we can try in a separate PR for the linker optimization.
Yes plz do. I can also try to build the recipe locally and poke around next week, or just ask my colleague if this line is redundant and the root cause of the linker flag being passed twice: https://github.com/pytorch/pytorch/blob/a93d3873e97973fbc0009245579ee4e4fa7f155a/setup.py#L1173
It seems to be the linker script being passed to the linker twice
This is almost certainly the issue
The harder question is why that is happening
Edit: Missed the last bit. That setup.py doesn't look right. Agree we should patch that out, though it may be covering up deeper configuration issues. This would benefit from an upstream issue/PR at the end to clean up this stuff.
PS. I am trying to upstream whatever patches I find: https://github.com/pytorch/pytorch/pull/137084
So when we do have a suggestion, we should definitely make it ;)
Well, the build seems to be progressing, so let's let it run. We can unskip and merge in 10-12 hours.
Fixes to the LD script definitely welcome in that time ;)
Wow, I spoke way too soon...
/home/conda/feedstock_root/build_artifacts/debug_1728005337762/_build_env/bin/../lib/gcc/aarch64-conda-linux-gnu/12.4.0/../../../../aarch64-conda-linux-gnu/bin/ld: cannot represent machine `i386:x86-64'
diff --git a/tools/setup_helpers/generate_linker_script.py b/tools/setup_helpers/generate_linker_script.py
index 11c397a9e..32f817fba 100644
--- a/tools/setup_helpers/generate_linker_script.py
+++ b/tools/setup_helpers/generate_linker_script.py
@@ -1,4 +1,5 @@
import subprocess
+import os
def gen_linker_script(
@@ -10,7 +11,8 @@ def gen_linker_script(
line.replace("\n", "") for line in prioritized_text if line != "\n"
]
- linker_script_lines = subprocess.check_output(["ld", "-verbose"], text=True).split(
+ ld = os.environ.get('LD', 'ld')
+ linker_script_lines = subprocess.check_output([ld, "-verbose"], text=True).split(
"\n"
)
probably a better patch candidate.
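To spell out why respecting LD matters here (a minimal sketch, assuming the conda compiler activation exports LD as the target-prefixed binutils linker, e.g. aarch64-conda-linux-gnu-ld; that name is an assumption): the unpatched helper always shells out to the build machine's ld, whose default script targets x86-64, which lines up with the `cannot represent machine 'i386:x86-64'` error above.

import os
import subprocess

# With the patch above, the helper would use $LD if set; otherwise it still
# falls back to the build machine's ld.
ld = os.environ.get("LD", "ld")
default_script = subprocess.check_output([ld, "-verbose"], text=True)
# The OUTPUT_FORMAT line of the default script shows which machine it targets
for line in default_script.splitlines():
    if "OUTPUT_FORMAT" in line:
        print(line.strip())  # e.g. OUTPUT_FORMAT("elf64-littleaarch64", ...) for a cross linker
        break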
Thanks Mark! 🙏
Yeah was wondering if that linker script line was covering a deeper issue
Think we will need to take a closer look
Edit: Didn't see your new patch. Worth a try
I tried; I get back to the fact that the linker script is included twice. (I'm in a debug session.)
With 30 cores (Threadripper 2950, so they are really wimpy) you can get a failure at file 920/6271 with the change that makes the linker come from the environment variable.
Fingers crossed this does it:
diff --git a/CMakeLists.txt b/CMakeLists.txt
index c4cd4b2c2..3d4243179 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -532,17 +532,6 @@ endif()
option(TRACING_BASED
"Master flag to build Lite Interpreter with tracing build option" OFF)
option(BUILD_EXECUTORCH "Master flag to build Executorch" ON)
-# This is a fix for a rare build issue on Ubuntu: symbol lookup error:
-# miniconda3/envs/pytorch-py3.7/lib/libmkl_intel_lp64.so: undefined symbol:
-# mkl_blas_dsyrk
-# https://software.intel.com/en-us/articles/symbol-lookup-error-when-linking-intel-mkl-with-gcc-on-ubuntu
-if(LINUX)
- set(CMAKE_SHARED_LINKER_FLAGS
- "${CMAKE_SHARED_LINKER_FLAGS} -Wl,--no-as-needed")
- set(CMAKE_SHARED_LINKER_FLAGS
- "${CMAKE_SHARED_LINKER_FLAGS} $ENV{LDFLAGS}")
-endif()
-
if(MSVC)
# MSVC by default does not apply the correct __cplusplus version as specified
# by the C++ standard because MSVC is not a completely compliant
diff --git a/tools/setup_helpers/generate_linker_script.py b/tools/setup_helpers/generate_linker_script.py
index 11c397a9e..9009604f2 100644
--- a/tools/setup_helpers/generate_linker_script.py
+++ b/tools/setup_helpers/generate_linker_script.py
@@ -1,4 +1,5 @@
import subprocess
+import os
def gen_linker_script(
@@ -10,7 +11,8 @@ def gen_linker_script(
line.replace("\n", "") for line in prioritized_text if line != "\n"
]
- linker_script_lines = subprocess.check_output(["ld", "-verbose"], text=True).split(
+ ld = os.environ.get("LD", "ld")
+ linker_script_lines = subprocess.check_output([ld, "-verbose"], text=True).split(
"\n"
)
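My reading of why removing that CMakeLists.txt block helps (not verified upstream): CMake already seeds its linker flags from the LDFLAGS environment variable at configure time, so appending $ENV{LDFLAGS} on top duplicates whatever is in there, including a -T .../linker_script.ld if the linker-script generation puts one into LDFLAGS. A toy illustration, with made-up flag values:

import os

# Hypothetical LDFLAGS content; only the duplication mechanism is the point.
os.environ["LDFLAGS"] = "-Wl,-O2 -T/work/cmake/linker_script.ld"

seeded = os.environ["LDFLAGS"]                   # what CMake picks up on its own
appended = f"{seeded} {os.environ['LDFLAGS']}"   # the extra $ENV{LDFLAGS} append
print(appended.count("-T/work/cmake/linker_script.ld"))  # -> 2, i.e. "appears multiple times"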
seems to have gotten to 1461+
Their fix is included in https://github.com/pytorch/pytorch/commit/f217b470cc7ebacc62c8e87dbab8c4894d53e9b9
Well, feel free to look at the aarch64 logs: https://github.com/conda-forge/pytorch-cpu-feedstock/actions/runs/11173834884/job/31062549907
other linux: https://github.com/conda-forge/pytorch-cpu-feedstock/actions/runs/11147217363/job/30981011412
other osx: https://github.com/conda-forge/pytorch-cpu-feedstock/runs/30993778983
It looks like the ARM CPU build on CI (attached log) had this error
CMake Error: The current CMakeCache.txt directory /home/conda/feedstock_root/build_artifacts/libtorch_1728060831124/work/build/CMakeCache.txt is different than the directory /home/conda/feedstock_root/build_artifacts/libtorch_172806083.94/work/build where CMakeCache.txt was created. This may result in binaries being created in the wrong place. If you are not sure, reedit the CMakeCache.txt
Lol. That's a funny bug.
Oh, I was doing abbreviated builds so I missed it... thanks Isuru!
You need a timestamp with 3*12 to hit the bug. :smile:
Wow, OK... yeah, I guess sed is bad... we could switch to their new workflow where they claim to be able to cleanly split the library and non-library portions.
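To illustrate the failure mode (the substitution below is made up, not the actual sed call in the recipe): an over-broad pattern can match digits inside the timestamped build prefix and rewrite part of the timestamp, so CMake sees two different build directories, which is exactly the mismatch in the error above.

import re

# Hypothetical pattern; only the failure mode is real.
path = "/home/conda/feedstock_root/build_artifacts/libtorch_1728060831124/work/build"
mangled = re.sub(r"1124", ".94", path)  # made-up replacement that collides with the timestamp
print(path)     # .../libtorch_1728060831124/work/build
print(mangled)  # .../libtorch_172806083.94/work/build -> CMakeCache.txt directory mismatch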
Am seeing the following error in this CI job (with attached log):
-- Using third party subdirectory Eigen.
CMake Error at /home/conda/feedstock_root/build_artifacts/libtorch_1728075572459/_build_env/share/cmake-3.30/Modules/FindPackageHandleStandardArgs.cmake:233 (message):
Could NOT find Python (missing: Python_INCLUDE_DIRS Interpreter
Development.Module NumPy) (found version "3.11.10")
Call Stack (most recent call first):
/home/conda/feedstock_root/build_artifacts/libtorch_1728075572459/_build_env/share/cmake-3.30/Modules/FindPackageHandleStandardArgs.cmake:603 (_FPHSA_FAILURE_MESSAGE)
/home/conda/feedstock_root/build_artifacts/libtorch_1728075572459/_build_env/share/cmake-3.30/Modules/FindPython.cmake:673 (find_package_handle_standard_args)
cmake/Dependencies.cmake:857 (find_package)
CMakeLists.txt:857 (include)
Edit: Another job fails with the same error.
It's probably because I hardcoded the numpy include dir and it's frozen in.
What if we added a symlink to the NumPy headers using a Python version agnostic location and provided the latter as input?
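Something like this, hypothetically (the prefix, link name, and CMake hint variable are assumptions, not what the recipe currently does):

import os
import numpy

# Expose NumPy's headers at a Python-version-agnostic path so the include dir
# baked into the CMake invocation doesn't go stale across python/numpy bumps.
prefix = os.environ.get("PREFIX", "/opt/conda")                  # conda-build sets PREFIX
agnostic_dir = os.path.join(prefix, "include", "numpy-headers")  # hypothetical location
if not os.path.islink(agnostic_dir):
    os.symlink(numpy.get_include(), agnostic_dir)

# then hint CMake's FindPython at it, e.g.
#   cmake -DPython_NumPy_INCLUDE_DIR="$PREFIX/include/numpy-headers" ...
print(agnostic_dir, "->", os.readlink(agnostic_dir))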
You are welcome to try; I can back off.
Actually, my guess is that cmake is invoked through a new codepath I didn't capture in my patch.
I trust your judgment Mark. Please let me know if I'm just missing things
Edit: Fixed a typo
I’m too tired for today. Just giving ideas.
I think it is a difference between numpy 2.0 and 2.1.
Is there a reason to build with 2.0 vs 2.1?
The reason i ask is that the migration pins to 2.0: https://github.com/conda-forge/conda-forge-pinning-feedstock/blob/main/recipe/migrations/numpy2.yaml#L48
Is there a reason to build with 2.0 vs 2.1?
Just because we started out conservatively. You can use 2.1 without issue, or pin to 2 directly.
I'm not so convinced about my 2.1 theory anymore. I had successful builds with linux-64 and aarch64 in the past...
@isuruf it seems the sed "fix" was the source of the issue. Any other ideas on how to make the sed approach better? Or should we just restart builds periodically?
This PR has been triggered in an effort to update python313.
Notes and instructions for merging this PR:
Please note that if you close this PR we presume that the feedstock has been rebuilt, so if you are going to perform the rebuild yourself, don't close this PR until your rebuild has been merged.
If this PR was opened in error or needs to be updated please add the bot-rerun label to this PR. The bot will close this PR and schedule another one. If you do not have permissions to add this label, you can use the phrase @<space/conda-forge-admin, please rerun bot in a PR comment to have the conda-forge-admin add it for you.
This PR was created by the regro-cf-autotick-bot. The regro-cf-autotick-bot is a service to automatically track the dependency graph, migrate packages, and propose package version updates for conda-forge. Feel free to drop us a line if there are any issues! This PR was generated by - please use this URL for debugging.
Also includes...
Fixes https://github.com/conda-forge/pytorch-cpu-feedstock/issues/264 Fixes https://github.com/conda-forge/pytorch-cpu-feedstock/pull/259