Closed dalcinl closed 3 months ago
Hi! This is the friendly automated conda-forge-linting service.
I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.
@conda-forge-admin, please rerender
Hi! This is the friendly automated conda-forge-webservice.
I tried to rerender for you, but it looks like there was nothing to do.
This message was generated by GitHub actions workflow run https://github.com/conda-forge/openmpi-feedstock/actions/runs/7786526756.
@conda-forge/core We need some help here. We keep hitting the unicode error after merging #141 (the CI was green there, but the error started happening on main). Now we can reproduce the error even in the CI, so I can only assume this is due to some change in the build tooling that unfortunately intervened. I asked in the Gitter channel last week but haven't gotten any response so far...
2024-02-05T10:57:22.0104022Z Warning: rpath /home/conda/feedstock_root/build_artifacts/openmpi-mpi_1707129643574/_build_env/lib is outside prefix /home/conda/feedstock_root/build_artifacts/openmpi-mpi_1707129643574/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold (removing it)
2024-02-05T10:57:22.0576062Z Warning: rpath /home/conda/feedstock_root/build_artifacts/openmpi-mpi_1707129643574/_build_env/lib is outside prefix /home/conda/feedstock_root/build_artifacts/openmpi-mpi_1707129643574/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold (removing it)
2024-02-05T10:57:22.1059741Z Traceback (most recent call last):
2024-02-05T10:57:22.1060663Z File "/opt/conda/lib/python3.10/site-packages/conda_build/os_utils/liefldd.py", line 54, in ensure_binary
2024-02-05T10:57:22.1067614Z return lief.parse(str(file))
2024-02-05T10:57:22.1068672Z TypeError: '/home/conda/feedstock_root/build_artifacts/openmpi-mpi_1707129643574/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/\x01\udce4\x05'
2024-02-05T10:57:22.1069304Z
2024-02-05T10:57:22.1069548Z During handling of the above exception, another exception occurred:
2024-02-05T10:57:22.1069815Z
2024-02-05T10:57:22.1070014Z Traceback (most recent call last):
2024-02-05T10:57:22.1070354Z File "/opt/conda/bin/conda-mambabuild", line 10, in <module>
2024-02-05T10:57:22.1070660Z sys.exit(main())
2024-02-05T10:57:22.1071014Z File "/opt/conda/lib/python3.10/site-packages/boa/cli/mambabuild.py", line 256, in main
2024-02-05T10:57:22.1077424Z call_conda_build(action, config)
2024-02-05T10:57:22.1078241Z File "/opt/conda/lib/python3.10/site-packages/boa/cli/mambabuild.py", line 228, in call_conda_build
2024-02-05T10:57:22.1078616Z result = api.build(
2024-02-05T10:57:22.1079028Z File "/opt/conda/lib/python3.10/site-packages/conda_build/api.py", line 254, in build
2024-02-05T10:57:22.1084360Z return build_tree(
2024-02-05T10:57:22.1084989Z File "/opt/conda/lib/python3.10/site-packages/conda_build/build.py", line 3789, in build_tree
2024-02-05T10:57:22.1097473Z packages_from_this = build(
2024-02-05T10:57:22.1098008Z File "/opt/conda/lib/python3.10/site-packages/conda_build/build.py", line 2877, in build
2024-02-05T10:57:22.1104652Z newly_built_packages = bundlers[pkg_type](output_d, m, env, stats)
2024-02-05T10:57:22.1105254Z File "/opt/conda/lib/python3.10/site-packages/conda_build/build.py", line 2004, in bundle_conda
2024-02-05T10:57:22.1109746Z files = post_process_files(metadata, initial_files)
2024-02-05T10:57:22.1110373Z File "/opt/conda/lib/python3.10/site-packages/conda_build/build.py", line 1815, in post_process_files
2024-02-05T10:57:22.1114364Z post_build(m, new_files, build_python=python)
2024-02-05T10:57:22.1114887Z File "/opt/conda/lib/python3.10/site-packages/conda_build/post.py", line 1818, in post_build
2024-02-05T10:57:22.1125728Z post_process_shared_lib(m, f, prefix_files, host_prefix)
2024-02-05T10:57:22.1126377Z File "/opt/conda/lib/python3.10/site-packages/conda_build/post.py", line 1680, in post_process_shared_lib
2024-02-05T10:57:22.1130273Z mk_relative_linux(
2024-02-05T10:57:22.1135257Z File "/opt/conda/lib/python3.10/site-packages/conda_build/post.py", line 612, in mk_relative_linux
2024-02-05T10:57:22.1135716Z existing2, _, _ = get_rpaths_raw(elf)
2024-02-05T10:57:22.1136164Z File "/opt/conda/lib/python3.10/site-packages/conda_build/os_utils/liefldd.py", line 206, in get_rpathy_thing_raw_partial
2024-02-05T10:57:22.1136517Z binary = ensure_binary(file)
2024-02-05T10:57:22.1136924Z File "/opt/conda/lib/python3.10/site-packages/conda_build/os_utils/liefldd.py", line 56, in ensure_binary
2024-02-05T10:57:22.1137298Z print(f"WARNING: liefldd: failed to ensure_binary({file})")
2024-02-05T10:57:22.1137708Z UnicodeEncodeError: 'utf-8' codec can't encode character '\udce4' in position 303: surrogates not allowed
2024-02-05T10:57:29.2571498Z
2024-02-05T10:57:29.2638583Z ##[error]Bash exited with code '1'.
2024-02-05T10:57:29.2801376Z ##[section]Finishing: Run docker build
@dalcinl do you think any of the changes that we made could lead to this file being generated?
/home/conda/feedstock_root/build_artifacts/openmpi-mpi_1707129643574/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/\x01\udce4\x05
which seems an odd one to me
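For readers unfamiliar with the `\udce4` in that path: the traceback is consistent with a file name that is not valid UTF-8, which Python represents with a lone surrogate and then refuses to encode when printing. Below is a minimal sketch of that mechanism. The raw bytes are an assumption inferred from the repr in the log (the on-disk name is not shown), and `safe_repr` and `strict_encode_fails` are hypothetical helpers, not conda-build functions.

```python
# Bytes assumed from the repr in the traceback ('\x01\udce4\x05');
# the actual on-disk name is not shown in the log.
raw_name = b"\x01\xe4\x05"

# Python maps undecodable bytes in file names to lone surrogates
# (the "surrogateescape" error handler), so 0xE4 becomes U+DCE4.
text_name = raw_name.decode("utf-8", errors="surrogateescape")


def strict_encode_fails(name: str) -> bool:
    """Return True if the name cannot be encoded as strict UTF-8."""
    try:
        name.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


def safe_repr(name: str) -> str:
    """Return a form of the name that can always be written to a UTF-8 stream."""
    return name.encode("utf-8", errors="backslashreplace").decode("utf-8")


if __name__ == "__main__":
    # Strict encoding fails, as in the UnicodeEncodeError above.
    print(strict_encode_fails(text_name))
    # The escaped form contains the literal text \udce4 and prints safely.
    print(repr(safe_repr(text_name)))
```

Encoding the name back with `errors="surrogateescape"` recovers the original bytes, which is why such names survive file system calls but break on output to a UTF-8 stream.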
I don't think so, but I'll double check our PR diff.
it might be a difference in the lief version... maybe compare and try to pin?
liblief/py-lief are both staying on 0.12.3, nothing has changed 🤔
sometimes i download both logs, delete everything until Z (i.e. the timestamp prefix on each line) and just run vimdiff to visually inspect the differences
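The same comparison can be scripted. This is a minimal sketch, assuming every log line starts with a timestamp shaped like the ones in the log above; `strip_timestamp` and `diff_logs` are hypothetical helper names, and `difflib` stands in for vimdiff.

```python
import difflib
import re

# Matches the prefix seen in the CI log, e.g. "2024-02-05T10:57:22.1059741Z ".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z ?")


def strip_timestamp(line: str) -> str:
    """Remove the leading timestamp so runs from different times compare equal."""
    return TIMESTAMP.sub("", line)


def diff_logs(good: str, bad: str) -> list:
    """Return a unified diff of two logs with their timestamps removed."""
    good_lines = [strip_timestamp(line) for line in good.splitlines()]
    bad_lines = [strip_timestamp(line) for line in bad.splitlines()]
    return list(
        difflib.unified_diff(good_lines, bad_lines, "good", "bad", lineterm="")
    )


if __name__ == "__main__":
    good_log = "2024-02-01T09:00:00.0000001Z step one\n2024-02-01T09:00:01.0000001Z ok"
    bad_log = "2024-02-05T10:57:22.0000001Z step one\n2024-02-05T10:57:23.0000001Z failed"
    print("\n".join(diff_logs(good_log, bad_log)))
```

Build directory names such as `openmpi-mpi_1707129643574` also differ between runs, so a real comparison would probably need to normalize those too.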
Looks like some builds are running out of space. Should we try adding the following to conda-forge.yml (and re-render)?
azure:
  free_disk_space: true
@conda-forge-admin, please rerender
No luck. Running out of disk space doesn't seem to be what causes the filename with weird unicode.
Previously saw this error in this CI log
conda.CondaMultiError: Error with archive /home/conda/feedstock_root/build_artifacts/pkg_cache/libstdcxx-devel_linux-64-11.4.0-h922705a_105.conda. You probably need to delete and re-download or re-create this file. Message was:
failed with error: [Errno 28] No space left on device
Error with archive /home/conda/feedstock_root/build_artifacts/pkg_cache/gfortran_impl_linux-aarch64-11.4.0-hfbda5c0_5.conda. You probably need to delete and re-download or re-create this file. Message was:
failed with error: [Errno 28] No space left on device
Error with archive /home/conda/feedstock_root/build_artifacts/pkg_cache/gxx_impl_linux-aarch64-11.4.0-he533754_5.conda. You probably need to delete and re-download or re-create this file. Message was:
failed with error: [Errno 28] No space left on device
Error with archive /home/conda/feedstock_root/build_artifacts/pkg_cache/libgfortran5-13.2.0-ha4646dd_5.conda. You probably need to delete and re-download or re-create this file. Message was:
failed with error: [Errno 28] No space left on device
Error with archive /home/conda/feedstock_root/build_artifacts/pkg_cache/binutils_impl_linux-64-2.40-hf600244_0.conda. You probably need to delete and re-download or re-create this file. Message was:
failed with error: [Errno 28] No space left on device
Error with archive /home/conda/feedstock_root/build_artifacts/pkg_cache/libsanitizer-11.4.0-h4dcbe23_5.conda. You probably need to delete and re-download or re-create this file. Message was:
failed with error: [Errno 28] No space left on device
Error with archive /home/conda/feedstock_root/build_artifacts/pkg_cache/gcc_impl_linux-aarch64-11.4.0-he533754_5.conda. You probably need to delete and re-download or re-create this file. Message was:
failed with error: [Errno 28] No space left on device
Yeah we were not concerned with that, we've been focusing on fixing this issue: https://github.com/conda-forge/openmpi-feedstock/pull/142#issuecomment-1927240287, starting here: https://github.com/conda-forge/openmpi-feedstock/pull/141#issuecomment-1915451557.
@conda-forge-admin, please rerender
If that doesn't work, would suggest creating a diff of the dependencies installed when it was working and after it broke. That may shed some light on other relevant changes
Edit: For example, wouldn't be surprised if there was a buggy version of LIEF and we need to downgrade
Edit 2: It looks similar to issue ( https://github.com/lief-project/LIEF/issues/653 )
Maybe we could try switching to patchelf instead of LIEF (for example):
build:
  ...
  rpaths_patcher: patchelf
here's a diff of the build logs.
the only package difference between bad/good is in the host environment:
- openssl: 3.2.1-hd590300_0 conda-forge
- rdma-core: 50.0-hd3aeb46_0 conda-forge
+ openssl: 3.2.0-hd590300_1 conda-forge
+ rdma-core: 49.0-hd3aeb46_2 conda-forge
So the first thing to try is probably to pin rdma-core to 49 (trying in this PR now)
while the root environment has small differences unlikely to affect things:
@@ -31,3 +31,3 @@
conda-index 0.3.0 pyhd8ed1ab_1 conda-forge
-conda-libmamba-solver 24.1.0 pyhd8ed1ab_0 conda-forge
+conda-libmamba-solver 23.12.0 pyhd8ed1ab_0 conda-forge
conda-oci-mirror 0.1.0 pyhd8ed1ab_0 conda-forge
@@ -103,3 +103,3 @@
openjpeg 2.5.0 h488ebb8_3 conda-forge
-openssl 3.2.1 hd590300_0 conda-forge
+openssl 3.2.0 hd590300_1 conda-forge
oras-py 0.1.14 pyhd8ed1ab_0 conda-forge
@@ -132,3 +132,3 @@
python_abi 3.10 4_cp310 conda-forge
-pytz 2023.4 pyhd8ed1ab_0 conda-forge
+pytz 2023.3.post1 pyhd8ed1ab_0 conda-forge
pyyaml 6.0.1 py310h2372a71_1 conda-forge
Other differences:
ldconfig: $BUILD_DIR/_h_env/lib/ is not a symbolic link
is not present in the successful build log.
I don't really understand what could be causing that.
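A package-level comparison like the one above can be automated. The following is a minimal sketch, assuming each environment listing has whitespace-separated `name version build channel` columns as in the excerpt; `parse_package_list` and `changed_packages` are hypothetical helpers, and the sample rows are copied from the diff above.

```python
def parse_package_list(text: str) -> dict:
    """Map package name to version from 'name version build channel' rows."""
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


def changed_packages(good: str, bad: str) -> dict:
    """Return {name: (good_version, bad_version)} for every differing package."""
    good_pkgs = parse_package_list(good)
    bad_pkgs = parse_package_list(bad)
    return {
        name: (good_pkgs.get(name), bad_pkgs.get(name))
        for name in sorted(set(good_pkgs) | set(bad_pkgs))
        if good_pkgs.get(name) != bad_pkgs.get(name)
    }


GOOD = """\
conda-index 0.3.0 pyhd8ed1ab_1 conda-forge
conda-libmamba-solver 23.12.0 pyhd8ed1ab_0 conda-forge
openssl 3.2.0 hd590300_1 conda-forge
pytz 2023.3.post1 pyhd8ed1ab_0 conda-forge
"""

BAD = """\
conda-index 0.3.0 pyhd8ed1ab_1 conda-forge
conda-libmamba-solver 24.1.0 pyhd8ed1ab_0 conda-forge
openssl 3.2.1 hd590300_0 conda-forge
pytz 2023.4 pyhd8ed1ab_0 conda-forge
"""

if __name__ == "__main__":
    for name, (good_version, bad_version) in changed_packages(GOOD, BAD).items():
        print(f"{name}: {good_version} -> {bad_version}")
```

A package present in only one environment shows up with `None` on the other side, which makes added or removed dependencies visible as well.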
demoting rdma-core didn't fix it. The LIEF check is unconditional if LIEF is available, so selecting patchelf doesn't prevent the use of LIEF. The only way to prevent the failing call is to remove LIEF.
Is there a way to make LIEF unavailable in the conda-build environment on the conda-forge builds? That would allow us to actually just use patchelf.
These two PRs would (each, separately) fix the issue, I think:
The rdma-core=50 and rdma-core=49.1 builds are faulty in that ldconfig -vn "${PREFIX}/lib" creates symlinks for libmana/libmlx5 with those broken filenames.
The SONAMEs of the libraries look alright, though.
IDK why it happens, but I'm currently trying to replicate it.
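One way to check whether a prefix already contains such broken names is to list the directory as raw bytes and flag entries that are not valid UTF-8. This is a minimal sketch of that check; `is_valid_utf8` and `non_utf8_entries` are hypothetical helpers, and the directory passed in the example is a placeholder.

```python
import os


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw file name decodes as strict UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_entries(directory: str) -> list:
    """List raw entry names in a directory that are not valid UTF-8."""
    # Passing a bytes path makes os.listdir return names as raw bytes.
    raw_names = os.listdir(os.fsencode(directory))
    return sorted(name for name in raw_names if not is_valid_utf8(name))


if __name__ == "__main__":
    # "." is a placeholder; in this thread the directory of interest
    # would be the lib directory of the host prefix.
    for name in non_utf8_entries("."):
        print(name)
```

Running it on the lib directory before and after the `ldconfig -vn` call would show whether that call is what creates the entries.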
Seems to be an issue with conda-build>=3.28; still investigating...
There is a small-ish bug in conda-build>3.28 which lets it run patchelf not only for the actual binary but also for its symlinks.
In this case we have symlinks like libibverbs/libmana-rdmav34.so -> ../libmana.so.1.0.50.0 in rdma-core which, when used as patchelf's input, set the rpath for libmana.so.1.0.50.0 to $ORIGIN/.:$ORIGIN/.. instead of $ORIGIN/.
This is of course wrong, but shouldn't cause too many problems.
Unfortunately, we then run into a patchelf bug which leads to ldconfig (called in downstream build/install scripts like here) creating additional symlinks with non-UTF-8 filenames.
I'll write an issue, test and fix for conda-build and link it here later.
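To see why going through the symlink adds the extra entry, here is a minimal sketch of how an `$ORIGIN`-relative rpath depends on the directory of the path it is computed from. It illustrates the arithmetic only and is not conda-build's actual code; `origin_rpath`, `combined_rpath` and the `/prefix` layout are assumptions made for the example.

```python
import posixpath


def origin_rpath(binary_path: str, lib_dir: str) -> str:
    """Return the $ORIGIN-relative rpath pointing from binary_path to lib_dir."""
    relative = posixpath.relpath(lib_dir, posixpath.dirname(binary_path))
    return "$ORIGIN/" + relative


def combined_rpath(paths, lib_dir: str) -> str:
    """Collect the rpath entries produced by patching the same file via several paths."""
    entries = []
    for path in paths:
        entry = origin_rpath(path, lib_dir)
        if entry not in entries:
            entries.append(entry)
    return ":".join(entries)


# Hypothetical prefix layout mirroring the rdma-core example above.
LIB_DIR = "/prefix/lib"
REAL_FILE = "/prefix/lib/libmana.so.1.0.50.0"
SYMLINK = "/prefix/lib/libibverbs/libmana-rdmav34.so"

if __name__ == "__main__":
    print(origin_rpath(REAL_FILE, LIB_DIR))
    print(origin_rpath(SYMLINK, LIB_DIR))
    print(combined_rpath([REAL_FILE, SYMLINK], LIB_DIR))
```

Both paths resolve to the same file on disk, so the entry computed from the symlink's directory is written into the real library, where `$ORIGIN` means something different. Skipping symlinks before patching avoids this.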
@conda-forge-admin, please rerender
Hi! This is the friendly automated conda-forge-webservice.
I tried to rerender for you, but it looks like there was nothing to do.
This message was generated by GitHub actions workflow run https://github.com/conda-forge/openmpi-feedstock/actions/runs/7966217037.
@conda-forge-admin, please restart ci
It seems @mbargull's PR https://github.com/conda/conda-build/pull/5181 was merged and is available in conda-build 24.1.2, but we need to bump the pinned version in conda-forge-ci-setup to use it.
seems we need to remove this line now that conda-build uses CalVer - care to send a PR?
Sure, see https://github.com/conda-forge/conda-forge-ci-setup-feedstock/pull/304.
@conda-forge-admin, please restart ci
It seems conda-build is still pinned at 3.28.4. @h-vetinari @beckermr any idea what I missed?
I guess the pinned conda-build might come from the docker images. Can any of you rebuild the images with the latest conda-build? It seems possible after @jakirkham's PR: https://github.com/conda-forge/docker-images/pull/230
Have a look at this: https://github.com/conda-forge/openmpi-feedstock/blob/efdc204c50e34b01b95170d54debd6d97def2d1f/.scripts/build_steps.sh#L36-L39
Modifying this to require newest conda-build gives a long resolution error:
>mamba create -n test pip mamba conda-build=24 boa conda-forge-ci-setup=4
Paring this down a bit by pinning to the last boa version (that should by all appearances not be pinned) yields:
>mamba create -n test pip mamba conda-build=24 boa=0.16 conda-forge-ci-setup=4
Looking for: ['pip', 'mamba', 'conda-build=24', 'boa=0.16', 'conda-forge-ci-setup=4']
conda-forge/win-64 Using cache
conda-forge/noarch Using cache
Could not solve for environment specs
The following packages are incompatible
├─ boa 0.16** is installable and it requires
│ └─ conda-build >=3.24,<24.1.0a0 , which can be installed;
└─ conda-build 24** is not installable because it conflicts with any installable versions previously reported.
Thus we find https://github.com/conda-forge/conda-forge-repodata-patches-feedstock/pull/657 due to https://github.com/mamba-org/boa/issues/392. There's discussion about unblocking conda-build 24 in https://github.com/conda-forge/conda-smithy/pull/1844 already, and I just opened https://github.com/conda-forge/boa-feedstock/issues/79 for an alternative approach that's slightly less intrusive (IMO).
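The solver message above boils down to a range check: boa 0.16 requires conda-build >=3.24,<24.1.0a0, which no 24.1.x release satisfies. Below is a minimal sketch of that check. It is a deliberate simplification and not conda's real version ordering: it only handles final releases and drops the pre-release suffix of the bound, which gives the same answer for final releases. `release_tuple` and `satisfies` are hypothetical helpers.

```python
import re


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release segment, e.g. '24.1.0a0' -> (24, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release segment in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies(version: str, lower: str, upper: str) -> bool:
    """Check lower <= version < upper for final releases only."""
    parsed = [release_tuple(v) for v in (version, lower, upper)]
    width = max(len(p) for p in parsed)
    # Pad with zeros so that (24, 1) and (24, 1, 0) compare as equal.
    ver, low, high = (p + (0,) * (width - len(p)) for p in parsed)
    return low <= ver < high


if __name__ == "__main__":
    # Bounds taken from the solver output above.
    print(satisfies("3.28.4", "3.24", "24.1.0a0"))
    print(satisfies("24.1.2", "3.24", "24.1.0a0"))
```

This matches the thread: the pinned 3.28.4 is inside the range boa allows, while 24.1.2 (the release carrying the fix) falls outside it.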
Wouldn't we also need to rebuild rdma-core first to produce packages with the fix for openmpi to use?
Admittedly I've not been following this as closely as others, so that could be totally off base
@conda-forge-admin please rerender
@mbargull @h-vetinari any other thoughts? 😛 Now we are using the latest conda-build, but we still hit the same error...
Am curious what others think of my question above: https://github.com/conda-forge/openmpi-feedstock/pull/142#issuecomment-1955964849
Suspect that is an important next step
Based on @mbargull's earlier analysis, the CI failed because we had two unrelated bugs acting jointly:
1. the conda-build bug (running patchelf on symlinks as well as on the actual binary)
2. the patchelf bug (with the ldconfig that created the buggy filenames, see also here)
I was hoping that with all the fixes we have now eliminated Bug 1, thereby stopping the joint action of both bugs, as we are supposed to only have Bug 2 left and it should be relatively harmless. If my understanding is correct, then it means we still have a gap in this analysis.
> Wouldn't we also need to rebuild rdma-core first to produce packages with the fix for openmpi to use?
AFAIU, as long as that malformed (non-UTF-8) path is there - which is still the case - the rdma rebuild shouldn't make a difference. Unless we completely get rid of the symlinks in that package, but that's not a reasonable ask; we should IMO fix (or work around) the bug in patchelf/conda-build.
@mbargull any chance you have further insight on this issue? 🙂
Sorry, didn't follow this issue; looks like rdma-core has not been rebuilt yet.
The libraries therein are broken for the versions built with conda-build >=3.28,<24.1.2, see my previous comment https://github.com/conda-forge/openmpi-feedstock/pull/142#issuecomment-1936503202 :
> The rdma-core=50 and rdma-core=49.1 builds are faulty in that ldconfig -vn "${PREFIX}/lib" creates symlinks for libmana/libmlx5 with those broken filenames.
So was my understanding in https://github.com/conda-forge/openmpi-feedstock/pull/142#issuecomment-1965662764 incorrect, that there are still issues to fix, regardless of whether we fixed conda-build (1)?
Yes, it is partially incorrect. patchelf does not call ldconfig.
There's no "gap in this analysis"; it's simply that the combination of bugs in conda-build and patchelf leads to faulty artifacts (in rdma-core) which, when processed downstream (this feedstock) with ldconfig, cause the creation of symlinks with broken names.
I have put in a PR to rebuild rdma-core: https://github.com/conda-forge/rdma-core-feedstock/pull/17
@conda-forge-admin, please rerender
@conda-forge-admin, please rerender
Hi! This is the friendly automated conda-forge-webservice.
I tried to rerender for you, but it looks like there was nothing to do.
This message was generated by GitHub actions workflow run https://github.com/conda-forge/openmpi-feedstock/actions/runs/8234009795.
> Yes, it is partially incorrect. patchelf does not call ldconfig. (...) which when processed downstream (this feedstock) with ldconfig causes the creation of symlinks with broken names.
@mbargull Forgive my ignorance, so when exactly is ldconfig called and by whom?
https://www.gnu.org/software/libtool/manual/libtool.html#Finish-mode
(You should find the exact invocations in the configure script, but the build's source archive might also carry a libtool.m4 which would be slightly more readable.)
linux-64 CI is finally happy!
@conda-forge-admin, please rerender