InsightSoftwareConsortium / ITKRemoteModuleBuildTestPackageAction

A composite GitHub Action to build, test, and package ITK remote modules
Apache License 2.0

Linux ARM packages are not compiled #52

Open SimonRit opened 1 year ago

SimonRit commented 1 year ago

ARM Linux modules fail (silently, which may be linked to #38). See, e.g., the RTK test here. The error message is:

Building wheels for aarch64 using manylinux_2_28
+ sudo ldconfig
/opt/rh/gcc-toolset-11/root/usr/bin/sudo: line 41: /usr/bin/sudo: No such file or directory
Cleaning up artifacts from module build

and in the Publish Python package as GitHub Artifact:

Run actions/upload-artifact@v3
  with:
    name: LinuxWheel38
    path: dist/*.whl
    if-no-files-found: warn
Warning: No files were found with the provided path: dist/*.whl. No artifacts will be uploaded.

I probably did something wrong, but it's not obvious to me what...

tbirdso commented 1 year ago

Hi @SimonRit , it looks like the GitHub runner encountered a connection issue while trying to install sudo on the aarch64 image during the run:

Status: Downloaded newer image for quay.io/pypa/manylinux_2_28_aarch64:2022-11-19-1b19e81
WARNING: The requested image's platform (linux/arm64/v8) does not match the detected host platform (linux/amd64) and no specific platform was requested
AlmaLinux 8 - BaseOS                             54 kB/s | 296 kB     00:05    
Errors during downloading metadata for repository 'baseos':
  - Status code: 404 for http://mirrors.cat.pdx.edu/alma/8.7/BaseOS/aarch64/os/repodata/92157d88cc011d93db8ee5c5f73c3983de90180863e3111794ac0e7cacbb9785-primary.xml.gz (IP: 131.252.208.20)
  - Status code: 404 for http://westus2.azure.repo.almalinux.org/almalinux/8.7/BaseOS/aarch64/os/repodata/d43f6d3a153ef11cc11bc936e10ec1aee97c0b3272921b37305c608dac295780-filelists.xml.gz (IP: 20.83.88.250)
  - Status code: 404 for http://westus2.azure.repo.almalinux.org/almalinux/8.7/BaseOS/aarch64/os/repodata/92157d88cc011d93db8ee5c5f73c3983de90180863e3111794ac0e7cacbb9785-primary.xml.gz (IP: 20.83.88.250)
  - Status code: 404 for http://eastus.azure.repo.almalinux.org/almalinux/8.7/BaseOS/aarch64/os/repodata/39fb15d25936ba34e0318ec264407348094fd096a46d95df80772cd14c6264bc-updateinfo.xml.gz (IP: 20.81.65.177)
  - Status code: 404 for http://cvo.almalinux.osuosl.org/8.7/BaseOS/aarch64/os/repodata/d43f6d3a153ef11cc11bc936e10ec1aee97c0b3272921b37305c608dac295780-filelists.xml.gz (IP: 140.211.166.134)
  - Status code: 404 for http://eastus.azure.repo.almalinux.org/almalinux/8.7/BaseOS/aarch64/os/repodata/92157d88cc011d93db8ee5c5f73c3983de90180863e3111794ac0e7cacbb9785-primary.xml.gz (IP: 20.81.65.177)
  - Status code: 404 for http://mirrors.cat.pdx.edu/alma/8.7/BaseOS/aarch64/os/repodata/d43f6d3a153ef11cc11bc936e10ec1aee97c0b3272921b37305c608dac295780-filelists.xml.gz (IP: 131.252.208.20)
  - Status code: 404 for http://westus2.azure.repo.almalinux.org/almalinux/8.7/BaseOS/aarch64/os/repodata/39fb15d25936ba34e0318ec264407348094fd096a46d95df80772cd14c6264bc-updateinfo.xml.gz (IP: 20.83.88.250)
  - Status code: 404 for http://mirrors.cat.pdx.edu/alma/8.7/BaseOS/aarch64/os/repodata/39fb15d25936ba34e0318ec264407348094fd096a46d95df80772cd14c6264bc-updateinfo.xml.gz (IP: 131.252.208.20)
  - Status code: 404 for http://eastus.azure.repo.almalinux.org/almalinux/8.7/BaseOS/aarch64/os/repodata/d43f6d3a153ef11cc11bc936e10ec1aee97c0b3272921b37305c608dac295780-filelists.xml.gz (IP: 20.81.65.177)
  - Status code: 404 for http://dfw.mirror.rackspace.com/almalinux/8.7/BaseOS/aarch64/os/repodata/d43f6d3a153ef11cc11bc936e10ec1aee97c0b3272921b37305c608dac295780-filelists.xml.gz (IP: 74.205.112.120)
  - Status code: 404 for http://cvo.almalinux.osuosl.org/8.7/BaseOS/aarch64/os/repodata/92157d88cc011d93db8ee5c5f73c3983de90180863e3111794ac0e7cacbb9785-primary.xml.gz (IP: 140.211.166.134)
  - Status code: 404 for http://dfw.mirror.rackspace.com/almalinux/8.7/BaseOS/aarch64/os/repodata/92157d88cc011d93db8ee5c5f73c3983de90180863e3111794ac0e7cacbb9785-primary.xml.gz (IP: 74.205.112.120)
  - Status code: 404 for http://cvo.almalinux.osuosl.org/8.7/BaseOS/aarch64/os/repodata/39fb15d25936ba34e0318ec264407348094fd096a46d95df80772cd14c6264bc-updateinfo.xml.gz (IP: 140.211.166.134)
  - Status code: 404 for http://dfw.mirror.rackspace.com/almalinux/8.7/BaseOS/aarch64/os/repodata/39fb15d25936ba34e0318ec264407348094fd096a46d95df80772cd14c6264bc-updateinfo.xml.gz (IP: 74.205.112.120)
Error: Failed to download metadata for repo 'baseos': Yum repo downloading error: Downloading error(s): repodata/92157d88cc011d93db8ee5c5f73c3983de90180863e3111794ac0e7cacbb9785-primary.xml.gz - Cannot download, all mirrors were already tried without success; repodata/d43f6d3a153ef11cc11bc936e10ec1aee97c0b3272921b37305c608dac295780-filelists.xml.gz - Cannot download, all mirrors were already tried without success; repodata/39fb15d25936ba34e0318ec264407348094fd096a46d95df80772cd14c6264bc-updateinfo.xml.gz - Cannot download, all mirrors were already tried without success
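Since this kind of mirror failure is usually transient, a retry around the package-install step is one option besides re-running the whole job. The following is a minimal sketch in Python, not something the action currently does; `run_with_retries` is a hypothetical helper and the delays are arbitrary.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, base_delay=5.0, sleep=time.sleep):
    """Run `cmd`, retrying on a non-zero exit status with exponential backoff.

    Hypothetical helper: illustrates retrying a step (e.g. installing sudo
    in the manylinux image) that can fail while mirror metadata is out of sync.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if attempt < attempts:
            # Wait base_delay, then 2x, 4x, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(
        f"{cmd[0]} failed after {attempts} attempts: {result.stderr.strip()}"
    )
```

For example, `run_with_retries(["yum", "install", "-y", "sudo"])` inside the container would wait 5 s and then 10 s before giving up on the third failure. The `sleep` parameter is injectable only so the backoff can be tested without actually waiting.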

Would you please try re-running the job and see if the build succeeds on retry?

I agree that a silent failure is not ideal. As part of https://github.com/InsightSoftwareConsortium/ITKRemoteModuleBuildTestPackageAction/issues/38 we should add a sanity check to at minimum verify that a wheel is generated in dist/.
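A minimal sketch of such a sanity check in Python, assuming the wheels land in `dist/` as in the upload step above; `check_wheels` is a hypothetical name, not part of the action.

```python
from pathlib import Path


def check_wheels(dist_dir="dist", pattern="*.whl"):
    """Fail loudly if the build produced no wheel.

    Hypothetical post-build check; the directory and glob mirror the
    `path: dist/*.whl` used by the upload-artifact step.
    """
    wheels = sorted(Path(dist_dir).glob(pattern))
    if not wheels:
        # SystemExit with a message prints it to stderr and exits with status 1.
        raise SystemExit(f"error: no files matching {pattern} in {dist_dir}/")
    return wheels
```

Run as the last build step, this turns a missing wheel into a failed job instead of a warning. On the workflow side, `actions/upload-artifact` can also be told to fail in this case by setting `if-no-files-found: error` instead of `warn`.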

SimonRit commented 1 year ago

Sure, I have launched it again...

SimonRit commented 1 year ago

Same result (https://github.com/RTKConsortium/RTK/actions/runs/3910081483/jobs/6691993707), but for a different reason, it seems:

2023-01-14T01:34:44.1371139Z c++: fatal error: Killed signal terminated program lto1
2023-01-14T01:34:44.1371471Z compilation terminated.
2023-01-14T01:34:44.1382174Z lto-wrapper: fatal error: /opt/rh/gcc-toolset-11/root/usr/bin/c++ returned 1 exit status
2023-01-14T01:34:44.1386395Z compilation terminated.
2023-01-14T01:34:44.1386924Z /opt/rh/gcc-toolset-11/root/usr/bin/ld: error: lto-wrapper failed
2023-01-14T01:34:44.1389567Z collect2: error: ld returned 1 exit status

No clue why. I'll disable ARM for the time being and keep that for later...

SimonRit commented 1 year ago

I have opened a new PR with the same result.

tbirdso commented 1 year ago

@SimonRit Thanks for re-running to reproduce the error. Unfortunately, I am not familiar with the lto-wrapper issue. Something seems to be going wrong during Link Time Optimization (LTO), but the error message doesn't leave us much to go on. Perhaps @thewtex or @jcfr have additional thoughts here?

It would be helpful if you could attempt the following:

  1. Try configuring with CMAKE_VERBOSE_MAKEFILE:BOOL=ON to see if you can get any more details on the failure;
  2. Try stepping through the build procedure in ITKRemoteModuleBuildTestPackageAction on your local system to reproduce;
  3. The GitHub Actions approach uses ARM emulation on an x64 machine, so try building on an ARM machine and see whether the error persists. If you don't have an ARM machine readily available I've had a good experience with ARM instances on AWS EC2.
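For items 2 and 3, one way to reproduce the emulated build locally is to run the same manylinux aarch64 image on an x86_64 machine. The sketch below only assembles a `docker run` command line. It assumes QEMU binfmt handlers are registered on the host (for instance with `docker run --privileged --rm tonistiigi/binfmt --install arm64`), and `build.sh` is a hypothetical script standing in for the action's build steps.

```python
import os

# Image tag copied from the CI log above.
IMAGE = "quay.io/pypa/manylinux_2_28_aarch64:2022-11-19-1b19e81"


def emulated_run_command(module_dir, script="/work/build.sh", image=IMAGE):
    """Return a `docker run` argv that starts the aarch64 image under emulation.

    Assumes QEMU binfmt support on the host; `script` is a hypothetical
    entry point inside the mounted module directory.
    """
    return [
        "docker", "run", "--rm",
        # Request arm64 explicitly instead of relying on the image default,
        # which is what triggers the "platform ... does not match" warning.
        "--platform", "linux/arm64",
        "-v", f"{os.path.abspath(module_dir)}:/work",
        "-w", "/work",
        image,
        "bash", script,
    ]
```

Printing `shlex.join(emulated_run_command("RTK"))` gives a command that can be pasted into a terminal. Expect it to be slow: every instruction is emulated, which is also why the CI jobs come close to the runner time limit.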

Additional notes:

  1. It looks like the Python 3.8 ARM build timed out at the GitHub runner limit of 6 hours before it could reach the lto-wrapper failure. If RTK ARM builds are consistently approaching the timeout limit then it might be worthwhile to investigate self-hosting for a faster build, preferably on an ARM machine.
  2. I will follow up in https://github.com/InsightSoftwareConsortium/ITKRemoteModuleBuildTestPackageAction/issues/38 regarding the silent failure.

SimonRit commented 1 year ago

Thanks a lot for the suggestions. I have pushed a commit for 1. I might try the rest later on, but this is not the highest priority on my side... And I don't have an ARM machine and have never used AWS, so that might not be easy. To be continued...

thewtex commented 1 year ago

Link Time Optimization is memory-heavy, so we may be running out of memory. LTO can be disabled, so that may be one option for some modules.
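A minimal sketch of that fallback, in Python: a "Killed signal terminated program lto1" message usually means the kernel killed the LTO back end (most often the out-of-memory killer), so a retry could be configured without LTO. `lto_fallback_args` is a hypothetical helper, and it assumes the module build honors CMake's standard `CMAKE_INTERPROCEDURAL_OPTIMIZATION` switch. If LTO is instead enabled through explicit `-flto` flags, those would have to be removed as well.

```python
# Signature from the aarch64 log above: lto1 was killed during the final link.
LTO_KILLED = "Killed signal terminated program lto1"


def lto_fallback_args(build_log):
    """Return extra CMake cache arguments for a retry without LTO.

    Returns an empty list when the log does not show lto1 being killed.
    Assumption: the build respects CMAKE_INTERPROCEDURAL_OPTIMIZATION.
    """
    if LTO_KILLED in build_log:
        return [
            "-DCMAKE_INTERPROCEDURAL_OPTIMIZATION:BOOL=OFF",
            # Print full compiler and linker command lines in case the link still fails.
            "-DCMAKE_VERBOSE_MAKEFILE:BOOL=ON",
        ]
    return []
```

Keeping LTO on by default and turning it off only after this specific failure preserves the optimized binaries for modules that do fit in the runner's memory.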

tbirdso commented 1 year ago

@SimonRit silent failures are now addressed in https://github.com/InsightSoftwareConsortium/ITKRemoteModuleBuildTestPackageAction/commit/1b946ca387df0df7873c3be84dcd3f9d78980c42, please update the next time you push changes.