NOAA-GSL / ExascaleWorkflowSandbox

Implement workaround for Flux Spack build problem #67

Closed christopherwharrop-noaa closed 6 months ago

christopherwharrop-noaa commented 6 months ago

A known bug in Spack/Flux causes the build of flux-core to fail with the following error:

#14 11461.8 ==> Installing flux-core-0.58.0-6xqraerbttwveddwgcrrrhhlc2knqeny [260/265]
#14 11461.8 ==> No binary for flux-core-0.58.0-6xqraerbttwveddwgcrrrhhlc2knqeny found: installing from source
#14 11463.4 ==> Fetching https://github.com/flux-framework/flux-core/releases/download/v0.58.0/flux-core-0.58.0.tar.gz
#14 11463.4 ==> No patches needed for flux-core
#14 11463.5 ==> flux-core: Executing phase: 'autoreconf'
#14 11463.5 ==> flux-core: Executing phase: 'configure'
#14 11471.8 ==> Error: ProcessError: Command exited with status 1:
#14 11471.8     '/tmp/root/spack-stage/spack-stage-flux-core-0.58.0-6xqraerbttwveddwgcrrrhhlc2knqeny/spack-src/configure' '--prefix=/opt/software/linux-ubuntu20.04-cortex_a72/gcc-9.4.0/flux-core-0.58.0-6xqraerbttwveddwgcrrrhhlc2knqeny' '--enable-pylint=no' '--disable-docs'
#14 11472.0 
#14 11472.0 1 error found in build log:
#14 11472.0      194    checking for systemd/sd-bus.h... no
#14 11472.0      195    checking for HWLOC... yes
#14 11472.0      196    checking for LZ4... yes
#14 11472.0      197    checking for SQLITE... yes
#14 11472.0      198    checking for LIBUUID... yes
#14 11472.0      199    checking for CURSES... yes
#14 11472.0   >> 200    checking for LIBARCHIVE... configure: error: Package requirements (
#14 11472.0             libarchive) were not met:
#14 11472.0      201    
#14 11472.0      202    Package 'iconv', required by 'libarchive', not found
#14 11472.0      203    
#14 11472.0      204    Consider adjusting the PKG_CONFIG_PATH environment variable if you
#14 11472.0      205    installed software in a non-standard prefix.
#14 11472.0      206    
#14 11472.0 
#14 11472.0 See build log for details:
#14 11472.0   /tmp/root/spack-stage/spack-stage-flux-core-0.58.0-6xqraerbttwveddwgcrrrhhlc2knqeny/spack-build-out.txt
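When triaging a failed build, it can help to pull the pkg-config complaint out of the log automatically. The following is a minimal sketch, not part of this PR: the function name `find_missing_pkgconfig_deps` is hypothetical, and the regular expression assumes the message has the same wording as the `Package 'iconv', required by 'libarchive', not found` line above.

```python
import re

# Matches the pkg-config message seen in the log above, e.g.
#   Package 'iconv', required by 'libarchive', not found
MISSING_PKG = re.compile(r"Package '([^']+)', required by '([^']+)', not found")


def find_missing_pkgconfig_deps(log_text):
    """Return (missing_package, required_by) pairs found in a build log.

    Hypothetical helper for illustration only.
    """
    return MISSING_PKG.findall(log_text)


# One line copied from the failing build log in this PR description.
sample = "#14 11472.0      202    Package 'iconv', required by 'libarchive', not found"
print(find_missing_pkgconfig_deps(sample))  # [('iconv', 'libarchive')]
```

A CI step could run something like this over spack-build-out.txt to flag this specific failure, although it says nothing about why `iconv` is not visible to pkg-config.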

There is an open issue in Spack reporting this problem, but neither the Spack developers nor the Flux developers know what causes it. Because Flux is built last, after roughly 3.5 hours of build time, each failure wastes a great deal of development time. No reliable workaround had previously been identified.

This PR implements a potentially reliable workaround for this problem. We have found that if we remove all packages except flux-core and flux-sched from the Spack-Stack container, the Flux packages build successfully. We don't know whether this always holds or has merely held in every build so far. Either way, a successful Flux-only build allows the Flux packages to be pushed to the Spack binary build cache on S3. A subsequent build of the full container, with all packages included, then succeeds because it downloads the Flux packages from the binary cache instead of building them from source.
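The reason the two-pass ordering helps can be shown with a toy model of a binary cache. This is only a sketch under simplifying assumptions: the `build` function and the non-Flux package names (`hdf5`, `netcdf-c`) are hypothetical, every source build is assumed to succeed, and no real Spack or S3 calls are made.

```python
def build(packages, cache):
    """Simulate one container build against a binary cache.

    Packages already in the cache are treated as downloaded binaries.
    The rest are "built from source" and then added to the cache,
    mimicking a push after a successful build. Returns the list of
    packages that had to be built from source.
    """
    from_source = [p for p in packages if p not in cache]
    cache.update(from_source)
    return from_source


cache = set()                                # empty binary cache to start
flux = ["flux-core", "flux-sched"]           # the two Flux packages named in this PR
full_stack = ["hdf5", "netcdf-c"] + flux     # hypothetical stand-in for the full package list

first_pass = build(flux, cache)              # Flux-only container
second_pass = build(full_stack, cache)       # full container

print(first_pass)    # ['flux-core', 'flux-sched']
print(second_pass)   # ['hdf5', 'netcdf-c']
```

In the second pass the Flux packages never reach the source-build path, which is the path where the configure error occurs in the full container.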

This PR does the following:

1) Versions of the Flux packages are updated to the latest releases in an attempt to minimize occurrences of bugs.
2) Up-to-date versions of the Spack package.py files for the Flux packages are fetched from the authoritative Spack repository and placed into Spack-Stack's Spack repository inside the container at build time. This allows 1) to work.
3) The create_dockerfile.sh script is updated to create a Dockerfile.flux-only that contains only the Flux packages.
4) The CI workflow is updated to contain an extra step that builds the Flux-only container first, causing Flux to get pushed to the binary build cache for use in the next step, which builds the full container.
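The selection behind the Flux-only Dockerfile amounts to filtering the package list by name. The sketch below illustrates that idea in Python and is not the actual create_dockerfile.sh logic: the `flux_only` function, the spec strings other than `flux-core@0.58.0`, and the assumption that Flux packages can be recognized by a `flux-` name prefix are all hypothetical.

```python
import re

# Leading package name of a Spack-style spec string such as "flux-core@0.58.0".
SPEC_NAME = re.compile(r"[A-Za-z0-9_-]+")


def flux_only(specs):
    """Keep only specs whose package name starts with 'flux-'.

    Hypothetical helper: assumes each spec begins with the package name.
    """
    kept = []
    for spec in specs:
        match = SPEC_NAME.match(spec)
        if match and match.group(0).startswith("flux-"):
            kept.append(spec)
    return kept


# Illustrative spec list; only flux-core@0.58.0 is taken from the log above.
all_specs = ["hdf5", "flux-core@0.58.0", "netcdf-c", "flux-sched"]
print(flux_only(all_specs))  # ['flux-core@0.58.0', 'flux-sched']
```

The filtered list would feed the Flux-only container build in the first CI step, and the unfiltered list would feed the full container build in the second.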

christopherwharrop-noaa commented 6 months ago

@NaureenBharwaniNOAA - This is ready for review