electronic-structure / SIRIUS

Domain specific library for electronic structure calculations
BSD 3-Clause "New" or "Revised" License

Ci/rebuild base image #942

Closed simonpintarelli closed 9 months ago

simonpintarelli commented 9 months ago

Trying to get the CSCS pipeline to rebuild the base images.

toxa81 commented 9 months ago

I think we still need one build with MKL. It can be without elpa.

simonpintarelli commented 9 months ago

Here for example?

RUN spack env create -d /sirius-env-cuda && \
    spack -e /sirius-env-cuda add  "sirius@develop %gcc build_type=RelWithDebInfo +tests +apps +cuda +magma ^mpich ^intel-oneapi-mkl threads=openmp" && \
    spack -e /sirius-env-cuda develop -p /sirius-src sirius@develop && \
    spack -e /sirius-env-cuda install --only=dependencies --fail-fast

I'll wait for the current run to finish; it seems it does not cache Docker layers and restarts from scratch.

toxa81 commented 9 months ago

I pushed a slightly different version with explicit scalapack.
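For readers following along, an MKL build with explicit ScaLAPACK and without ELPA might look roughly like the sketch below. This is an illustration, not the pushed commit: the environment name /sirius-env-cuda-mkl-mpich is taken from later in this thread, while the `+scalapack` variant on sirius and the `+cluster` variant on intel-oneapi-mkl are assumptions about the Spack packages and should be checked with `spack info`.

```dockerfile
# Hypothetical sketch: MKL-based environment with ScaLAPACK requested explicitly.
# ELPA is simply not requested. Variant names (+scalapack, +cluster) are assumptions.
RUN spack env create -d /sirius-env-cuda-mkl-mpich && \
    spack -e /sirius-env-cuda-mkl-mpich add "sirius@develop %gcc build_type=RelWithDebInfo +tests +apps +cuda +magma +scalapack ^mpich ^intel-oneapi-mkl+cluster threads=openmp" && \
    spack -e /sirius-env-cuda-mkl-mpich develop -p /sirius-src sirius@develop && \
    spack -e /sirius-env-cuda-mkl-mpich install --only=dependencies --fail-fast
```

Running `spack -e /sirius-env-cuda-mkl-mpich spec sirius` before the install step is one way to confirm that the concretized spec really contains the ScaLAPACK provider you expect.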

simonpintarelli commented 9 months ago

Some rocblas dependency failed with:

     280    checking whether program_invocation_name is defined... yes
     281    checking whether program_invocation_short_name is defined... yes
     282    checking for pidfd_open... no
     283    checking for __NR_pidfd_open... yes
     284    checking for ncursesw... no
     285    checking for ncurses... no
  >> 286    configure: error: ncurses support missing/incomplete (for partial build use --without-ncurses)

It seems Ubuntu changed the ncurses package upstream, and the changed package was picked up by spack external find.
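One possible workaround for this kind of breakage, sketched here as an assumption rather than as what this PR ended up doing, is to keep the system ncurses out of external detection so that Spack builds its own copy. The `--exclude` option is only available in reasonably recent Spack releases, so check `spack external find --help` first.

```dockerfile
# Hypothetical sketch: detect system packages but skip ncurses,
# then print the resulting packages configuration for inspection.
RUN spack external find --exclude ncurses && \
    spack config get packages
```

The printed configuration shows which packages ended up registered as externals, which makes it easier to spot a system library that a dependency cannot build against.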

simonpintarelli commented 9 months ago

cscs-ci run default

toxa81 commented 9 months ago

I don't get why it times out. I just checked the runs manually from the command line. All tests (parallel/sequential) are running fine. It is for sure using the ctest of the system's (old) cmake installation.
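To confirm which ctest the job actually picks up, a small check like the following could be run inside the container. It is a minimal sketch: the function name `report_ctest` is made up for illustration, and the timeout value in the trailing comment is arbitrary.

```shell
# Report which ctest binary is first on PATH and its version,
# or say so if none is installed.
report_ctest() {
    if command -v ctest >/dev/null 2>&1; then
        echo "ctest path: $(command -v ctest)"
        ctest --version | head -n 1
    else
        echo "ctest not found on PATH"
    fi
}

report_ctest

# From the build directory, a run with an explicit per-test timeout
# (in seconds) and output shown only for failing tests would be:
#   ctest --timeout 600 --output-on-failure
```

If the reported path points at the system installation rather than the Spack-built cmake, that would support the suspicion above.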

simonpintarelli commented 9 months ago

@toxa81, I'm confused too, the timings on the CI per test didn't look unusually high. Is it possible JFrog is lagging when the CI tries to pull the image? The size is 12 GB, afaik the same as before.

toxa81 commented 9 months ago

oh...come on! now there is no scalapack eigen-solver in scalapack build

simonpintarelli commented 9 months ago

Sorry! I was not careful with the specs.

toxa81 commented 9 months ago

No worries, Simon! I believe it was correct. I checked the output of "build cuda image" job which builds "/sirius-env-cuda-mkl-mpich". All was set properly.

simonpintarelli commented 9 months ago

Indeed, then I broke it with the last commit. I've removed it via force push, let's hope it will pick up the prebuilt container again.

simonpintarelli commented 9 months ago

It seems the pitfall was that both jobs, "build cuda image openblas" (aka /sirius-env-cuda) and "build cuda image" (aka /sirius-env-cuda-mkl-mpich), pushed the container to JFrog. https://github.com/electronic-structure/SIRIUS/blob/ci/rebuild-base-image/ci/cscs-daint.yml#L30

Tests seem to be running fine now.

toxa81 commented 9 months ago

I agree that JFrog might be lagging. The failures happen randomly.

toxa81 commented 9 months ago

Both hang in the band-parallel test for He.

toxa81 commented 9 months ago

cscs-ci run default

simonpintarelli commented 9 months ago

cscs-ci run default

toxa81 commented 9 months ago

@simonpintarelli fantastic work!