Closed simonpintarelli closed 9 months ago
I think we still need one build with MKL. It can be without elpa.
Here for example?
RUN spack env create -d /sirius-env-cuda && \
spack -e /sirius-env-cuda add "sirius@develop %gcc build_type=RelWithDebInfo +tests +apps +cuda +magma ^mpich ^intel-oneapi-mkl threads=openmp" && \
spack -e /sirius-env-cuda develop -p /sirius-src sirius@develop && \
spack -e /sirius-env-cuda install --only=dependencies --fail-fast
I'll wait for the current run to finish, it seems it does not cache docker layers and restarts from scratch.
I pushed a slightly different version with explicit scalapack.
some rocblas dependency failed with:
280 checking whether program_invocation_name is defined... yes
281 checking whether program_invocation_short_name is defined... yes
282 checking for pidfd_open... no
283 checking for __NR_pidfd_open... yes
284 checking for ncursesw... no
285 checking for ncurses... no
>> 286 configure: error: ncurses support missing/incomplete (for partial build use --without-ncurses)
It seems Ubuntu changed the ncurses package upstream, and the changed package was picked up by spack external find.
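A quick way to check this inside the base image, as a rough sketch: it assumes the failing configure step looks up ncurses through pkg-config, which the log suggests but does not prove. The helper name `has_dev_files` is made up for illustration.

```shell
# Sketch only: check whether the system ncurses (which spack may have
# registered as an external) also ships its development files.
has_dev_files() {
    # pkg-config --exists returns 0 only if a matching .pc file is found
    pkg-config --exists "$1" 2>/dev/null
}

for pkg in ncursesw ncurses; do
    if has_dev_files "$pkg"; then
        echo "$pkg: development files found"
    else
        echo "$pkg: development files missing"
    fi
done

# To see which externals spack has registered, one can run:
#   spack config get packages
```

If both come back missing, either installing the distribution's ncurses development package or dropping the external entry so that spack builds ncurses itself would be the things to try.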
cscs-ci run default
I don't get why it times out. I just checked the runs manually from the command line. All tests (parallel/sequential) are running fine. It is for sure using ctest from the system's (old) cmake installation.
@toxa81, I'm confused too; the timings on the CI per test didn't look unusually high. Is it possible JFrog is lagging when it tries to pull the image? The size is 12GB, afaik the same as before.
Oh, come on! Now there is no scalapack eigensolver in the scalapack build.
Sorry! I was not careful with the specs.
No worries, Simon! I believe it was correct. I checked the output of "build cuda image" job which builds "/sirius-env-cuda-mkl-mpich". All was set properly.
Indeed, then I broke it with the last commit. I've removed it via force push; let's hope it will pick up the prebuilt container again.
Seems the pitfall was that both build cuda image openblas (aka /sirius-env-cuda) and build cuda image (aka /sirius-env-cuda-mkl-mpich) pushed the container to jfrog.
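One way to avoid this kind of collision, as a rough sketch: derive the image name from the spack environment directory, so that each build job pushes to its own location. The registry prefix `registry.example/sirius` and the helper `image_name` are hypothetical and not taken from ci/cscs-daint.yml.

```shell
# Hypothetical helper: build a per-environment image name so that two
# build jobs never push to the same location.
image_name() {
    registry="$1"   # registry prefix (placeholder, not the real one)
    env_dir="$2"    # spack environment directory, e.g. /sirius-env-cuda
    # Drop the leading slash and append the environment name.
    printf '%s/%s\n' "$registry" "${env_dir#/}"
}

image_name registry.example/sirius /sirius-env-cuda
image_name registry.example/sirius /sirius-env-cuda-mkl-mpich
```

The two calls print different names, so a later job cannot silently overwrite the image that another job pushed.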
https://github.com/electronic-structure/SIRIUS/blob/ci/rebuild-base-image/ci/cscs-daint.yml#L30
Tests seem to be running fine now.
I agree that JFrog might be lagging. The failures happen randomly.
Both hang in band parallel test for He.
cscs-ci run default
cscs-ci run default
@simonpintarelli fantastic work!
Trying to get the cscs pipeline to rebuild the base images.