NOAA-EMC / UPP


intel build not working in CI #995

Closed edwardhartnett closed 2 months ago

edwardhartnett commented 3 months ago

Ends like this:

[100%] Built target read_nemsio.x
make[1]: Leaving directory '/tmp/runner/spack-stage/spack-stage-nemsio-2.5.4-ktf27re5vtwyio7c3bv2vmxrhunmrzuy/spack-build-ktf27re'
/usr/local/bin/cmake -E cmake_progress_start /tmp/runner/spack-stage/spack-stage-nemsio-2.5.4-ktf27re5vtwyio7c3bv2vmxrhunmrzuy/spack-build-ktf27re/CMakeFiles 0
make  -f CMakeFiles/Makefile2 preinstall
make[1]: Entering directory '/tmp/runner/spack-stage/spack-stage-nemsio-2.5.4-ktf27re5vtwyio7c3bv2vmxrhunmrzuy/spack-build-ktf27re'
make[1]: Nothing to be done for 'preinstall'.
make[1]: Leaving directory '/tmp/runner/spack-stage/spack-stage-nemsio-2.5.4-ktf27re5vtwyio7c3bv2vmxrhunmrzuy/spack-build-ktf27re'
Install the project...
/usr/local/bin/cmake -P cmake_install.cmake
-- Install configuration: "Release"
-- Installing: /home/runner/work/UPP/UPP/spack/opt/spack/linux-ubuntu20.04-zen2/intel-2021.10.0/nemsio-2.5.4-ktf27re5vtwyio7c3bv2vmxrhunmrzuy/include
-- Installing: /home/runner/work/UPP/UPP/spack/opt/spack/linux-ubuntu20.04-zen2/intel-2021.10.0/nemsio-2.5.4-ktf27re5vtwyio7c3bv2vmxrhunmrzuy/include/nemsio_write.mod
-- Installing: /home/runner/work/UPP/UPP/spack/opt/spack/linux-ubuntu20.04-zen2/intel-2021.10.0/nemsio-2.5.4-ktf27re5vtwyio7c3bv2vmxrhunmrzuy/include/nemsio_module_mpi.mod
-- Installing: /home/runner/work/UPP/UPP/spack/opt/spack/linux-ubuntu20.04-zen2/intel-2021.10.0/nemsio-2.5.4-ktf27re5vtwyio7c3bv2vmxrhunmrzuy/include/nemsio_module.mod

Not sure what is happening.

@AlexanderRichert-NOAA can you take a look?

WenMeng-NOAA commented 3 months ago

@edwardhartnett Is there a way to turn off the Intel CI until you find a fix for this false alert?

edwardhartnett commented 3 months ago

No, instead we want to fix it.

AlexanderRichert-NOAA commented 3 months ago

For the last several Intel runs, I'm seeing "No space left on device" errors in the workflow summaries, e.g., https://github.com/NOAA-EMC/UPP/actions/runs/9909411518

It appears the logs are filling up:

System.IO.IOException: No space left on device : '/home/runner/runners/2.317.0/_diag/Worker_20240712-142126-utc.log'

I would suggest turning off the verbose (-v) option for spack install, as there are other ways to obtain that information (I would suggest either saving off various log files as artifacts, or perhaps using the --show-log-on-error option for spack install so that detailed logs aren't spit out for the packages that install correctly).
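A minimal sketch of the suggestion above, assuming the install step is an ordinary shell command in the workflow. The helper name `spack_install_args` and its on/off argument are hypothetical, not part of the UPP workflow; `-v` and `--show-log-on-error` are the spack install options mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: print the arguments to pass to `spack`.
# By default the build log is only shown when a package fails;
# passing "1" opts back in to fully verbose output.
spack_install_args() {
    if [ "${1:-0}" = "1" ]; then
        printf '%s\n' "install -v"
    else
        printf '%s\n' "install --show-log-on-error"
    fi
}

# Intended use in a workflow step (not executed here):
#   spack $(spack_install_args)
```

Keeping the verbose path as an explicit opt-in means a debugging run can still request the full log without changing the default that protects the runner's disk.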

edwardhartnett commented 3 months ago

@AlexanderRichert-NOAA thanks, I should have checked the action page to see the out of space error. However, now I am checking it and it's happening even after I removed the -v (which does drastically reduce log size).

Is there any way to get spack to use less space? Perhaps by doing the clean after each build instead of at the end?

AlexanderRichert-NOAA commented 3 months ago

I think the placement of the cache clean makes sense... One thing I'm noticing at the moment is that Spack is installing intel-oneapi-mpi, even though we've already installed that through apt. I think we can safely remove intel-oneapi-dev-utilities and intel-oneapi-mpi-devel from what gets installed by apt.

I'll see if I can find other ways to reduce the disk usage.

edwardhartnett commented 3 months ago

I will take those out of the apt install...

edwardhartnett commented 3 months ago

Taking out the apt install did not work:

f951: Fatal Error: Reading module ‘/home/runner/work/UPP/UPP/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-10.5.0/intel-oneapi-mpi-2021.12.1-ydoxetm3lceho5ctesx2d6pcds63cgpw/mpi/2021.12/include/mpi/mpi.mod’ at line 1 column 2: Unexpected EOF
compilation terminated.
make[2]: [src/CMakeFiles/nemsio.dir/build.make:130: src/CMakeFiles/nemsio.dir/nemsio_module_mpi.f90.o] Error 1
make[2]: Waiting for unfinished jobs....
make[2]: Leaving directory '/tmp/runner/spack-stage/spack-stage-nemsio-2.5.4-tx726tqt6lk7vxcztcdmjks2xydcxwal/spack-build-tx726tq'
make[1]: [CMakeFiles/Makefile2:917: src/CMakeFiles/nemsio.dir/all] Error 2
make[1]: Leaving directory '/tmp/runner/spack-stage/spack-stage-nemsio-2.5.4-tx726tqt6lk7vxcztcdmjks2xydcxwal/spack-build-tx726tq'
make: [Makefile:149: all] Error 2

AlexanderRichert-NOAA commented 3 months ago

That's weird, it appears to be building everything with gcc in that workflow... Can you try adding spack config add "packages:all:prefer:['%intel']" somewhere before the concretization step?
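As a rough sketch of the ordering suggested above, assuming the workflow concretizes with a separate `spack concretize` call (the actual UPP workflow step may differ). The `run` wrapper, the `DRY_RUN` variable, and the `configure_intel_env` function are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: with DRY_RUN=1, print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Set the compiler preference first, then concretize, so the
# preference is already in the configuration when specs are resolved.
configure_intel_env() {
    run spack config add "packages:all:prefer:['%intel']"
    run spack concretize
}

# Intended use in a workflow step (not executed here):
#   configure_intel_env
```

Running it once with `DRY_RUN=1` shows the command order without touching the Spack configuration.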

edwardhartnett commented 2 months ago

I've done that.

It's now failing like this:

f951: Fatal Error: Reading module ‘/home/runner/work/UPP/UPP/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-10.5.0/intel-oneapi-mpi-2021.12.1-ydoxetm3lceho5ctesx2d6pcds63cgpw/mpi/2021.12/include/mpi/mpi.mod’ at line 1 column 2: Unexpected EOF

Are we sure that mpich is included in the intel environment?
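The module path in the error above still has gcc-10.5.0 as its compiler directory, and f951 is gfortran's compiler proper, so this build is evidently still going through gcc. A small sketch for pulling the compiler component out of an install prefix, assuming the `opt/spack/<arch>/<compiler>/<package>` layout seen in the paths in this thread; the function name `prefix_compiler` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the compiler directory of a Spack install path.
# Assumes the layout .../opt/spack/<arch>/<compiler>/<package>-<version>-<hash>/...
prefix_compiler() {
    # Drop everything up to and including the first "/opt/spack/".
    rest="${1#*/opt/spack/}"
    # What remains starts with <arch>/<compiler>/..., so take field 2.
    printf '%s\n' "$rest" | cut -d/ -f2
}

# Example call with the path from the error message:
#   prefix_compiler /home/runner/work/UPP/UPP/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-10.5.0/intel-oneapi-mpi-2021.12.1-ydoxetm3lceho5ctesx2d6pcds63cgpw/mpi/2021.12/include/mpi/mpi.mod
```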

AlexanderRichert-NOAA commented 2 months ago

To be clear, do you want to use Intel MPI or MPICH to provide MPI?

edwardhartnett commented 2 months ago

Intel

edwardhartnett commented 2 months ago

Since there are no plans for unit testing, there is no need to spend more time maintaining the CI build. It doesn't have any tests to run anyway. ;-)

I will close this issue.