cameronrutherford opened 11 months ago
Are you somehow building HiOp without MPI? Or are HiOp and PETSc picking up different MPI headers?
From the ExaGO pipeline, we are building:
exago@develop+hiop~ipopt~mpi~python+raja+tests arch=None-None-x86_64
- tx7nd5d exago@develop%gcc@9.4.0~cuda+hiop~ipo~ipopt+logging~mpi~python+raja~rocm+tests build_system=cmake build_type=RelWithDebInfo dev_path=/__w/ExaGO/ExaGO arch=linux-ubuntu20.04-x86_64
- ybikngp ^camp@0.2.3%gcc@9.4.0~cuda~ipo+openmp~rocm~tests build_system=cmake build_type=RelWithDebInfo arch=linux-ubuntu20.04-x86_64
- kl43gwj ^blt@0.4.1%gcc@9.4.0 build_system=generic arch=linux-ubuntu20.04-x86_64
- 7bzaewm ^cmake@3.25.2%gcc@9.4.0~doc+ncurses+ownlibs~qt build_system=generic build_type=Release arch=linux-ubuntu20.04-x86_64
- 3bxcabf ^ncurses@6.4%gcc@9.4.0~symlinks+termlib abi=none build_system=autotools arch=linux-ubuntu20.04-x86_64
- yekhgie ^openssl@1.1.1t%gcc@9.4.0~docs~shared build_system=generic certs=mozilla arch=linux-ubuntu20.04-x86_64
- djeruao ^ca-certificates-mozilla@2023-01-10%gcc@9.4.0 build_system=generic arch=linux-ubuntu20.04-x86_64
- jymwj6w ^zlib@1.2.13%gcc@9.4.0+optimize+pic+shared build_system=makefile arch=linux-ubuntu20.04-x86_64
- vcsqn5o ^hiop@0.7.1%gcc@9.4.0~cuda+deepchecking~ginkgo~ipo~jsrun~kron~mpi+raja~rocm~shared~sparse build_system=cmake build_type=RelWithDebInfo arch=linux-ubuntu20.04-x86_64
- wtvhbiz ^openblas@0.3.21%gcc@9.4.0~bignuma~consistent_fpcsr+fortran~ilp64+locking+pic+shared build_system=makefile patches=114f95f,a4c642f,c20f518,d3d9b15 symbol_suffix=none threads=none arch=linux-ubuntu20.04-x86_64
- 5qydzbx ^perl@5.36.0%gcc@9.4.0+cpanm+open+shared+threads build_system=generic arch=linux-ubuntu20.04-x86_64
- e5g7oef ^berkeley-db@18.1.40%gcc@9.4.0+cxx~docs+stl build_system=autotools patches=26090f4,b231fcc arch=linux-ubuntu20.04-x86_64
- gs4r33x ^bzip2@1.0.8%gcc@9.4.0~debug~pic+shared build_system=generic arch=linux-ubuntu20.04-x86_64
- 7wdyruu ^gdbm@1.23%gcc@9.4.0 build_system=autotools arch=linux-ubuntu20.04-x86_64
- wslvyrk ^petsc@3.18.3%gcc@9.4.0~X~batch~cgns~complex~cuda~debug+double~exodusii~fftw+fortran~giflib~hdf5~hpddm~hwloc~hypre~int64~jpeg~knl~kokkos~libpng~libyaml~memkind+metis~mkl-pardiso~mmg~moab~mpfr~mpi~mumps~openmp~p4est~parmmg~ptscotch~random123~rocm~saws~scalapack+shared~strumpack~suite-sparse~superlu-dist~tetgen~trilinos~valgrind build_system=generic clanguage=C arch=linux-ubuntu20.04-x86_64
- kwz7ftm ^diffutils@3.8%gcc@9.4.0 build_system=autotools arch=linux-ubuntu20.04-x86_64
- y4xrp3s ^libiconv@1.17%gcc@9.4.0 build_system=autotools libs=shared,static arch=linux-ubuntu20.04-x86_64
- wnqabk7 ^metis@5.1.0%gcc@9.4.0~gdb~int64~ipo~real64+shared build_system=cmake build_type=RelWithDebInfo patches=4991da9,93a7903,b1225da arch=linux-ubuntu20.04-x86_64
- quyjgw3 ^python@3.10.8%gcc@9.4.0+bz2+crypt+ctypes+dbm~debug+libxml2+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tkinter+uuid+zlib build_system=generic patches=0d98e93,7d40923,f2fd060 arch=linux-ubuntu20.04-x86_64
- pgvwni4 ^expat@2.5.0%gcc@9.4.0+libbsd build_system=autotools arch=linux-ubuntu20.04-x86_64
- en3zuay ^libbsd@0.11.7%gcc@9.4.0 build_system=autotools arch=linux-ubuntu20.04-x86_64
- ps7sxlx ^libmd@1.0.4%gcc@9.4.0 build_system=autotools arch=linux-ubuntu20.04-x86_64
- wlq5rko ^gettext@0.21.1%gcc@9.4.0+bzip2+curses+git~libunistring+libxml2+tar+xz build_system=autotools arch=linux-ubuntu20.04-x86_64
- j6aqcps ^libxml2@2.10.3%gcc@9.4.0~python build_system=autotools arch=linux-ubuntu20.04-x86_64
- zt4ocio ^tar@1.34%gcc@9.4.0 build_system=autotools zip=pigz arch=linux-ubuntu20.04-x86_64
- xoxeujp ^pigz@2.7%gcc@9.4.0 build_system=makefile arch=linux-ubuntu20.04-x86_64
- 3vtuapf ^zstd@1.5.2%gcc@9.4.0+programs build_system=makefile compression=none libs=shared,static arch=linux-ubuntu20.04-x86_64
- 6sswith ^libffi@3.4.4%gcc@9.4.0 build_system=autotools arch=linux-ubuntu20.04-x86_64
- 2evlwmd ^libxcrypt@4.4.33%gcc@9.4.0~obsolete_api build_system=autotools arch=linux-ubuntu20.04-x86_64
- iuswzm4 ^readline@8.2%gcc@9.4.0 build_system=autotools patches=bbf97f1 arch=linux-ubuntu20.04-x86_64
- ghcuaen ^sqlite@3.40.1%gcc@9.4.0+column_metadata+dynamic_extensions+fts~functions+rtree build_system=autotools arch=linux-ubuntu20.04-x86_64
- swhrnzy ^util-linux-uuid@2.38.1%gcc@9.4.0 build_system=autotools arch=linux-ubuntu20.04-x86_64
- qkxtzoa ^xz@5.4.1%gcc@9.4.0~pic build_system=autotools libs=shared,static arch=linux-ubuntu20.04-x86_64
- w6opye6 ^pkgconf@1.8.0%gcc@9.4.0 build_system=autotools arch=linux-ubuntu20.04-x86_64
- 7xqyl5b ^raja@0.14.0%gcc@9.4.0~cuda+examples+exercises~ipo+openmp~rocm+shared~tests build_system=cmake build_type=RelWithDebInfo arch=linux-ubuntu20.04-x86_64
- 4s36yj3 ^umpire@6.0.0%gcc@9.4.0+c~cuda~device_alloc~deviceconst+examples~fortran~ipo~numa~openmp~rocm+shared build_system=cmake build_type=RelWithDebInfo tests=none arch=linux-ubuntu20.04-x86_64
And so we get the following compiler error:
459 In file included from /__w/ExaGO/ExaGO/tpl/spack/opt/spack/linux-ubuntu20.04-x86_64/gcc-9.4.0/hiop-0.7.1-vcsqn5ocwcwfihlrbjqozv3ku2rkgzo7/include/hiopInterface.hpp:60,
460 from /__w/ExaGO/ExaGO/tpl/spack/opt/spack/linux-ubuntu20.04-x86_64/gcc-9.4.0/hiop-0.7.1-vcsqn5ocwcwfihlrbjqozv3ku2rkgzo7/include/hiopNlpFormulation.hpp:59,
461 from /__w/ExaGO/ExaGO/tpl/spack/opt/spack/linux-ubuntu20.04-x86_64/gcc-9.4.0/hiop-0.7.1-vcsqn5ocwcwfihlrbjqozv3ku2rkgzo7/include/hiopAlgFilterIPM.hpp:59,
462 from /__w/ExaGO/ExaGO/src/opflow/solver/hiop/opflow_hiop.h:7,
463 from /__w/ExaGO/ExaGO/src/opflow/solver/hiop/opflow_hiop.cpp:4:
>> 464 /__w/ExaGO/ExaGO/tpl/spack/opt/spack/linux-ubuntu20.04-x86_64/gcc-9.4.0/petsc-3.18.3-wslvyrkkwofiig24a5rm7gctadb7g4fk/include/petsc/mpiuni/mpi.h:186:13: error: multiple types in one declaration
465     186 | typedef int MPI_Comm;
466         |             ^~~~~~~~
>> 467 /__w/ExaGO/ExaGO/tpl/spack/opt/spack/linux-ubuntu20.04-x86_64/gcc-9.4.0/petsc-3.18.3-wslvyrkkwofiig24a5rm7gctadb7g4fk/include/petsc/mpiuni/mpi.h:186:13: error: declaration does not declare anything [-fpermissive]
>> 468 make[2]: *** [src/opflow/CMakeFiles/OPFLOW_obj_static.dir/build.make:261: src/opflow/CMakeFiles/OPFLOW_obj_static.dir/solver/hiop/opflow_hiop.cpp.o] Error 1
So the HiOp header hiopInterface.hpp, on line 60 (linked in the original issue description), includes hiopMPI.h, which in turn includes mpi.h. That include picks up whichever mpi.h is on the include path, and here it finds PETSc's MPIUNI stub (include/petsc/mpiuni/mpi.h), which errors out.
We are building both PETSc and HiOp without MPI here, so I honestly think this could be a bug in HiOp and in PETSc?
@cnpetra @cameronrutherford
I can successfully build HiOp without MPI. In hiopMPI.h, mpi.h is not included if we set HIOP_USE_MPI = OFF. From your log file, I think the problems are:
- When HIOP_USE_MPI = OFF, both HiOp and PETSc define their own MPI_Comm.
- I'm not sure where mpi.h is included. It seems to be included via PETSc. See here.
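To illustrate why two MPI_Comm definitions can clash, here is a minimal, self-contained C++ sketch. It assumes (this is not confirmed in the thread) that one of the non-MPI fallbacks provides MPI_Comm as a preprocessor macro such as "#define MPI_Comm int". That would be consistent with the log: PETSc's "typedef int MPI_Comm;" would then preprocess to "typedef int int;", which GCC reports as "multiple types in one declaration" and "declaration does not declare anything". Two plain typedefs naming the same type, by contrast, are legal C++. The stub contents below are hypothetical stand-ins, not copied from HiOp or PETSc.

```cpp
#include <cassert>
#include <type_traits>

// Two independent non-MPI stub headers, modeled inline.
// Hypothetical contents; neither is copied from HiOp or PETSc.

// "Library A" stub: declares MPI_Comm as a typedef.
typedef int MPI_Comm;

// "Library B" stub: declares the same typedef again. C++ accepts a repeated
// typedef in the same scope as long as it names the same type.
typedef int MPI_Comm;

// What would NOT compile: if library B had instead used a macro,
//   #define MPI_Comm int
// ahead of library A's header, then A's "typedef int MPI_Comm;" would
// preprocess to "typedef int int;", which GCC rejects.

// Both stubs agree, so code using MPI_Comm sees a single type.
static_assert(std::is_same<MPI_Comm, int>::value, "stubs must agree on MPI_Comm");

// Hypothetical helper that only depends on the shared MPI_Comm type.
inline bool is_null_comm(MPI_Comm c) { return c == 0; }
```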
I'm following, but let me clarify: I am also able to build hiop~mpi on its own. The issue only happens when exago~mpi tries to build with both petsc~mpi and hiop~mpi.
Why do HiOp and PETSc both need to define MPI_Comm in these non-MPI builds?
Again, this might technically be an ExaGO (or PETSc, or HiOp) issue, but I'm trying to figure out who's to blame here.
We had this issue before with MFEM, if I recall correctly. One of the definitions has to go. I think HiOp can go with however PETSc defines MPI_Comm, so an easy fix would be for HiOp to check whether it is already defined. This applies when HIOP_USE_MPI is off.
https://github.com/LLNL/hiop/blob/develop/src/Interface/hiopInterface.hpp#L60
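A minimal sketch of the "check if already defined" idea, under stated assumptions. A typedef cannot be tested with #ifdef, so the sketch assumes that whatever MPI or MPI-stub header is included first also exposes MPI_COMM_WORLD as a macro and uses that as the signal. The stub header, its value 2, and the guarded fallback are all hypothetical models, not the actual PETSc or HiOp code; HIOP_USE_MPI is left undefined to mimic HIOP_USE_MPI = OFF.

```cpp
#include <cassert>

// Stand-in for an MPI stub header (e.g. an MPIUNI-style mpi.h) that happens
// to be included first. Hypothetical contents; the value 2 is arbitrary.
typedef int MPI_Comm;
#define MPI_COMM_WORLD 2

// Sketch of a guarded non-MPI fallback, as it might look in a
// hiopMPI.h-style header (hypothetical, not HiOp's actual code).
#ifdef HIOP_USE_MPI
#include <mpi.h>
#else
#ifndef MPI_COMM_WORLD
// No MPI or MPI-stub header seen yet: supply minimal definitions.
// A typedef (rather than "#define MPI_Comm int") stays compatible with a
// later, identical "typedef int MPI_Comm;" from another stub header.
typedef int MPI_Comm;
#define MPI_COMM_WORLD 0
#endif
#endif

// Code written against the fallback works with whichever definition won.
inline MPI_Comm default_comm() { return MPI_COMM_WORLD; }
```

The guard only covers one include order. If the fallback is seen first, the later stub's identical typedef is still accepted in C++, but a stub that redefines MPI_COMM_WORLD with a different value would trigger a macro-redefinition warning, so the two libraries would still need to agree on those values.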
https://github.com/pnnl/ExaGO/actions/runs/6304772661/job/17116842345?pr=15
This is a really weird bug: even when building petsc~mpi in the exago package here, petsc insists on having an mpi.h lying around that is also picked up... I am still trying to figure out who to blame here, but this seemed like the right place to start.