vsoch opened this issue 11 months ago
I'm also trying to build (and running into issues):
root@526890c2f022:/opt/exaMPM/build# cmake -D CMAKE_BUILD_TYPE="Release" \
-D CMAKE_PREFIX_PATH=$CABANA_INSTALL_DIR \
-D CMAKE_INSTALL_PREFIX=install .. && \
make install
-- The CXX compiler identification is GNU 11.4.0
-- The C compiler identification is GNU 11.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Enabled Kokkos devices: OPENMP;SERIAL
-- Found MPI_CXX: /usr/lib/x86_64-linux-gnu/libmpichcxx.so (found version "4.0")
-- Found MPI: TRUE (found version "4.0") found components: CXX
-- Found CLANG_FORMAT: /usr/local/bin/clang-format (found suitable version "17.0.5", minimum required is "14")
-- Configuring done
CMake Error at src/CMakeLists.txt:20 (add_library):
Target "exampm" links to target "Cabana::Core" but the target was not
found. Perhaps a find_package() call is missing for an IMPORTED target, or
an ALIAS target is missing?
CMake Error at src/CMakeLists.txt:20 (add_library):
Target "exampm" links to target "Cabana::Grid" but the target was not
found. Perhaps a find_package() call is missing for an IMPORTED target, or
an ALIAS target is missing?
CMake Error at examples/CMakeLists.txt:1 (add_executable):
Target "DamBreak" links to target "Cabana::Core" but the target was not
found. Perhaps a find_package() call is missing for an IMPORTED target, or
an ALIAS target is missing?
CMake Error at examples/CMakeLists.txt:1 (add_executable):
Target "DamBreak" links to target "Cabana::Grid" but the target was not
found. Perhaps a find_package() call is missing for an IMPORTED target, or
an ALIAS target is missing?
CMake Error at examples/CMakeLists.txt:4 (add_executable):
Target "FreeFall" links to target "Cabana::Core" but the target was not
found. Perhaps a find_package() call is missing for an IMPORTED target, or
an ALIAS target is missing?
CMake Error at examples/CMakeLists.txt:4 (add_executable):
Target "FreeFall" links to target "Cabana::Grid" but the target was not
found. Perhaps a find_package() call is missing for an IMPORTED target, or
an ALIAS target is missing?
-- Generating done
CMake Generate step failed. Build files cannot be regenerated correctly.
My cabana install directory:
# ls $CABANA_INSTALL_DIR/
include lib
root@526890c2f022:/opt/Cabana# ls
AUTHORS CMakeLists.txt LICENSE benchmark cajita core example
CHANGELOG.md CONTRIBUTING.md README.md build cmake docker
root@526890c2f022:/opt/Cabana# ls build/
CMakeCache.txt CabanaConfigVersion.cmake cmake_install.cmake
CMakeDoxyfile.in Doxyfile.doxygen core
CMakeDoxygenDefaults.cmake Makefile example
CMakeFiles _deps install
CTestTestfile.cmake bin install_manifest.txt
Cabana.pc cajita lib
CabanaConfig.cmake cmake
root@526890c2f022:/opt/Cabana# ls build/install
include lib
root@526890c2f022:/opt/Cabana# echo $CABANA_INSTALL_DIR/
/opt/Cabana/build/install/
root@526890c2f022:/opt/Cabana# ls build/install/lib/
cmake libgmock.a libgmock_main.a libgtest.a libgtest_main.a pkgconfig
root@526890c2f022:/opt/Cabana# ls build/install/include/
CabanaCore_config.hpp Cajita_GlobalGrid.hpp
Cabana_AoSoA.hpp Cajita_GlobalGrid_impl.hpp
Cabana_CommunicationPlan.hpp Cajita_GlobalMesh.hpp
Cabana_Core.hpp Cajita_Halo.hpp
Cabana_DeepCopy.hpp Cajita_IndexConversion.hpp
Cabana_Distributor.hpp Cajita_IndexSpace.hpp
Cabana_ExecutionPolicy.hpp Cajita_Interpolation.hpp
Cabana_Halo.hpp Cajita_LocalGrid.hpp
Cabana_LinkedCellList.hpp Cajita_LocalGrid_impl.hpp
Cabana_MemberTypes.hpp Cajita_LocalMesh.hpp
Cabana_NeighborList.hpp Cajita_ManualPartitioner.hpp
Cabana_Parallel.hpp Cajita_MpiTraits.hpp
Cabana_ParameterPack.hpp Cajita_Parallel.hpp
Cabana_Slice.hpp Cajita_ParticleGridDistributor.hpp
Cabana_SoA.hpp Cajita_Partitioner.hpp
Cabana_Sort.hpp Cajita_ReferenceStructuredSolver.hpp
Cabana_Tuple.hpp Cajita_SparseDimPartitioner.hpp
Cabana_Types.hpp Cajita_SparseIndexSpace.hpp
Cabana_VerletList.hpp Cajita_Splines.hpp
Cabana_Version.hpp Cajita_Types.hpp
Cajita.hpp Cajita_UniformDimPartitioner.hpp
Cajita_Array.hpp gmock
Cajita_BovWriter.hpp gtest
Cajita_Config.hpp impl
Did I forget to build something?
For the build issues, you just need a newer version of Cabana (0.6.1 or up-to-date master). I just opened a PR to give a clear error at configuration time.
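The installed headers listed above are the older `Cajita_*` ones. One way to confirm which Cabana an install prefix actually holds is to read the version CMake recorded there. This is a minimal sketch with several assumptions. It assumes the install uses the common CMake layout `<prefix>/lib/cmake/Cabana/CabanaConfigVersion.cmake`. It also assumes that file was generated by CMake's `write_basic_package_version_file`, which writes a `set(PACKAGE_VERSION "X.Y.Z")` line. The function names are hypothetical, and `sort -V` needs GNU coreutils.

```shell
# Hypothetical helpers (not part of Cabana or ExaMPM).
# Assumed layout: <prefix>/lib/cmake/Cabana/CabanaConfigVersion.cmake

# Print the PACKAGE_VERSION string stored in the Cabana CMake version file.
cabana_version() {
  local prefix="$1"
  local vfile="$prefix/lib/cmake/Cabana/CabanaConfigVersion.cmake"
  if [ ! -f "$vfile" ]; then
    echo "no version file at $vfile" >&2
    return 1
  fi
  # Extract X.Y.Z from the first line of the form: set(PACKAGE_VERSION "X.Y.Z")
  sed -n '/^set(PACKAGE_VERSION "/{s/^set(PACKAGE_VERSION "\([^"]*\)").*/\1/p;q;}' "$vfile"
}

# Succeed if version $1 >= version $2 (GNU sort -V does the ordering).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed version against a minimum (default 0.6.1).
check_cabana() {
  local prefix="$1" min="${2:-0.6.1}" found
  found="$(cabana_version "$prefix")" || return 1
  if version_at_least "$found" "$min"; then
    echo "Cabana $found OK (>= $min)"
  else
    echo "Cabana $found is older than $min; update Cabana and reinstall" >&2
    return 1
  fi
}
```

Usage: `check_cabana "$CABANA_INSTALL_DIR"`. A second argument changes the minimum version. If the version file is missing, the prefix is probably not laid out the way this sketch assumes.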
For running, I went ahead and updated the wiki page. Exactly as you guessed, you can just put mpirun in front of the command.
Thank you! A quick follow-up question (I'm not great at debugging MPI). I can confirm that I can ping the other host and ssh into it from my launcher, but I'm getting an error. Here are the details.
Here are my hosts:
# cat hostlist.txt
metricset-sample-l-0-0.ms.default.svc.cluster.local
metricset-sample-w-0-0.ms.default.svc.cluster.local
Ping to the worker (w) node works:
# ping metricset-sample-w-0-0.ms.default.svc.cluster.local
PING metricset-sample-w-0-0.ms.default.svc.cluster.local (10.244.0.16) 56(84) bytes of data.
64 bytes from metricset-sample-w-0-0.ms.default.svc.cluster.local (10.244.0.16): icmp_seq=1 ttl=63 time=0.097 ms
64 bytes from metricset-sample-w-0-0.ms.default.svc.cluster.local (10.244.0.16): icmp_seq=2 ttl=63 time=0.058 ms
64 bytes from metricset-sample-w-0-0.ms.default.svc.cluster.local (10.244.0.16): icmp_seq=3 ttl=63 time=0.050 ms
^C
mpirun spits out this error:
# mpirun --hostfile ./hostlist.txt --allow-run-as-root -N 2 ./DamBreak 0.05 2 0 0.001 1.0 50 serial
ssh: Could not resolve hostname metricset-sample-w-0-0: Temporary failure in name resolution
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
--------------------------------------------------------------------------
ssh to the other host works too!
root@metricset-sample-l-0-0:/opt/exaMPM/build/examples# ssh metricset-sample-w-0-0.ms.default.svc.cluster.local
Welcome to Ubuntu 22.04.3 LTS (GNU/Linux 6.2.0-37-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.
To restore this content, you can run the 'unminimize' command.
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
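The ssh error names `metricset-sample-w-0-0`, which is the short form of the host. The hostfile, the ping, and the manual ssh all use the full `...ms.default.svc.cluster.local` name. To see whether that difference matters, you can check both forms of every hostfile entry with `getent ahosts`, which resolves names through `getaddrinfo` the same way ssh does. ssh's own configuration, such as `~/.ssh/config`, can still change the outcome. This is a hypothetical diagnostic helper and is not part of ExaMPM or Open MPI.

```shell
# Hypothetical helper: for each hostfile entry, check whether both the
# name as written (FQDN) and its short form (text before the first dot)
# resolve on this node. Returns non-zero if any lookup fails.
check_hosts() {
  local hostfile="$1" name short h rc=0
  # read the first field of each line; "|| [ -n ... ]" handles a missing final newline
  while read -r name _ || [ -n "$name" ]; do
    # skip blank lines and comments
    case "$name" in ''|'#'*) continue ;; esac
    short="${name%%.*}"
    for h in "$name" "$short"; do
      if getent ahosts "$h" >/dev/null 2>&1; then
        echo "resolves: $h"
      else
        echo "does not resolve: $h"
        rc=1
      fi
    done
  done < "$hostfile"
  return "$rc"
}
```

Usage: `check_hosts ./hostlist.txt`. In a Kubernetes pod, whether a bare short name resolves depends on the `search` domains in `/etc/resolv.conf`. If only the full names resolve, `ompi_info --all | grep -i fqdn` may show whether this Open MPI build has a setting for keeping fully qualified hostnames.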
Here is the MPI I'm using (maybe it's the wrong implementation or version?):
mpirun (Open MPI) 4.1.2
Usage: mpirun [OPTION]... [PROGRAM]...
Start the given program using Open RTE
Thanks for your help!
This looks like an MPI configuration issue (which I don't think I can help with), but I can at least confirm that this MPI version is one we test against regularly.
This is probably my stopping point for working on it, then. I'm not sure what the problem above is (and I'm still inexperienced with MPI). For context, I was going to add it to the metrics operator https://github.com/converged-computing/metrics-operator and use it for converged computing experiments on Kubernetes, but I'll skip it and move on to the next one. Thanks!
Now that I see you have a LAMMPS case, maybe using exactly the same MPI invocation as that one would make a difference here? Just a thought.
Thanks for the suggestion! The LAMMPS metrics container uses MPICH and the one here uses Open MPI, so I don't think we could do that. I did try MPICH too (with the same command), and that didn't work either.
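One detail from earlier in the thread may be relevant. The ExaMPM configure step found MPICH (`libmpichcxx.so`), while the `mpirun` used for the run reports Open MPI 4.1.2. Those commands ran in different containers (the prompts differ), so they may not reflect the final setup. Still, launching a binary built against one MPI implementation with another implementation's `mpirun` usually causes problems of its own, for example every rank starting as an independent singleton. The hypothetical helper below compares the two on the launcher node using only `ldd` and `mpirun --version`.

```shell
# Hypothetical helper: show which MPI library a binary is linked against
# and which implementation the mpirun on PATH belongs to.
mpi_report() {
  local exe="$1"
  echo "== MPI libraries linked by $exe =="
  # 'libmpi' matches Open MPI's libmpi.so* and MPICH's libmpich*.so*
  ldd "$exe" 2>/dev/null | grep -i 'libmpi' || echo "(no MPI library found by ldd)"
  echo "== mpirun on PATH =="
  # command -v prints the resolved path when mpirun exists
  if ! command -v mpirun; then
    echo "(mpirun not found)"
    return 1
  fi
  # first line of the version banner, e.g. "mpirun (Open MPI) 4.1.2"
  mpirun --version 2>&1 | head -n 1
}
```

Usage: `mpi_report ./DamBreak`. On Debian and Ubuntu systems with both implementations installed, implementation-specific launchers such as `mpirun.openmpi` and `mpirun.mpich` are typically available as well.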
Hi! I'm new to using this app and was wondering if you have an example of running with mpirun (or similar). I'm looking at the docs here: https://github.com/ECP-copa/ExaMPM/wiki/Run
Thank you! And apologies if this is an overly simple question (e.g., if the answer is just to put mpirun in front of that :P).