MPAS-Dev / compass

Configuration Of MPAS Setups

Add pm-gpu #835

Open xylar opened 2 months ago

xylar commented 2 months ago

@mcarlson801 and @jewatkins, this is the starting point for adding pm-gpu support (with gnugpu for now, and maybe nvidiagpu to follow).

I have been able to run the full_integration suite from MALI on pm-gpu, but I suspect it is just running on the CPU portion of each node for now. Along with @matthewhoffman, we should have a discussion about what modifications are needed to build MALI and/or run the job to make sure Compass takes advantage of GPUs. We will likely also need a way to detect the GPU resources available and to specify the resource needs of each test. This isn't something I have thought about very much, but it is also needed very, very soon for the Omega ocean model in Polaris, the successor to Compass.
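
As a rough sketch of what detecting GPU resources could look like (this is not something Compass does today), the snippet below counts the devices reported by `nvidia-smi -L`, which prints one `GPU <n>: ...` line per device. The function names and the idea of a per-step GPU request are assumptions made for illustration only.

```python
import shutil
import subprocess


def count_gpus_from_listing(listing: str) -> int:
    """Count devices in the text printed by `nvidia-smi -L`."""
    return sum(1 for line in listing.splitlines()
               if line.strip().startswith("GPU "))


def detect_gpu_count() -> int:
    """Return the number of visible NVIDIA GPUs, or 0 if none are found."""
    if shutil.which("nvidia-smi") is None:
        return 0  # no NVIDIA tooling on this node
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True,
                            text=True, check=False)
    if result.returncode != 0:
        return 0
    return count_gpus_from_listing(result.stdout)


def can_run(gpus_needed: int, gpus_available: int) -> bool:
    """Hypothetical check: does a step's GPU request fit on this node?"""
    return gpus_needed <= gpus_available


if __name__ == "__main__":
    available = detect_gpu_count()
    print(f"GPUs available: {available}, "
          f"1-GPU step fits: {can_run(1, available)}")
```

A scheduler-aware version would probably also want to consult what the batch system allocated to the job rather than only what the node exposes.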

xylar commented 2 months ago

A note to say that I tried to build Albany and Trilinos with nvidiagpu and got a bunch of errors. So I'm not listing that as a supported config.

xylar commented 3 weeks ago

@mcarlson801 and @jewatkins, I was able to build the trilinos and albany spack libraries (and the rest of the compass spack environment) using this branch, https://github.com/xylar/mache/tree/add-cuda-to-pm-gpu and @mcarlson801's https://github.com/E3SM-Project/spack/pull/31.

I was also able to build MALI from the MALI-Dev submodule.

I ran the full_integration test suite and it was much slower than usual -- a 1-hour job timed out. This may be related to issues with the $SCRATCH drive, because I was also having trouble with basic operations there. So it might be worth testing again later. But I also saw errors in several tests, mostly restart tests but also a decomp test, which I will report once I get the job to run again.

xylar commented 3 weeks ago

The following restart tests failed with a validation error:

landice/dome/2000m/fo_restart_test
landice/dome/variable_resolution/fo_restart_tes
landice/greenland/fo_restart_test

The errors are all quantitatively similar to the following:

thickness            Time index: 0, 1, 2
1:  l1: 1.03182219712838e-12  l2: 2.85855338539842e-13  linf: 1.13686837721616e-13
2:  l1: 1.31200952879773e-12  l2: 3.29681925916544e-13  linf: 1.13686837721616e-13
  FAIL /pscratch/sd/x/xylar/compass_1.4/pm-cpu/test_20240813/full_integ_gnugpu/landice/dome/2000m/fo_restart_test/full_run/output.nc
       /pscratch/sd/x/xylar/compass_1.4/pm-cpu/test_20240813/full_integ_gnugpu/landice/dome/2000m/fo_restart_test/restart_run/output.nc
normalVelocity       Time index: 0, 1, 2
0:  l1: 2.42945560944798e-18  l2: 3.23431732743427e-20  linf: 2.11758236813575e-21
1:  l1: 2.66684010607556e-18  l2: 3.63297450051263e-20  linf: 1.90582413132218e-21
2:  l1: 2.68145536927484e-18  l2: 3.58541371140577e-20  linf: 2.11758236813575e-21
  FAIL /pscratch/sd/x/xylar/compass_1.4/pm-cpu/test_20240813/full_integ_gnugpu/landice/dome/2000m/fo_restart_test/full_run/output.nc
       /pscratch/sd/x/xylar/compass_1.4/pm-cpu/test_20240813/full_integ_gnugpu/landice/dome/2000m/fo_restart_test/restart_run/output.nc
Internal test case validation failed
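
For reference, the `l1`, `l2` and `linf` columns above read like the usual vector norms of the difference between the two output fields. Here is a minimal pure-Python sketch, assuming that definition and assuming the restart comparison only passes when every norm is exactly zero (which the failures at roughly 1e-13 suggest). The function names are illustrative, not Compass's.

```python
import math


def difference_norms(a, b):
    """Return (l1, l2, linf) norms of the element-wise difference a - b."""
    if len(a) != len(b):
        raise ValueError("fields must have the same length")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    l1 = math.fsum(diffs)                            # sum of |a_i - b_i|
    l2 = math.sqrt(math.fsum(d * d for d in diffs))  # Euclidean norm
    linf = max(diffs, default=0.0)                   # largest single difference
    return l1, l2, linf


def fields_match(a, b):
    """Bit-for-bit check: pass only if every norm is exactly zero."""
    return all(norm == 0.0 for norm in difference_norms(a, b))


if __name__ == "__main__":
    full_run = [100.0, 200.0, 300.0]
    restart_run = [100.0, 200.0 + 1e-13, 300.0]
    print(difference_norms(full_run, restart_run))
    print(fields_match(full_run, restart_run))
```

Under a zero-tolerance check like this, differences at round-off level are enough to fail a restart test.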

xylar commented 3 weeks ago

The landice/circular_shelf/decomposition_test fails in the 1proc_run step with the following stack trace from Albany:

:0: : block: [10,0,0], thread: [0,92,0] Assertion `Allocation failed.` failed.
:0: : block: [10,0,0], thread: [0,93,0] Assertion `Allocation failed.` failed.
:0: : block: [26,0,0], thread: [0,29,0] Assertion `Allocation failed.` failed.
:0: : block: [6,0,0], thread: [0,125,0] Assertion `Allocation failed.` failed.
:0: : block: [82,0,0], thread: [0,29,0] Assertion `Allocation failed.` failed.
:0: : block: [79,0,0], thread: [0,125,0] Assertion `Allocation failed.` failed.
:0: : block: [72,0,0], thread: [0,125,0] Assertion `Allocation failed.` failed.
:0: : block: [62,0,0], thread: [0,92,0] Assertion `Allocation failed.` failed.
:0: : block: [75,0,0], thread: [0,29,0] Assertion `Allocation failed.` failed.
:0: : block: [7,0,0], thread: [0,61,0] Assertion `Allocation failed.` failed.
:0: : block: [41,0,0], thread: [0,93,0] Assertion `Allocation failed.` failed.
(ptr->cuda_stream_synchronize_wrapper(stream)) error( cudaErrorAssert): device-side assert triggered /pscratch/sd/x/xylar/spack_gpu_tmp/spack-stage/spack-stage-trilinos-for-albany-compass-2024-03-13-wlto53yjmwkx6vak3n7ssh2tho4n2n6f/spack-src/packages/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:166
Backtrace:
[0x7f97ad196e35] Kokkos::Impl::save_stacktrace()
[0x7f97ad16a58c] Kokkos::Impl::host_abort(char const*)
[0x7f97ad19e54e] Kokkos::Impl::cuda_internal_error_abort(cudaError, char const*, char const*, int)
[0x7f97ad19e80a] Kokkos::Impl::cuda_stream_synchronize(CUstream_st*, Kokkos::Impl::CudaInternal const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
[0x7f97f1f5bba8] PHX::DagManager<PHAL::AlbanyTraits>::evaluateFields(PHAL::Workset&)
[0x7f97e2c260de] Albany::Application::computeGlobalJacobianImpl(double, double, double, double, Teuchos::RCP<Thyra::VectorBase<double> const> const&, Teuchos::RCP<Thyra::VectorBase<double> const> const&, Teuchos::RCP<Thyra::VectorBase<double> const> const&, Teuchos::Array<Sacado::ScalarParameterVector<SPL_Traits> > const&, Teuchos::RCP<Thyra::VectorBase<double> > const&, Teuchos::RCP<Thyra::LinearOpBase<double> > const&, double)
[0x7f97e2c276e3] Albany::Application::computeGlobalJacobian(double, double, double, double, Teuchos::RCP<Thyra::VectorBase<double> const> const&, Teuchos::RCP<Thyra::VectorBase<double> const> const&, Teuchos::RCP<Thyra::VectorBase<double> const> const&, Teuchos::Array<Sacado::ScalarParameterVector<SPL_Traits> > const&, Teuchos::RCP<Thyra::VectorBase<double> > const&, Teuchos::RCP<Thyra::LinearOpBase<double> > const&, double)
[0x7f97e2e2632a] Albany::ModelEvaluator::evalModelImpl(Thyra::ModelEvaluatorBase::InArgs<double> const&, Thyra::ModelEvaluatorBase::OutArgs<double> const&) const
[0x7f97e2b4c878] Thyra::ModelEvaluatorDefaultBase<double>::evalModel(Thyra::ModelEvaluatorBase::InArgs<double> const&, Thyra::ModelEvaluatorBase::OutArgs<double> const&) const
[0x7f97e2bcd96d] Thyra::DefaultModelEvaluatorWithSolveFactory<double>::evalModelImpl(Thyra::ModelEvaluatorBase::InArgs<double> const&, Thyra::ModelEvaluatorBase::OutArgs<double> const&) const
[0x7f97e2b4c878] Thyra::ModelEvaluatorDefaultBase<double>::evalModel(Thyra::ModelEvaluatorBase::InArgs<double> const&, Thyra::ModelEvaluatorBase::OutArgs<double> const&) const
[0x7f97dbb90f25] NOX::Thyra::Group::computeJacobian()
[0x7f97dbaed262] NOX::Direction::Newton::compute(NOX::Abstract::Vector&, NOX::Abstract::Group&, NOX::Solver::Generic const&)
[0x7f97dbb09c48] NOX::Solver::LineSearchBased::step()
[0x7f97dbb0bb39] NOX::Solver::LineSearchBased::solve()
[0x7f97dbba66a6] Thyra::NOXNonlinearSolver::solve(Thyra::VectorBase<double>*, Thyra::SolveCriteria<double> const*, Thyra::VectorBase<double>*)
[0x7f97e198a14f] Piro::NOXSolver<double>::evalModelImpl(Thyra::ModelEvaluatorBase::InArgs<double> const&, Thyra::ModelEvaluatorBase::OutArgs<double> const&) const
[0x7f97e2b4c878] Thyra::ModelEvaluatorDefaultBase<double>::evalModel(Thyra::ModelEvaluatorBase::InArgs<double> const&, Thyra::ModelEvaluatorBase::OutArgs<double> const&) const
[0x7f97f1b3d385] void Piro::Detail::PerformSolveImpl<double, Thyra::VectorBase<double> const, Thyra::MultiVectorBase<double> const>(Thyra::ModelEvaluator<double> const&, Teuchos::Array<bool> const&, bool, Teuchos::Array<Teuchos::RCP<Thyra::VectorBase<double> const> >&, Teuchos::Array<Teuchos::Array<Teuchos::RCP<Thyra::MultiVectorBase<double> const> > >&, Teuchos::Array<Teuchos::RCP<Thyra::MultiVectorBase<double> const> >&, Teuchos::Array<Teuchos::Array<Teuchos::RCP<Thyra::MultiVectorBase<double> const> > >&, Teuchos::RCP<Piro::SolutionObserverBase<double, Thyra::VectorBase<double> const> >)
[0x7f97f1b3dd9d] void Piro::Detail::PerformSolveImpl<double, Thyra::VectorBase<double> const, Thyra::MultiVectorBase<double> const>(Thyra::ModelEvaluator<double> const&, Teuchos::ParameterList&, Teuchos::Array<Teuchos::RCP<Thyra::VectorBase<double> const> >&, Teuchos::Array<Teuchos::Array<Teuchos::RCP<Thyra::MultiVectorBase<double> const> > >&)
[0x7f97f1b0dc63] velocity_solver_solve_fo__(int, int, int, bool, bool, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, double, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> >&, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> >&, std::vector<double, std::allocator<double> >&, std::vector<double, std::allocator<double> >&, int&, double const&)

xylar commented 3 weeks ago

Here's the timing I'm seeing:

Test Runtimes:
00:47 PASS landice_dome_2000m_sia_restart_test
00:11 PASS landice_dome_2000m_sia_decomposition_test
00:19 PASS landice_dome_variable_resolution_sia_restart_test
00:07 PASS landice_dome_variable_resolution_sia_decomposition_test
00:14 PASS landice_enthalpy_benchmark_A
00:17 PASS landice_eismint2_decomposition_test
00:14 PASS landice_eismint2_enthalpy_decomposition_test
00:31 PASS landice_eismint2_restart_test
01:30 PASS landice_eismint2_enthalpy_restart_test
01:13 PASS landice_greenland_sia_restart_test
00:45 PASS landice_greenland_sia_decomposition_test
01:12 PASS landice_hydro_radial_restart_test
01:27 PASS landice_hydro_radial_decomposition_test
01:47 PASS landice_humboldt_mesh-3km_decomposition_test_velo-none_calving-none_subglacialhydro
01:17 PASS landice_humboldt_mesh-3km_restart_test_velo-none_calving-none_subglacialhydro
00:53 PASS landice_dome_2000m_fo_decomposition_test
01:00 FAIL landice_dome_2000m_fo_restart_test
00:44 PASS landice_dome_variable_resolution_fo_decomposition_test
01:24 FAIL landice_dome_variable_resolution_fo_restart_test
00:08 FAIL landice_circular_shelf_decomposition_test
06:48 PASS landice_greenland_fo_decomposition_test
09:44 FAIL landice_greenland_fo_restart_test
05:10 PASS landice_thwaites_fo_decomposition_test
08:46 FAIL landice_thwaites_fo_restart_test
03:19 PASS landice_thwaites_fo-depthInt_decomposition_test
06:28 FAIL landice_thwaites_fo-depthInt_restart_test
12:00 FAIL landice_humboldt_mesh-3km_restart_test_velo-fo_calving-von_mises_stress_damage-threshold_faceMelting
07:57 FAIL landice_humboldt_mesh-3km_restart_test_velo-fo-depthInt_calving-von_mises_stress_damage-threshold_faceMelting
Total runtime 76:36
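
To pull the failures out of a listing like this, a small parser is enough. The sketch below assumes each result line has the form `MM:SS STATUS name`, as in the output above. The function names are made up for illustration.

```python
def parse_runtimes(report: str):
    """Parse 'MM:SS STATUS name' lines into (seconds, status, name) tuples."""
    results = []
    for line in report.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[1] not in ("PASS", "FAIL"):
            continue  # skip headers and the 'Total runtime' line
        minutes, seconds = parts[0].split(":")
        results.append((int(minutes) * 60 + int(seconds), parts[1], parts[2]))
    return results


def failed_tests(results):
    """Return the names of all tests whose status is FAIL."""
    return [name for _, status, name in results if status == "FAIL"]


if __name__ == "__main__":
    sample = """Test Runtimes:
00:53 PASS landice_dome_2000m_fo_decomposition_test
01:00 FAIL landice_dome_2000m_fo_restart_test
00:08 FAIL landice_circular_shelf_decomposition_test
"""
    parsed = parse_runtimes(sample)
    print(failed_tests(parsed))
    print(sum(seconds for seconds, _, _ in parsed), "seconds in total")
```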

mcarlson801 commented 3 weeks ago

> The landice/circular_shelf/decomposition_test fails in the 1proc_run step with the following stack trace from Albany:

Did you build Albany with the +slfad variant? I think this is the same error I ran into when I was building with DFad (although other tests would be failing with it too, hmmm). I'll take a look and see what's up.

xylar commented 3 weeks ago

> Did you build Albany with the +slfad variant?

No, I missed that. Is that for Trilinos? Albany? both?

mcarlson801 commented 3 weeks ago

Actually, scratch that, for MALI we would use +sfad12. You only need it for Albany.

xylar commented 3 weeks ago

Okay, thanks, I'll add that and try again.

mcarlson801 commented 3 weeks ago

Actually, to specify sfad 12, I think it's probably +sfad but I'm not sure how to set the size.

This is the line where the sfadsize gets used: https://github.com/E3SM-Project/spack/blob/develop/var/spack/repos/builtin/packages/albany/package.py#L130

And this is the line where the sfadsize is obtained: https://github.com/E3SM-Project/spack/blob/develop/var/spack/repos/builtin/packages/albany/package.py#L46

@ikalash Do you know what we need to add to the variants to install with +sfad with sfadsize = 12?

xylar commented 3 weeks ago

@mcarlson801 please keep me posted, then.

mcarlson801 commented 3 weeks ago

I looked up how to set multi-valued variants and it looks like the way to do this would be to add +sfad sfadsize=12.
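
For anyone unfamiliar with the spec syntax: Spack writes boolean variants as `+name` and valued variants as `name=value`. The helper below is only an illustrative sketch that assembles such a spec string. The function is hypothetical, and the variant names `sfad` and `sfadsize` are taken from the discussion above rather than verified against the package.

```python
def spack_spec(package, bool_variants=(), valued_variants=None):
    """Assemble a Spack spec string from boolean and valued variants."""
    # Boolean variants are enabled with a leading '+'.
    spec = package + "".join(f"+{name}" for name in bool_variants)
    # Valued variants use key=value, separated from the rest by spaces.
    for name, value in (valued_variants or {}).items():
        spec += f" {name}={value}"
    return spec


if __name__ == "__main__":
    # Variant names assumed from this thread.
    print(spack_spec("albany", ["sfad"], {"sfadsize": 12}))
    # prints: albany+sfad sfadsize=12
```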

mcarlson801 commented 2 weeks ago

@xylar Did you get a chance to run the tests again with +sfad sfadsize=12? I noticed that my run with +slfad doesn't have failing tests due to validation so I'm wondering if that fixed it.

xylar commented 1 week ago

@mcarlson801, I was away on vacation last week but I'm looking at this now.