Exawind / nalu-wind

Solver for wind farm simulations targeting exascale computational platforms
https://nalu-wind.readthedocs.io

Fix error in unit test from unsynched bulk #1260

Closed (marchdf closed this 4 months ago)

marchdf commented 4 months ago

Fixes these errors:

error: 'show' is not a valid command.
C++ exception with description "Requirement( bulkData.in_synchronized_state() ) FAILED
Error occurred at: stk_mesh/stk_mesh/base/SkinBoundary.cpp:88

Error: Cannot use create_all_sides while in another mod cycle.
" thrown in the test body.
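
For readers unfamiliar with this class of failure, here is a minimal sketch of the pattern behind the message. It is not the change made in this PR (the diff is not shown here) and it does not use the real STK classes: `MockBulk`, `mock_create_all_sides`, and `skin_safely` are hypothetical stand-ins, and only the names `in_synchronized_state` and `create_all_sides` are taken from the log above. The sketch assumes the skinning call runs its own modification cycle and therefore refuses to start while another one is open.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a mesh bulk-data object; NOT the STK API.
class MockBulk {
public:
  void modification_begin() { modifying_ = true; }
  void modification_end() { modifying_ = false; }
  // True only when no modification cycle is open.
  bool in_synchronized_state() const { return !modifying_; }

private:
  bool modifying_ = false;
};

// Hypothetical stand-in for a skinning call that, like the one in the log,
// requires the mesh to be outside any modification cycle.
inline int mock_create_all_sides(MockBulk& bulk) {
  if (!bulk.in_synchronized_state()) {
    throw std::logic_error(
      "Cannot use create_all_sides while in another mod cycle.");
  }
  bulk.modification_begin(); // assumed: the call opens its own cycle
  const int sides = 6;       // pretend a single hex element was skinned
  bulk.modification_end();
  return sides;
}

// Test-setup helper: close any open cycle before skinning.
inline int skin_safely(MockBulk& bulk) {
  if (!bulk.in_synchronized_state()) {
    bulk.modification_end();
  }
  return mock_create_all_sides(bulk);
}
```

Calling `mock_create_all_sides` after `modification_begin()` throws, while `skin_safely` first closes the open cycle and then succeeds.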

marchdf commented 4 months ago

These edits fix the errors but I now get another error later in the unit tests:

[----------] 2 tests from ConductionResidualFixture
[ RUN      ] ConductionResidualFixture.residual_executes
[       OK ] ConductionResidualFixture.residual_executes (0 ms)
[ RUN      ] ConductionResidualFixture.linearized_residual_executes
Process 16543 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x2e44dd734)
    frame #0: 0x00000001844e6e08 libsystem_malloc.dylib`tiny_free_list_remove_ptr + 112
libsystem_malloc.dylib`tiny_free_list_remove_ptr:
->  0x1844e6e08 <+112>: ldr    x12, [x1, #0x8]!
    0x1844e6e0c <+116>: mov    x11, x12
    0x1844e6e10 <+120>: xpacd  x11
    0x1844e6e14 <+124>: mov    x17, x11
Target 0: (unittestX) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x2e44dd734)
  * frame #0: 0x00000001844e6e08 libsystem_malloc.dylib`tiny_free_list_remove_ptr + 112
    frame #1: 0x00000001844e66c8 libsystem_malloc.dylib`tiny_free_no_lock + 1060
    frame #2: 0x00000001844e6120 libsystem_malloc.dylib`free_tiny + 496
    frame #3: 0x0000000100bb381c unittestX`Kokkos::HostSpace::impl_deallocate(char const*, void*, unsigned long, unsigned long, Kokkos_Profiling_SpaceHandle) const + 332
    frame #4: 0x0000000100bb3900 unittestX`Kokkos::Impl::SharedAllocationRecord<Kokkos::HostSpace, void>::~SharedAllocationRecord() + 120
    frame #5: 0x000000010003873c unittestX`Kokkos::Impl::SharedAllocationRecord<Kokkos::HostSpace, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, unsigned int, true>>::~SharedAllocationRecord(this=0x0000600003018bd0) at Kokkos_SharedAlloc.hpp:281:7
    frame #6: 0x000000010003867c unittestX`Kokkos::Impl::SharedAllocationRecord<Kokkos::HostSpace, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, unsigned int, true>>::~SharedAllocationRecord(this=0x0000600003018bd0) at Kokkos_SharedAlloc.hpp:281:7
    frame #7: 0x00000001000386a8 unittestX`Kokkos::Impl::SharedAllocationRecord<Kokkos::HostSpace, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, unsigned int, true>>::~SharedAllocationRecord(this=0x0000600003018bd0) at Kokkos_SharedAlloc.hpp:281:7
    frame #8: 0x000000010c1778bc libnalu.dylib`Kokkos::Impl::SharedAllocationRecord<void, void>::decrement(Kokkos::Impl::SharedAllocationRecord<void, void>*) + 60
    frame #9: 0x000000010001a0fc unittestX`Kokkos::Impl::SharedAllocationTracker::~SharedAllocationTracker(this=0x0000000106090300) at Kokkos_SharedAlloc.hpp:419:30
    frame #10: 0x000000010001a0ac unittestX`Kokkos::Impl::SharedAllocationTracker::~SharedAllocationTracker(this=0x0000000106090300) at Kokkos_SharedAlloc.hpp:419:29
    frame #11: 0x0000000100032144 unittestX`Kokkos::Impl::ViewTracker<Kokkos::View<unsigned int [2], Kokkos::LayoutLeft, Kokkos::HostSpace>>::~ViewTracker(this=0x0000000106090300) at Kokkos_ViewTracker.hpp:39:8
    frame #12: 0x00000001000320d4 unittestX`Kokkos::Impl::ViewTracker<Kokkos::View<unsigned int [2], Kokkos::LayoutLeft, Kokkos::HostSpace>>::~ViewTracker(this=0x0000000106090300) at Kokkos_ViewTracker.hpp:39:8
    frame #13: 0x000000010003244c unittestX`Kokkos::View<unsigned int [2], Kokkos::LayoutLeft, Kokkos::HostSpace>::~View(this=0x0000000106090300) at Kokkos_View.hpp:1269:19
    frame #14: 0x0000000100031fec unittestX`Kokkos::View<unsigned int [2], Kokkos::LayoutLeft, Kokkos::HostSpace>::~View(this=0x0000000106090300) at Kokkos_View.hpp:1269:19
    frame #15: 0x000000010043bc98 unittestX`Kokkos::DualView<double**, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, void>::~DualView(this=0x0000000106090300) at Kokkos_DualView.hpp:113:7
    frame #16: 0x000000010043bc54 unittestX`Kokkos::DualView<double**, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, void>::~DualView(this=0x0000000106090300) at Kokkos_DualView.hpp:113:7
    frame #17: 0x000000010043bc28 unittestX`Tpetra::Details::WrappedDualView<Kokkos::DualView<double**, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, void>>::~WrappedDualView(this=0x0000000106090300) at Tpetra_Details_WrappedDualView.hpp:143:7
    frame #18: 0x000000010043ba24 unittestX`Tpetra::Details::WrappedDualView<Kokkos::DualView<double**, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, void>>::~WrappedDualView(this=0x0000000106090300) at Tpetra_Details_WrappedDualView.hpp:143:7
    frame #19: 0x000000010043b80c unittestX`Tpetra::MultiVector<double, int, long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>>::~MultiVector(this=0x00000001060901a0, vtt=0x0000000100f2cb88) at Tpetra_MultiVector_decl.hpp:830:37
    frame #20: 0x00000001004300cc unittestX`Tpetra::MultiVector<double, int, long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>>::~MultiVector(this=0x00000001060901a0) at Tpetra_MultiVector_decl.hpp:830:37
    frame #21: 0x0000000100452fc4 unittestX`sierra::nalu::matrix_free::ConductionResidualFixture::~ConductionResidualFixture(this=0x000000010608fe00) at UnitTestConductionInterior.C:77:7
    frame #22: 0x000000010045304c unittestX`sierra::nalu::matrix_free::ConductionResidualFixture_linearized_residual_executes_Test::~ConductionResidualFixture_linearized_residual_executes_Test(this=0x000000010608fe00) at UnitTestConductionInterior.C:123:1
    frame #23: 0x0000000100450714 unittestX`sierra::nalu::matrix_free::ConductionResidualFixture_linearized_residual_executes_Test::~ConductionResidualFixture_linearized_residual_executes_Test(this=0x000000010608fe00) at UnitTestConductionInterior.C:123:1
    frame #24: 0x0000000100450740 unittestX`sierra::nalu::matrix_free::ConductionResidualFixture_linearized_residual_executes_Test::~ConductionResidualFixture_linearized_residual_executes_Test(this=0x000000010608fe00) at UnitTestConductionInterior.C:123:1
    frame #25: 0x0000000100bc1540 unittestX`void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 80
    frame #26: 0x0000000100bc2db8 unittestX`testing::TestInfo::Run() + 336
    frame #27: 0x0000000100bc37e8 unittestX`testing::TestSuite::Run() + 288
    frame #28: 0x0000000100bd0da8 unittestX`testing::internal::UnitTestImpl::RunAllTests() + 984
    frame #29: 0x0000000100bd07dc unittestX`bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 80
    frame #30: 0x0000000100bd0758 unittestX`testing::UnitTest::Run() + 124
    frame #31: 0x0000000100006c24 unittestX`RUN_ALL_TESTS() at gtest.h:14808:46
    frame #32: 0x0000000100006a10 unittestX`main(argc=1, argv=0x000000016fdf9a58) at unit_tests.C:60:17
    frame #33: 0x0000000184367f28 dyld`start + 2236

@alanw0 and @rcknaus : @psakievich thought you might be able to help on this one?

marchdf commented 4 months ago

I updated the comment above with a stack trace from a debug build

rcknaus commented 4 months ago

> @alanw0 and @rcknaus : @psakievich thought you might be able to help on this one?

Looks like it was trying to write past the end of the multivector. Do we still have ASan testing?
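
The backtrace above crashes inside `free` during `~MultiVector`, which is consistent with an earlier out-of-bounds write corrupting heap metadata. The failure then shows up far from the faulty write. As a rough illustration only, the sketch below uses a plain `std::vector` rather than a Tpetra multivector to show how a bounds-checked access reports the overrun at the write itself. `checked_fill` is a hypothetical helper, not Nalu-Wind code. Building with AddressSanitizer (`-fsanitize=address` on GCC or Clang) gives a similar report for unchecked accesses.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: write `value` into the first `count` entries of `buf`
// using checked access. Returns true if every write was in bounds and false
// if a write would have run past the end of the buffer.
inline bool checked_fill(std::vector<double>& buf, std::size_t count,
                         double value) {
  try {
    for (std::size_t i = 0; i < count; ++i) {
      buf.at(i) = value; // at() throws std::out_of_range instead of overrunning
    }
  } catch (const std::out_of_range&) {
    return false;
  }
  return true;
}
```

With a four-entry buffer, `checked_fill(buf, 4, 1.0)` succeeds and `checked_fill(buf, 5, 2.0)` reports the overrun without touching memory outside the buffer.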

marchdf commented 4 months ago

[----------] Global test environment tear-down
[==========] 572 tests from 136 test suites ran. (202985 ms total)
[  PASSED  ] 571 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] VOFKernelHex8Mesh.NGP_adv_diff_edge_tpetra

I think we can merge. The failing test has a 1e-14 diff. Thanks @rcknaus for fixing that last issue!
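
For context on why a 1e-14 difference is usually treated as round-off noise in double precision, here is a small sketch of a combined relative/absolute tolerance check. The function name `nearly_equal` and both default tolerances are assumptions chosen for illustration; they are not the tolerances used by the Nalu-Wind unit tests.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical comparison helper: two doubles count as equal when their
// difference is within an absolute tolerance, or within a relative tolerance
// scaled by the larger magnitude. Both defaults are illustrative assumptions.
inline bool nearly_equal(double a, double b, double rel_tol = 1e-12,
                         double abs_tol = 1e-13) {
  const double diff = std::fabs(a - b);
  const double scale = std::max(std::fabs(a), std::fabs(b));
  return diff <= std::max(abs_tol, rel_tol * scale);
}
```

Under these assumed tolerances, values that differ by 1e-14 compare as equal, while a difference of 1e-9 on values of order one does not.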