LLNL / Umpire

An application-focused API for memory management on NUMA & GPU architectures
MIT License

Umpire Segfaults during initialization of DEVICE_CONST #42

Closed: robinson96 closed this issue 5 years ago

robinson96 commented 5 years ago

Describe the bug

Umpire segfaults while creating the DEVICE_CONST allocator.

To Reproduce

I am using CHAI + Umpire in a large multiphysics code. I have not yet attempted to reproduce this in a smaller executable. The problem occurs during initialization of Umpire.

This is on a P8+ P100 system.

Expected behavior

Don't segfault.

Compilers & Libraries (please complete the following information):

Additional context

Umpire version: f92f367 (Merge pull request #39 from LLNL/feature/coalesce-only-when-coalesceable)

Stack Trace:

```
#0  std::operator<< <char, std::char_traits, std::allocator > (os=..., str=...)
    at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/basic_string.h:2777
#1  umpire::util::Logger::logMessage (this=, level=, message=..., fileName=..., line=38)
    at /g/g18/probinso/ale3d/bugfixday/imports/umpire/src/umpire/util/Logger.cpp:61
#2  0x000000001560bea8 in umpire::resource::CudaConstantMemoryResource::CudaConstantMemoryResource (this=0x4aa1ddd0,
    name=..., id=<optimized out>, traits=...)
    at /g/g18/probinso/ale3d/bugfixday/imports/umpire/src/umpire/resource/CudaConstantMemoryResource.cu:38
#3  0x000000001560bb6c in __gnu_cxx::new_allocator::construct<umpire::resource::CudaConstantMemoryResource, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (this=,
    __p=0x4aa1ddd0, __args=..., __args=..., __args=...)
    at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/ext/new_allocator.h:120
#4  0x000000001560b8cc in std::allocator_traits<std::allocator >::_S_construct<umpire::resource::CudaConstantMemoryResource, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (
    __p=<optimized out>, __args=..., __args=..., __args=..., __a=...)
    at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/alloc_traits.h:253
#5  std::allocator_traits<std::allocator >::construct<umpire::resource::CudaConstantMemoryResource, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (p=, a=...,
    __args=..., __args=..., __args=...)
    at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/alloc_traits.h:399
#6  std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (this=, a=..., args=..., args=..., args=...)
    at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/shared_ptr_base.h:515
#7  __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator, (__gnu_cxx::_Lock_policy)2> >::construct<std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator, (__gnu_cxx::_Lock_policy)2>, std::allocator const, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&>(std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator, (__gnu_cxx::_Lock_policy)2>*, std::allocator const&&, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&) (this=, p=,
    __args=<optimized out>, __args=<optimized out>, __args=<optimized out>, __args=<optimized out>)
    at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/ext/new_allocator.h:120
#8  std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator, (__gnu_cxx::_Lock_policy)2> > >::_S_construct<std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator, (__gnu_cxx::_Lock_policy)2>, std::allocator const, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (a=..., p=, args=, args=, args=,
    __args=<optimized out>)
    at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/alloc_traits.h:253
#9  std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator, (__gnu_cxx::_Lock_policy)2> > >::construct<std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator, (__gnu_cxx::_Lock_policy)2>, std::allocator const, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (a=..., p=, args=, args=, args=,
    __args=<optimized out>)
    at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/alloc_traits.h:399
#10 std::shared_count<(__gnu_cxx::_Lock_policy)2>::shared_count<umpire::resource::CudaConstantMemoryResource, std::allocator, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (
    this=0x3fffffffb6e8, __a=..., __args=..., __args=..., __args=...)
    at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/shared_ptr_base.h:619
#11 0x000000001560b690 in std::shared_ptr<umpire::resource::CudaConstantMemoryResource, (__gnu_cxx::_Lock_policy)2>::shared_ptr<std::allocator, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (a=..., args=..., args=..., args=..., this=, __tag=...)
    at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/shared_ptr_base.h:1089
#12 std::shared_ptr::shared_ptr<std::allocator, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (a=..., args=,
    __args=<optimized out>, this=<optimized out>, __tag=..., __args=<optimized out>)
    at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/shared_ptr.h:316
#13 std::allocate_shared<umpire::resource::CudaConstantMemoryResource, std::allocator, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (a=..., args=,
    __args=<optimized out>, __args=<optimized out>)
    at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/shared_ptr.h:587
#14 std::make_shared<umpire::resource::CudaConstantMemoryResource, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (args=, args=, __args=)
    at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/shared_ptr.h:603
#15 umpire::resource::CudaConstantMemoryResourceFactory::create (this=, id=4)
    at /g/g18/probinso/ale3d/bugfixday/imports/umpire/src/umpire/resource/CudaConstantMemoryResourceFactory.cpp:48
#16 0x000000001560acb0 in umpire::resource::MemoryResourceRegistry::makeMemoryResource (this=0x4a9f5da0, name=...,
    id=<optimized out>) at /g/g18/probinso/ale3d/bugfixday/imports/umpire/src/umpire/resource/MemoryResourceRegistry.cpp:50
#17 0x00000000155f44dc in umpire::ResourceManager::initialize (this=0x4a9e5ab0)
    at /g/g18/probinso/ale3d/bugfixday/imports/umpire/src/umpire/ResourceManager.cpp:122
#18 0x00000000155f2364 in umpire::ResourceManager::ResourceManager (this=0x4a9e5ab0)
    at /g/g18/probinso/ale3d/bugfixday/imports/umpire/src/umpire/ResourceManager.cpp:96
#19 0x00000000155f0fe4 in umpire::ResourceManager::getInstance ()
    at /g/g18/probinso/ale3d/bugfixday/imports/umpire/src/umpire/ResourceManager.cpp:50
#20 0x00000000155ebf10 in chai::ArrayManager::ArrayManager (this=0x4a9e59e0)
    at /g/g18/probinso/ale3d/bugfixday/imports/chai/src/chai/ArrayManager.cpp:66
#21 0x00000000155ebe0c in chai::ArrayManager::getInstance ()
    at /g/g18/probinso/ale3d/bugfixday/imports/chai/src/chai/ArrayManager.cpp:58
#22 0x0000000010787840 in chai::ManagedArray::ManagedArray (this=0x46fbff50 )
```

davidbeckingsale commented 5 years ago

This was due to running on a node without a GPU.

We will add a better error message to catch and prevent this in the future.
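For illustration only, here is a rough sketch of the kind of guard that could turn this crash into a readable error. It is not the change that went into Umpire: `require_gpu` and its parameters are hypothetical names, and the sketch assumes the caller has already queried the device count (in a CUDA build, typically via `cudaGetDeviceCount`).

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper, NOT part of Umpire's API.
// Validates the result of a device-count query before a GPU-backed
// resource such as DEVICE_CONST is constructed.
//
// In a CUDA build the two inputs would typically come from:
//   int n = 0;
//   bool ok = (cudaGetDeviceCount(&n) == cudaSuccess);
// On a node without a GPU that call reports an error instead of success.
inline void require_gpu(bool query_succeeded, int device_count,
                        const std::string& resource_name)
{
  if (!query_succeeded || device_count < 1) {
    // Fail early with a message that names the resource being created.
    throw std::runtime_error(
        "Cannot create memory resource '" + resource_name +
        "': no usable GPU found on this node");
  }
}
```

Calling something like `require_gpu(ok, n, "DEVICE_CONST")` before the resource is built would surface the missing GPU as an exception with a clear message rather than a segfault deep inside the logger.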

davidbeckingsale commented 5 years ago

Improved error message added in #44