Open gzagaris opened 4 days ago
One more thing to add to this. I did look into the implementation a bit and I did not see any calls to hipGetSymbolAddress()
in the HipConstantMemoryResource.cpp
, which I would have expected and it may be related to the issue that I am seeing.
Following a similar approach as in CudaConstantMemoryResource.cu
, I made the following changes to HipConstantMemoryResource.cpp
:
diff --git a/src/umpire/resource/HipConstantMemoryResource.cpp b/src/umpire/resource/HipConstantMemoryResource.cpp
index 65c5f72c..d00103a2 100644
--- a/src/umpire/resource/HipConstantMemoryResource.cpp
+++ b/src/umpire/resource/HipConstantMemoryResource.cpp
@@ -25,7 +25,8 @@ HipConstantMemoryResource::HipConstantMemoryResource(const std::string& name, in
m_highwatermark{0},
m_platform{Platform::hip},
m_offset{0},
- m_ptr{s_umpire_internal_device_constant_memory}
+ m_ptr{nullptr},
+ m_initialized{false}
{
}
@@ -33,6 +34,16 @@ void* HipConstantMemoryResource::allocate(std::size_t bytes)
{
std::lock_guard<std::mutex> lock{m_mutex};
+ if (!m_initialized) {
+ hipError_t error = ::hipGetSymbolAddress((void**)&m_ptr, s_umpire_internal_device_constant_memory);
+
+ if (error != hipSuccess) {
+ UMPIRE_ERROR(runtime_error, umpire::fmt::format("hipGetSymbolAddress failed with error: {}", ::hipGetErrorString(error)));
+ }
+
+ m_initialized = true;
+ }
+
char* ptr{static_cast<char*>(m_ptr) + m_offset};
m_offset += bytes;
diff --git a/src/umpire/resource/HipConstantMemoryResource.hpp b/src/umpire/resource/HipConstantMemoryResource.hpp
index d7afac23..5e32f45f 100644
--- a/src/umpire/resource/HipConstantMemoryResource.hpp
+++ b/src/umpire/resource/HipConstantMemoryResource.hpp
@@ -39,7 +39,8 @@ class HipConstantMemoryResource : public MemoryResource {
std::size_t m_offset;
void* m_ptr;
-
+ bool m_initialized;
+
std::mutex m_mutex;
};
I didn't have any luck with that though, hipGetSymbolAddress
, now throws the following runtime error:
terminate called after throwing an instance of 'umpire::runtime_error'
45: what(): ! Umpire runtime_error [.../Umpire/src/umpire/resource/HipConstantMemoryResource.cpp:41]: hipGetSymbolAddress failed with error: invalid device symbol
Describe the bug
Copying data from the host to constant device memory using HIP throws a runtime error. The code works fine with CUDA , but, Umpire throws the following runtime error when compiled with HIP:
To Reproduce
Here is a code snippet that reproduces this behavior:
I am compiling Umpire with
-DENABLE_HIP=On -DUMPIRE_ENABLE_DEVICE_CONST=On
.Am I missing anything?
Expected behavior
I would have expected this to work and not throw a runtime error.
Compilers & Libraries (please complete the following information):