maksym-pasichnyk opened 1 year ago
Are you sure the device memory is host coherent? It seems to be behaving as if the device memory isn't host coherent.
Is this working on other Vulkan platforms?
@billhollings Yes, the memory is host coherent. I haven't tested on other Vulkan platforms; I'll check and reply later.
```cpp
auto gpu_find_memory_type_index(GpuContext* context, u32 memory_type_bits, vk::MemoryPropertyFlags memory_property_flags) -> u32 {
    auto memory_properties = context->physical_device.getMemoryProperties();
    for (u32 i = 0; i < memory_properties.memoryTypeCount; ++i) {
        if ((memory_type_bits & (1 << i)) && (memory_properties.memoryTypes[i].propertyFlags & memory_property_flags) == memory_property_flags) {
            return i;
        }
    }
    return std::numeric_limits<u32>::max();
}

void gpu_allocate_memory(GpuContext* context, GpuAllocation* allocation, vk::MemoryRequirements memory_requirements, vk::MemoryPropertyFlags memory_property_flags, vk::MemoryAllocateFlags memory_allocate_flags) {
    auto memory_type_index = gpu_find_memory_type_index(context, memory_requirements.memoryTypeBits, memory_property_flags);

    auto memory_allocate_flags_info = vk::MemoryAllocateFlagsInfo()
        .setFlags(memory_allocate_flags);

    auto memory_allocate_info = vk::MemoryAllocateInfo()
        .setPNext(&memory_allocate_flags_info)
        .setAllocationSize(memory_requirements.size)
        .setMemoryTypeIndex(memory_type_index);

    allocation->device_memory = context->logical_device.allocateMemory(memory_allocate_info);
    allocation->memory_requirements = memory_requirements;
    allocation->memory_property_flags = memory_property_flags;
}

void gpu_create_allocator(GpuContext* context, GpuLinearAllocator* allocator, vk::DeviceSize capacity) {
    vk::BufferUsageFlags buffer_usage_flags = {};
    buffer_usage_flags |= vk::BufferUsageFlagBits::eTransferSrc;
    buffer_usage_flags |= vk::BufferUsageFlagBits::eTransferDst;
    buffer_usage_flags |= vk::BufferUsageFlagBits::eUniformTexelBuffer;
    buffer_usage_flags |= vk::BufferUsageFlagBits::eStorageTexelBuffer;
    buffer_usage_flags |= vk::BufferUsageFlagBits::eUniformBuffer;
    buffer_usage_flags |= vk::BufferUsageFlagBits::eStorageBuffer;
    buffer_usage_flags |= vk::BufferUsageFlagBits::eIndexBuffer;
    buffer_usage_flags |= vk::BufferUsageFlagBits::eVertexBuffer;
    buffer_usage_flags |= vk::BufferUsageFlagBits::eIndirectBuffer;
    buffer_usage_flags |= vk::BufferUsageFlagBits::eShaderDeviceAddress;

    auto buffer_create_info = vk::BufferCreateInfo()
        .setSize(capacity)
        .setUsage(buffer_usage_flags)
        .setSharingMode(vk::SharingMode::eExclusive);

    allocator->buffer = context->logical_device.createBuffer(buffer_create_info);
    allocator->capacity = capacity;
    allocator->offset = 0u;

    vk::MemoryAllocateFlags memory_allocate_flags = {};
    memory_allocate_flags |= vk::MemoryAllocateFlagBits::eDeviceAddress;

    auto memory_requirements = context->logical_device.getBufferMemoryRequirements(allocator->buffer);

    vk::MemoryPropertyFlags memory_property_flags = {};
    memory_property_flags |= vk::MemoryPropertyFlagBits::eHostVisible;
    memory_property_flags |= vk::MemoryPropertyFlagBits::eHostCoherent;

    gpu_allocate_memory(context, &allocator->allocation, memory_requirements, memory_property_flags, memory_allocate_flags);
    context->logical_device.bindBufferMemory(allocator->buffer, allocator->allocation.device_memory, 0);

    auto buffer_device_address_info = vk::BufferDeviceAddressInfo()
        .setBuffer(allocator->buffer);

    allocator->device_address = context->logical_device.getBufferAddress(buffer_device_address_info);
}
```
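As an aside, the selection loop in `gpu_find_memory_type_index` above can be exercised without a device by mocking the memory-type table. Everything below (the `MockMemoryType` struct, the flag values) is hypothetical scaffolding, not real Vulkan API; note also that the `std::numeric_limits<u32>::max()` sentinel must be checked by the caller, which `gpu_allocate_memory` above does not appear to do:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

using u32 = std::uint32_t;

// Hypothetical stand-in for VkMemoryType: just a property-flag bitmask.
struct MockMemoryType { u32 property_flags; };

// Same selection logic as gpu_find_memory_type_index: a type qualifies only
// if its bit is set in memory_type_bits AND it has at least the wanted flags.
u32 find_memory_type_index(const MockMemoryType* types, u32 type_count,
                           u32 memory_type_bits, u32 wanted_flags) {
    for (u32 i = 0; i < type_count; ++i) {
        bool allowed_by_resource = (memory_type_bits & (1u << i)) != 0;
        bool has_all_flags = (types[i].property_flags & wanted_flags) == wanted_flags;
        if (allowed_by_resource && has_all_flags) {
            return i;
        }
    }
    return std::numeric_limits<u32>::max(); // sentinel: caller must check this
}
```

With illustrative flag values (0x1 = device-local, 0x2 = host-visible, 0x4 = host-coherent), a table `{0x1, 0x6, 0x7}` yields index 1 for a host-visible + host-coherent request when all type bits are allowed.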
@billhollings This code is working on Windows:
```cpp
auto memory = vulkan->context.frame_allocators[vulkan->frame_index].allocation.device_memory;
void* mapped = vulkan->context.logical_device.mapMemory(memory, 0, VK_WHOLE_SIZE, vk::MemoryMapFlags());
std::memcpy(reinterpret_cast<u8*>(mapped) + index_buffer_info.offset, indices.data(), sizeof(indices));
std::memcpy(reinterpret_cast<u8*>(mapped) + vertex_buffer_info.offset, vertices.data(), sizeof(vertices));
vulkan->context.logical_device.unmapMemory(memory);
```
Odd.
Do you have a small demo app you can post or link to so we can replicate the issue here?
@billhollings https://github.com/maksym-pasichnyk/vulkan-gpu
I downloaded your app and can open it. During opening, there are 3 calls to `vkMapMemory()` and one call to `vkUnmapMemory()`, none of which seem to be originating in the code you have above.

The graphical menu system that appears in this app is quite large. Can you provide us with instructions about how to replicate the problem you are having above?
This program is a compute-shader rasterizer. If you enable the "Use memcpy" toggle, `memcpy` will be used instead of `cmd.updateBuffer`. The UI will disappear after a few frames on macOS, but will work on Windows.
Sorry for the delay. We've been busy getting a release ready for the upcoming SDK.
The problem was repeatable in the project you posted above, but the code in `main.cpp` in that project is quite different from the code you posted previously. I'll comment on the code in the project, which includes:
```cpp
for (auto cmd_list : std::span(draw_data->CmdLists, draw_data->CmdListsCount)) {
    auto vtx_buffer_size = cmd_list->VtxBuffer.Size * sizeof(ImDrawVert);
    auto idx_buffer_size = cmd_list->IdxBuffer.Size * sizeof(ImDrawIdx);

    GpuBufferInfo vtx_buffer_info;
    GpuBufferInfo idx_buffer_info;
    if (!vulkan->AllocateTemporary(&vtx_buffer_info, vtx_buffer_size, alignof(ImDrawVert))) {
        fprintf(stderr, "Failed to allocate vertex buffer for ImGui\n");
        continue;
    }
    if (!vulkan->AllocateTemporary(&idx_buffer_info, idx_buffer_size, alignof(ImDrawIdx))) {
        fprintf(stderr, "Failed to allocate index buffer for ImGui\n");
        continue;
    }

    if (use_memcpy) {
        std::memcpy(vtx_buffer_info.mapped, cmd_list->VtxBuffer.Data, vtx_buffer_size);
        std::memcpy(idx_buffer_info.mapped, cmd_list->IdxBuffer.Data, idx_buffer_size);
    } else {
        UpdateBuffer(cmd, &vtx_buffer_info, cmd_list->VtxBuffer.Data, vtx_buffer_size);
        UpdateBuffer(cmd, &idx_buffer_info, cmd_list->IdxBuffer.Data, idx_buffer_size);
    }
```
In this code when we ran it, `draw_data->CmdListsCount` is 2, so you draw two separate `cmd_lists` in each frame, but `vtx_buffer_info.mapped` and `idx_buffer_info.mapped` are dependent only on the `frame_index`, which means you use `memcpy` to overwrite the same device memory range with data for both `cmd_lists` before using it for either `cmd_list`. Because the GPU doesn't see the device memory contents until the command buffer is executed, it uses the data for the second `cmd_list` for both.
The `UpdateBuffer` path works, because you submit a separate `UpdateBuffer` command to the command buffer for each `cmd_list`.
If I add to this code an index that tracks which command list is being submitted, only allow the `memcpy` path on the first `cmd_list` in the frame, and let the second follow the `UpdateBuffer` path, everything works as expected.
[Edit:] I've just realized that `vtx_buffer_info.mapped` and `idx_buffer_info.mapped` are moved forward on each `cmd_list` iteration within a frame, so the problem is not as simple as I originally described here. But it's interesting that limiting the `memcpy` to only one `cmd_list` per frame fixes this.
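For reference, the "moved forward on each iteration" behavior is what a per-frame linear (bump) allocator gives you. A minimal sketch, assuming that is roughly what `AllocateTemporary` does (the names and layout here are hypothetical, not the project's actual code):

```cpp
#include <cassert>
#include <cstddef>

// Toy per-frame linear ("bump") allocator: every allocation advances the
// offset, so two allocations in the same frame never alias.
struct BumpAllocator {
    std::size_t capacity = 0;
    std::size_t offset = 0;

    // Returns the start offset of the allocation, or a size_t-max sentinel
    // when the frame's budget is exhausted.
    std::size_t allocate(std::size_t size, std::size_t alignment) {
        std::size_t aligned = (offset + alignment - 1) & ~(alignment - 1);
        if (aligned + size > capacity) {
            return static_cast<std::size_t>(-1);
        }
        offset = aligned + size;
        return aligned;
    }

    void reset() { offset = 0; } // once per frame
};
```

Two 100-byte allocations with 16-byte alignment land at offsets 0 and 112, which is why the two `cmd_lists` in a frame do get distinct mapped ranges.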
Okay. I've figured out the actual problem here now.
The problem is that despite the memory being "shared" between the CPU and GPU, by using `memcpy()` to update the contents of a `MTLBuffer`, and then using only the `gpuAddress` of the `MTLBuffer` in a shader, the GPU actually has no knowledge that the `MTLBuffer` is being used after being updated, and it doesn't update whatever GPU-side caches need to be updated.

This is a similar situation to using Metal argument buffers, and the solution is the same: calling Metal's `useResource:` for any `MTLBuffer` we "think" might be accessed by a shader without the buffer actually being declared to the shader via `setBuffer:offset:atIndex:`.
We'll have to track any `VkDeviceMemory` objects that are:

- allocated with `VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT`
- mapped via `vkMapMemory()`

and then invoke `useResource:` with the underlying `MTLBuffer` whenever a pipeline that includes a shader that uses `PhysicalStorageBufferAddresses` is bound.
The reason that `vkCmdUpdateBuffer()` avoids this is that it invokes a buffer BLIT operation on the GPU, thereby identifying to the GPU that its caches for that `MTLBuffer` should be updated.
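A second property helps explain why the `UpdateBuffer` path is immune to host-side timing: `vkCmdUpdateBuffer()` copies its source data into the command buffer at record time. A toy model of that record-time capture (plain C++, no Vulkan; everything here is illustrative):

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy command buffer modeling one real property of vkCmdUpdateBuffer: the
// source data is copied into the command stream at record time, so later
// host-side changes to the source cannot affect what executes.
struct ToyCommandBuffer {
    std::vector<std::function<void()>> commands;

    void record_update(int* dst, int value) { // 'value' is captured by copy
        commands.push_back([dst, value] { *dst = value; });
    }

    void submit() {
        for (auto& cmd : commands) cmd();
        commands.clear();
    }
};
```

Mutating the source between recording and submission has no effect, because the recorded command carries its own snapshot of the data.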
Glad to hear it's not a problem in my code, but sad that it's actually a bug. I'm wondering why this issue exists; I thought host-coherent memory was actually `MTLStorageModeShared`, with no need for additional GPU notification.
> I thought the host coherent memory was actually a `MTLStorageModeShared` and there no need for additional GPU notification
It is indeed using `MTLStorageModeShared`.

But I believe this issue is about letting the GPU know that the resource is being used by a render or compute pass. In that sense, it is similar to using the resources in argument buffers. The encoder `useResource:` family of methods is used to tell the GPU to make a resource "resident" during a render or compute pass. I interpret "resident" to mean loaded into, or referenceable from, on-chip caches.
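That interpretation can be caricatured in a few lines of C++. This is purely a mental model of the observed symptom (reads evaluating to 0 until the buffer has been made resident), not how Metal is actually implemented:

```cpp
#include <cassert>
#include <set>

// Caricature of MTLStorageModeShared memory plus GPU-side residency. This
// only models the symptom described in this issue; it is NOT Metal internals.
struct ToyGpu {
    std::set<int> resident; // buffers made resident via use_resource()

    void use_resource(int buffer_id) { resident.insert(buffer_id); }

    // A shader read observes the shared memory only for resident buffers;
    // a non-resident buffer reads as stale data (modeled here as 0).
    int shader_read(int buffer_id, int shared_memory_value) const {
        return resident.count(buffer_id) ? shared_memory_value : 0;
    }
};
```

In this model, a `memcpy` into shared memory is invisible to the shader until `use_resource()` has been called for that buffer, mirroring the `useResource:` fix described above.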
PR #1868 fixes this. Please retest with latest MoltenVK and close this issue if it is fixed.
Sorry for the late response. This code now works with the latest MoltenVK, but I found a new case where this problem still exists; I'll create a small demo app for reproducing it.
Hello, I'm experiencing an issue with MoltenVK that I believe is related to this bug. I am trying to render a point cloud using buffer references with this vertex shader:
```glsl
#version 460
#extension GL_EXT_buffer_reference : require

layout (location = 0) out vec4 vertex_color;

layout (buffer_reference) readonly buffer vertex_buffer_t {
    float data[];
};

layout (push_constant) uniform push_constant_t {
    mat4 transform;
    vertex_buffer_t vertex_buffer;
} constants;

vec3 get_vertex_position(uint i)
{
    return vec3(constants.vertex_buffer.data[3 * i + 0],
                constants.vertex_buffer.data[3 * i + 1],
                constants.vertex_buffer.data[3 * i + 2]);
}

void main()
{
    vec3 position = get_vertex_position(gl_VertexIndex);
    gl_Position = constants.transform * vec4(position, 1.0);
    gl_PointSize = 10.0;
    vertex_color = vec4(1.0);
}
```
but it only works for a few frames before `constants.vertex_buffer.data` starts evaluating to 0, similar to the clip that @maksym-pasichnyk posted. The point cloud data is uploaded by memcpy'ing it into a mapped staging buffer and then calling `vkCmdCopyBuffer` to copy it to the GPU.
I am using Vulkan SDK v1.3.168.1, and this code works with no issues on Windows/NVIDIA drivers. I am also using VK_KHR_dynamic_rendering, if that influences anything.
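For contrast with the record-time capture of `vkCmdUpdateBuffer()`, the staged upload path described here is deferred: the recorded copy reads the staging memory only when the command buffer executes, so overwriting the staging range before submission changes what lands in the device buffer. A toy model (hypothetical names, no real Vulkan calls):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Toy model of a staged upload: the recorded copy dereferences the staging
// pointer at execution time, not at record time.
struct ToyUploader {
    std::vector<std::function<void()>> commands;

    void record_copy(const int* staging, int* device, std::size_t count) {
        commands.push_back([staging, device, count] {
            std::memcpy(device, staging, count * sizeof(int));
        });
    }

    void submit() {
        for (auto& cmd : commands) cmd();
        commands.clear();
    }
};
```

This is why a staging range must stay untouched between recording the copy and submitting the command buffer.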
@billhollings is this issue now solved by https://github.com/KhronosGroup/MoltenVK/pull/2128 ?
Unlikely. PR #2128 fixed an issue connected to push constants. This issue was triggered by an open `vkMapMemory()`. It was mostly solved with PR #1868, and marked so above, but a couple of additional concerns have been raised since, so it remains partially open for further evaluation.
I'm trying to update a buffer by mapping HostVisible + HostCoherent memory and updating it via `memcpy`, but the data inside the shader is wrong.

The code below is not working.

If I add the line below before `mapMemory` or after `unmapMemory`, everything works fine.

Also, when I update buffers with `cmd.updateBuffer`, everything works fine.