Ipotrick / Daxa

Daxa is a convenient, simple and modern GPU abstraction built on Vulkan

Device lost error after increasing primitive count building acceleration structures #74

Closed Jaisiero closed 7 months ago

Jaisiero commented 9 months ago

So this is the code from .\Daxa\tests\2_daxa_api\10_raytracing. With triangle_count up to 14 it's fine, but when I reach 15 Vulkan explodes with a device lost error:

std::vector<daxa_f32vec3> vertices;
const int triangle_count = 15;
for(int i = 0; i < triangle_count*3; ++i) {
    vertices.push_back({random_float(-1.0f, 1.0f), random_float(-1.0f, 1.0f), random_float(-1.0f, 1.0f)});
    vertices.push_back({random_float(-1.0f, 1.0f), random_float(-1.0f, 1.0f), random_float(-1.0f, 1.0f)});
    vertices.push_back({random_float(-1.0f, 1.0f), random_float(-1.0f, 1.0f), random_float(-1.0f, 1.0f)});
}
auto vertex_buffer = device.create_buffer({
    .size = sizeof(daxa_f32vec3) * vertices.size(),
    .allocate_info = daxa::MemoryFlagBits::HOST_ACCESS_RANDOM,
    .name = "vertex buffer",
});
defer { device.destroy_buffer(vertex_buffer); };
// *device.get_host_address_as<decltype(vertices)>(vertex_buffer).value() = vertices;
std::memcpy(device.get_host_address_as<daxa_f32vec3>(vertex_buffer).value(), vertices.data(), sizeof(daxa_f32vec3) * vertices.size());

/// Indices:
std::vector<daxa_u32> indices;
for(daxa_u32 i = 0; i < triangle_count*3; ++i) {
    indices.push_back(i);
}
auto index_buffer = device.create_buffer({
    .size = sizeof(daxa_u32) * indices.size(),
    .allocate_info = daxa::MemoryFlagBits::HOST_ACCESS_RANDOM,
    .name = "index buffer",
});
defer { device.destroy_buffer(index_buffer); };
// *device.get_host_address_as<decltype(indices)>(index_buffer).value() = indices;
std::memcpy(device.get_host_address_as<daxa_u32>(index_buffer).value(), indices.data(), sizeof(daxa_u32) * indices.size());
/// Transforms:
auto transform_buffer = device.create_buffer({
    .size = sizeof(daxa_f32mat3x4),
    .allocate_info = daxa::MemoryFlagBits::HOST_ACCESS_RANDOM,
    .name = "transform buffer",
});
defer { device.destroy_buffer(transform_buffer); };
*device.get_host_address_as<daxa_f32mat3x4>(transform_buffer).value() = daxa_f32mat3x4{
    {1, 0, 0, 0},
    {0, 1, 0, 0},
    {0, 0, 1, 0},
};
/// Triangle Geometry Info:
auto geometries = std::array{
    daxa::BlasTriangleGeometryInfo{
        .vertex_format = daxa::Format::R32G32B32_SFLOAT, // Is also default
        .vertex_data = {},                               // Ignored in get_acceleration_structure_build_sizes.    // Is also default
        .vertex_stride = sizeof(daxa_f32vec3),           // Is also default
        .max_vertex = static_cast<u32>(vertices.size() - 1),
        .index_type = daxa::IndexType::uint32, // Is also default
        .index_data = {},                      // Ignored in get_acceleration_structure_build_sizes. // Is also default
        .transform_data = {},                  // Ignored in get_acceleration_structure_build_sizes. // Is also default
        .count = triangle_count,
        .flags = daxa::GeometryFlagBits::OPAQUE, // Is also default
    }};
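
For reference, the snippets below (and the scratch-buffer snippet further down) refer to a blas_build_info that is not shown in this issue. Roughly, it is set up like this before the build; this is a sketch from memory, and field and function names beyond the ones that appear elsewhere in this thread are assumptions that may differ slightly from the actual sample code:

// Sketch: query the required sizes, create the BLAS and scratch buffer with
// those sizes, then point the build info at them. Field names are assumed.
auto blas_build_info = daxa::BlasBuildInfo{
    .flags = daxa::AccelerationStructureBuildFlagBits::PREFER_FAST_TRACE,
    .dst_blas = {},        // filled in once the BLAS exists
    .geometries = geometries,
    .scratch_data = {},    // filled in once the scratch buffer exists
};
daxa::AccelerationStructureBuildSizesInfo build_size_info =
    device.get_blas_build_sizes(blas_build_info);
auto blas = device.create_blas({
    .size = build_size_info.acceleration_structure_size,
    .name = "test blas",
});
auto blas_scratch_buffer = device.create_buffer({
    .size = build_size_info.build_scratch_size,
    .name = "blas build scratch buffer",
});
blas_build_info.dst_blas = blas;
blas_build_info.scratch_data = device.get_device_address(blas_scratch_buffer).value();
// The geometry's vertex_data / index_data / transform_data (left empty above,
// since the size query ignores them) also have to be filled with the buffer
// device addresses before recording the build.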

In case it was some synchronization issue, I tried adding a sleep before the build command buffer:

using namespace std::chrono_literals;
std::this_thread::sleep_for(2000ms);

/// Record build commands:
auto exec_cmds = [&]()
{
    auto recorder = device.create_command_recorder({});
    recorder.build_acceleration_structures({
        .blas_build_infos = std::array{blas_build_info, proc_blas_build_info},
    });
    recorder.pipeline_barrier({
        .src_access = daxa::AccessConsts::ACCELERATION_STRUCTURE_BUILD_WRITE,
        .dst_access = daxa::AccessConsts::ACCELERATION_STRUCTURE_BUILD_READ_WRITE,
    });
    recorder.build_acceleration_structures({
        .tlas_build_infos = std::array{tlas_build_info},
    });
    recorder.pipeline_barrier({
        .src_access = daxa::AccessConsts::ACCELERATION_STRUCTURE_BUILD_WRITE,
        .dst_access = daxa::AccessConsts::READ_WRITE,
    });
    return recorder.complete_current_commands();
}();
device.submit_commands({.command_lists = std::array{exec_cmds}});

but it doesn't matter. What seems to matter is the number of triangles.

I am not sure if I am doing something wrong here or if there's a bug inside.

Ipotrick commented 9 months ago

It might be a BLAS size thing. I guess until 15 tris the size is the same, and past that we have some calculation error or fail to pass the tri count or something.
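
For reference, the size query Vulkan exposes underneath takes the per-geometry primitive counts explicitly, so if a wrapper drops or under-reports the triangle count there, the returned BLAS/scratch sizes stay too small and the later build writes out of bounds, which typically surfaces as a device-lost error. A minimal sketch of that raw call (the helper itself and its setup are hypothetical, and the KHR entry point has to be loaded, e.g. via volk):

#include <vulkan/vulkan.h>

// Hypothetical helper: ask the driver how big the BLAS and its scratch buffer
// must be for `triangle_count` triangles of the given geometry description.
VkAccelerationStructureBuildSizesInfoKHR query_blas_sizes(
    VkDevice device,
    VkAccelerationStructureBuildGeometryInfoKHR const & build_info,
    uint32_t triangle_count)
{
    // One entry per geometry in build_info. If the real triangle count is not
    // forwarded here, accelerationStructureSize and buildScratchSize come back
    // too small and the later vkCmdBuildAccelerationStructuresKHR writes past
    // the buffers created from them.
    uint32_t max_primitive_counts[] = {triangle_count};
    VkAccelerationStructureBuildSizesInfoKHR size_info{
        .sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_BUILD_SIZES_INFO_KHR};
    vkGetAccelerationStructureBuildSizesKHR(
        device,
        VK_BUILD_ACCELERATION_STRUCTURE_BUILD_TYPE_DEVICE_KHR,
        &build_info,
        max_primitive_counts,
        &size_info);
    return size_info;
}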

Jaisiero commented 9 months ago

> It might be a BLAS size thing. I guess until 15 tris the size is the same, and past that we have some calculation error or fail to pass the tri count or something.

Yep. Just by tricking it like this:

daxa::AccelerationStructureBuildSizesInfo build_size_info = device.get_blas_build_sizes(blas_build_info);
auto blas_scratch_buffer = device.create_buffer({
    .size = build_size_info.build_scratch_size * 2,
    .name = "blas build scratch buffer",
});

It builds the acceleration structure without crashing, so I am going to investigate further.

Jaisiero commented 9 months ago

I found it, and I found another issue too. I think the whole acceleration structure (AS) building process is very error prone, because you first ask how much room the build needs, then you create those AS, and only later do you hand the info to a command recorder to actually build them. It would be nice if the whole process were driven by the API, or at least if some extra checks were made; to be honest, I don't know how to approach it without too many headaches. I'll do a PR and you guys can check what's going on. Thanks!
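
To make the "extra checks" idea concrete, here is a minimal sketch of what such a check could look like: re-query the sizes for the exact build info that is about to be recorded and assert that the buffers created earlier are still big enough. The helper itself is hypothetical and not existing Daxa API; only get_blas_build_sizes and the size fields mirror calls shown earlier in this thread.

#include <daxa/daxa.hpp>
#include <cassert>
#include <cstdint>

// Hypothetical sanity check, run right before recording the build.
void check_blas_build_sizes(daxa::Device & device,
                            daxa::BlasBuildInfo const & blas_build_info,
                            std::uint64_t created_blas_size,
                            std::uint64_t created_scratch_size)
{
    auto sizes = device.get_blas_build_sizes(blas_build_info);
    // If either of these fires, the sizes queried at creation time and the
    // geometry handed to the build no longer agree (e.g. the primitive count
    // changed in between), which is exactly the undersized build that ends in
    // a device-lost error.
    assert(created_blas_size >= sizes.acceleration_structure_size);
    assert(created_scratch_size >= sizes.build_scratch_size);
}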