LWJGL / lwjgl3

LWJGL is a Java library that enables cross-platform access to popular native APIs useful in the development of graphics (OpenGL, Vulkan, bgfx), audio (OpenAL, Opus), parallel computing (OpenCL, CUDA) and XR (OpenVR, LibOVR, OpenXR) applications.
https://www.lwjgl.org
BSD 3-Clause "New" or "Revised" License

Java port of vulkan-tutorial.com shows a black screen on specifically a AMD RX 6600 #835

Open Trunkvv opened 1 year ago

Trunkvv commented 1 year ago

Version

3.3.1

Platform

Windows x64

JDK

jdk-17.0.5.8-hotspot

Module

Vulkan

Bug description

Hi,

I'm working on an extended version of the 'official' Java port of vulkan-tutorial.com. After swapping my AMD RX 570 for an AMD RX 6600 and running the code, the (GLFW) window remains black.

I've checked a lot of things. Can anybody help me figure out what to look for, or reproduce and confirm the same issue?

When looking for or ruling out a possible cause, the following have been checked:

My best guess is that it's a specific bug caused by a mixture of factors:

Reproducible with the code on the url below (chapter Ch15HelloTriangle.java for instance) https://github.com/Naitsirc98/Vulkan-Tutorial-Java/tree/master/src/main/java/javavulkantutorial

With the latest releases: Vulkan SDK 1.3.231.1, AMD drivers 22.11.2, jdk-17.0.5.8-hotspot, LWJGL 3.3.1, VSCode 1.74.1

Stacktrace or crash log output

No response

Trunkvv commented 1 year ago

And using LWJGL nightly of today didn't fix it either.

ws909 commented 1 year ago

plugged the RX 570 in an not reinstalling any driver, the issue is gone!!!

If I understand you correctly, this temporarily fixed the problem? So the issue comes and goes?

ws909 commented 1 year ago

Looking through your code, I found some memory leaks (in your code, not LWJGL) (false positive), which eventually led me here:

Line 801 in org.lwjgl.system.MemoryStack:

public ByteBuffer UTF8(CharSequence text, boolean nullTerminated) {
    int  length = memLengthUTF8(text, nullTerminated);
    long target = nmalloc(POINTER_SIZE, length);
    encodeUTF8Unsafe(text, nullTerminated, target);
    return MemoryUtil.wrap(BUFFER_BYTE, target, length).order(NATIVE_ORDER);
}

POINTER_SIZE is supposed to be either 4 or 8, however, when I cloned your project, and IntelliJ IDEA decompiled the LWJGL binaries from Maven, POINTER_SIZE is 1.

Trunkvv commented 1 year ago

plugged the RX 570 in an not reinstalling any driver, the issue is gone!!!

If I understand you correctly, this temporarily fixed the problem? So the issue comes and goes?

yes indeed, the issue comes on the RX 6600 and goes with the RX 570

Spasi commented 1 year ago

Hey @Trunkvv,

I just tried it on Windows, 3070 Ti with latest drivers. No issues, no warnings, everything renders correctly (at least Ch15 and Ch29).

Try reading back the framebuffer and writing it to an image on disk. Will tell you if it's an issue with the rendering or the presentation.

Trunkvv commented 1 year ago

Hi @Spasi,

Thank you for trying on the 3070.

Good suggestion on reading back the framebuffer. I'm still on a beginner's learning curve and spent a couple of hours trying to get it to work. For writing to disk, I've found org.lwjgl.stb.STBImageWrite.stbi_write_png, but it requires the data in a ByteBuffer. I'm very unsure how (just after vkQueuePresentKHR) to get an image (I've got the long pointer) into a ByteBuffer. Still investigating it, but maybe you have a hint?

ws909 commented 1 year ago

@Trunkvv vkCmdCopyImageToBuffer

Copy the frame buffer image into a Vulkan buffer using CPU-accessible memory, then pass (copy) the buffer contents to STB.

There are 2 kinds of frame buffer images you can use for testing. You can render to the swap chain images (minimum 1), then copy these images to the buffer. If there are issues with this, try creating your own images and setting up a frame buffer with these, thereby not using the swap chain at all. That'll tell you if there are issues with the swap chain extension.

Careful with these «long» pointers. Java is supposed to be so safe and easy, but working with Vulkan can be a horrible experience, sometimes. I’ve had bugs and misunderstandings several times when working with buffers, memory, memory accessors (Buffer/MemorySegment), etc. It can be quite horrible. I wish Java had typedefs. So Java removed typedefs, disallowed pointers, and languages such as Swift have stuff like «UnsafeMutableRawPointer». Java is getting better, but after having tried using the new FFI APIs, well, it’s too verbose and cumbersome. (Not to mention that downcalls don’t work on AArch64 yet, and libraries that exist aren’t found…). It’s so hard to get an overview, see the types in use, and such. Kinda miss plain old C pointers. At least they’re straightforward! So watch out for bugs caused by Java’s not very kind approach to these things!

Trunkvv commented 1 year ago

I'm learning here. The buffer has been taken care of.

But in the swapchain the present image has layout VK_IMAGE_LAYOUT_PRESENT_SRC_KHR, so that's a no go to put in a buffer, as it requires VK_IMAGE_LAYOUT_GENERAL or VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL. Makes sense not to work with presentable layout.

So as you suggested, my next challenge would be to create my own image (+view) and get it rendered to there.

To be continued.

Trunkvv commented 1 year ago

As for working with pointers, I agree; I've already had some headaches working with them. We will see whether I'll manage to stay motivated on Vulkan with Java.

ws909 commented 1 year ago

Can you not do a layout transition on the swap chain image? One before copying, then another after copying? Also, as far as I know, Nvidia doesn't care about the layout you specify, though AMD does. I watched a technical presentation from AMD some years ago which claimed so back then. That may have changed now.

ws909 commented 1 year ago

I assume the first primitive and value object previews will be available in Java 21, so development on LWJGL 4 may start then, which will likely use primitive types in place of typedefs and pointers. In the first versions of the primary Valhalla features, in C1, primitive types will be treated as value objects, which will be normal reference types.

Trunkvv commented 1 year ago

Layout transfer? Good idea. I've transitioned the layout with a pipeline barrier and copied it to a buffer. No validation errors anymore :-) I'm still in the dark/horror about passing data from the buffer to STB, as the native stbi_write_png call crashes the JVM. That will be the next challenge. Thanks for the support so far.

Trunkvv commented 1 year ago

okay, got back on this one. So I'm trying to read back the framebuffer and write it to an image on disk. Hopefully it will tell me if it's an issue with the rendering or the presentation.

I've copied the image to a buffer with vkCmdCopyImageToBuffer. No errors so far. But now I've come to a halt where I can't access a single byte of data in that buffer. That could make sense. So for my learning curve: does this make sense, and how do I overcome it?

ByteBuffer dataAsByteBufferVulkan = pBufferAsPointerBuffer.getByteBuffer(0, 1);
// JVM crashes with EXCEPTION_ACCESS_VIOLATION:
dataAsByteBufferVulkan.get();

ws909 commented 1 year ago

@Trunkvv

Use this as a reference:

VkDevice device = ...;
VkBufferCreateInfo creationInfo = ...;
long bufferSize = creationInfo.size();

final var contents = new float[] { ... };

try (final var stack = MemoryStack.stackPush()) {
    final var longPointer = stack.mallocLong(1);
    final var pointer = stack.mallocPointer(1);

    var resultCode = vkCreateBuffer(device, creationInfo, null, longPointer);
    if (resultCode != VK_SUCCESS) {
        throw new AssertionError(resultCode);
    }

    final var buffer = longPointer.get(0);

    final int requiredMemoryTypes = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT;

    final var memoryRequirements = VkMemoryRequirements.malloc(stack);
    vkGetBufferMemoryRequirements(device, buffer, memoryRequirements);

    final var memoryProperties = VkPhysicalDeviceMemoryProperties.malloc(stack);
    vkGetPhysicalDeviceMemoryProperties(device.getPhysicalDevice(), memoryProperties);

    var memoryTypeIndex = -1;
    for (var i = 0; i < memoryProperties.memoryTypeCount(); ++i) {
        if (
                (memoryRequirements.memoryTypeBits() & (1 << i)) != 0 &&
                (memoryProperties.memoryTypes(i).propertyFlags() & requiredMemoryTypes) == requiredMemoryTypes
        ) {
            memoryTypeIndex = i;
            break;
        }
    }

    if (memoryTypeIndex == -1) {
        throw new RuntimeException();
    }

    final var allocationInfo = VkMemoryAllocateInfo.malloc(stack)
            .sType(VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO)
            .pNext(VK_NULL_HANDLE)
            .allocationSize(memoryRequirements.size())
            .memoryTypeIndex(memoryTypeIndex);

    resultCode = vkAllocateMemory(device, allocationInfo, null, longPointer);
    if (resultCode != VK_SUCCESS) {
        throw new AssertionError(resultCode);
    }

    final long memory = longPointer.get(0);

    vkBindBufferMemory(device, buffer, memory, 0);

    resultCode = vkMapMemory(device, memory, 0, bufferSize, 0, pointer);
    if (resultCode != VK_SUCCESS) {
        throw new AssertionError(resultCode);
    }

    final long data = pointer.get(0);

    final var dataLink = MemoryUtil.memFloatBuffer(data, contents.length);
    dataLink.put(0, contents, 0, contents.length);

    vkUnmapMemory(device, memory);

    // ...

    vkFreeMemory(device, memory, null);
    vkDestroyBuffer(device, buffer, null);
}
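
The memory type search in the loop above can be pulled out into a small pure function, which makes the bitmask logic easier to check without a Vulkan device. This is only a sketch: the class name, method name, and the sample flag values in `main` are made up for illustration. The constants mirror the values of the corresponding Vulkan enums.

```java
/** Hypothetical helper that isolates the memory type search from the reference code. */
final class MemoryTypeFinder {
    // Values of the corresponding VkMemoryPropertyFlagBits enums.
    static final int VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT = 0x1;
    static final int VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT = 0x2;
    static final int VK_MEMORY_PROPERTY_HOST_COHERENT_BIT = 0x4;

    /**
     * Returns the first memory type index that is allowed by memoryTypeBits
     * and has every flag in required set, or -1 if there is none.
     *
     * @param memoryTypeBits bitmask from VkMemoryRequirements.memoryTypeBits()
     * @param propertyFlags  propertyFlags of each memory type, by index
     * @param required       flags that must all be present
     */
    static int find(int memoryTypeBits, int[] propertyFlags, int required) {
        for (int i = 0; i < propertyFlags.length; i++) {
            boolean allowed = (memoryTypeBits & (1 << i)) != 0;
            boolean hasAll = (propertyFlags[i] & required) == required;
            if (allowed && hasAll) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up device: type 0 is device-local only, type 1 is host-visible and coherent.
        int[] flags = {
            VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT,
            VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
        };
        int required = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT;
        System.out.println(find(0b11, flags, required));
    }
}
```

Host-visible memory is what makes vkMapMemory possible here; a buffer bound to device-local-only memory cannot be mapped and read from the CPU.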

Trunkvv commented 1 year ago

@ws909 thx! Adding a mapping to the memory (vkMapMemory) helped out a lot. Finally I'm able to write it to a file now! (Although the RGB channels of the transitioned frame buffer in VK_FORMAT_R8G8B8_SINT are not aligned correctly with what stbi_write_png is expecting.) But hey, I can recognize the image in the window, so now I can test it on the RX6600.

We will see tomorrow whether the issue lies in presentation or before.
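
A channel mismatch like the one mentioned above often comes down to byte order. The sketch below assumes the copied pixels are 4 bytes each in B, G, R, A order while the PNG writer expects R, G, B, A. That layout is an assumption, since the thread does not confirm the actual format of the copied image, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper for fixing channel order before writing a PNG. */
final class ChannelSwizzle {
    /**
     * Swaps the first and third byte of every 4-byte pixel in place,
     * which converts BGRA to RGBA (or back). Uses absolute get/put,
     * so the buffer's position and limit are left unchanged.
     */
    static void swapRedBlue(ByteBuffer pixels) {
        for (int i = pixels.position(); i + 3 < pixels.limit(); i += 4) {
            byte first = pixels.get(i);
            pixels.put(i, pixels.get(i + 2));
            pixels.put(i + 2, first);
        }
    }

    public static void main(String[] args) {
        // Two made-up pixels in B, G, R, A order.
        ByteBuffer pixels = ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
        swapRedBlue(pixels);
        System.out.println(java.util.Arrays.toString(pixels.array()));
    }
}
```

The same loop works on a buffer that wraps mapped Vulkan memory, as long as it runs while the memory is still mapped.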

Trunkvv commented 1 year ago

Okay, just tested on the RX6600. Just as on the RX570, I can still recognize the rendered image on disk. But the presented window is all black!

The presentation itself is a black box for me. What should I look for? The chosen swap surface format looks like one of the formats the device reports as available....

Trunkvv commented 1 year ago

Well, found a workaround! I was looking at the difference of the working code: https://github.com/LWJGL/lwjgl3-demos/blob/main/src/org/lwjgl/demo/vulkan/ColoredTriangleDemo.java and not working on specifically my RX6600: https://github.com/Naitsirc98/Vulkan-Tutorial-Java/blob/master/src/main/java/javavulkantutorial/Ch15HelloTriangle.java

ColoredTriangleDemo (lwjgl3-demos) uses the same queue family for graphics and presenting. Ch15HelloTriangle (Vulkan-Tutorial-Java) allows different families, and ended up selecting a different one for presenting on my RX6600. When I forced it to select the same family as the graphics queue (the same as in the working demo), the issue with the black screen vanished!

So I don't know why it's presenting a black screen, but I made it disappear....
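
The workaround amounts to preferring a single queue family that supports both graphics and presentation. Here is a minimal sketch of that selection in plain Java, with no Vulkan calls. The `queueFlags` and `presentSupport` arrays are hypothetical stand-ins for the values queried with vkGetPhysicalDeviceQueueFamilyProperties and vkGetPhysicalDeviceSurfaceSupportKHR, and the class and method names are made up. This is not the code from either repository.

```java
/** Hypothetical helper: picks graphics and present queue family indices. */
final class QueueFamilyPicker {
    // Value of VK_QUEUE_GRAPHICS_BIT in the Vulkan headers.
    static final int VK_QUEUE_GRAPHICS_BIT = 0x1;

    /**
     * Returns {graphicsFamily, presentFamily}. A family that supports both
     * is preferred; otherwise the first match for each role is used.
     * An entry is -1 if no suitable family exists.
     */
    static int[] pick(int[] queueFlags, boolean[] presentSupport) {
        int graphics = -1;
        int present = -1;
        for (int i = 0; i < queueFlags.length; i++) {
            boolean hasGraphics = (queueFlags[i] & VK_QUEUE_GRAPHICS_BIT) != 0;
            if (hasGraphics && presentSupport[i]) {
                return new int[] {i, i}; // one family covers both roles
            }
            if (hasGraphics && graphics == -1) {
                graphics = i;
            }
            if (presentSupport[i] && present == -1) {
                present = i;
            }
        }
        return new int[] {graphics, present};
    }

    public static void main(String[] args) {
        // Made-up device: family 0 presents only, family 1 does graphics only,
        // family 2 does both (0x2 is VK_QUEUE_COMPUTE_BIT).
        int[] flags = {0x2, 0x1, 0x1};
        boolean[] present = {true, false, true};
        System.out.println(java.util.Arrays.toString(pick(flags, present)));
    }
}
```

Falling back to two separate families keeps the code valid on hardware where no single family does both, but then the swapchain images have to be shared or transferred between the two queues correctly.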

Trunkvv commented 1 year ago

Anyone thinks it needs extra investigation?

ws909 commented 1 year ago

@Trunkvv The Vulkan specification allows different queue families for presentation and graphics, so if the example code fails in any way when these are not the same, you have either discovered a bug in the example code itself, or in the driver, or in the hardware.

If the queue families are not the same, the program must synchronize between them. If the example code does not do that, that's likely what's causing the black screen. For many years it seems to have been a common belief that hardware with different presentation and graphics queue families doesn't exist, so you could look more into this as well.
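
One common way to handle distinct families, and the one the original vulkan-tutorial.com text uses, is to create the swapchain images with concurrent sharing across both families, so that no explicit queue family ownership transfer is needed. The sketch below only shows that decision. The class and record names are hypothetical, and whether the Java port gets this part right was not established in this thread.

```java
/** Hypothetical helper: chooses the swapchain image sharing setup. */
final class SwapchainSharing {
    // Values of the VkSharingMode enum.
    static final int VK_SHARING_MODE_EXCLUSIVE = 0;
    static final int VK_SHARING_MODE_CONCURRENT = 1;

    /** Values that would go into VkSwapchainCreateInfoKHR. */
    record Config(int imageSharingMode, int[] queueFamilyIndices) {}

    static Config forFamilies(int graphicsFamily, int presentFamily) {
        if (graphicsFamily == presentFamily) {
            // Single family: exclusive ownership, no index list needed.
            return new Config(VK_SHARING_MODE_EXCLUSIVE, new int[0]);
        }
        // Different families: let both access the images concurrently.
        return new Config(VK_SHARING_MODE_CONCURRENT,
                new int[] {graphicsFamily, presentFamily});
    }

    public static void main(String[] args) {
        Config config = forFamilies(0, 2);
        System.out.println(config.imageSharingMode() + " "
                + java.util.Arrays.toString(config.queueFamilyIndices()));
    }
}
```

With exclusive sharing and two different families, the application would instead have to transfer ownership of each image between the queues with barriers, which is more code and easier to get wrong.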

Trunkvv commented 1 year ago

Well, the RX570 does work with different graphics and presentation queues. The author does not have issues on the NV platform either. I'll leave it at that; there will indeed be an issue in either the code, the driver, or the hardware. It took me an awful lot of time to get to this point, with a workaround that sets the queues to the same family.

Thank you for your support during this investigation.

ws909 commented 1 year ago

@Trunkvv Maybe this issue should be reported to AMD as a bug in their driver or hardware?

Trunkvv commented 1 year ago

@ws909 Thanks, I've reported the issue to AMD.