KhronosGroup / VK-GL-CTS

Khronos Vulkan, OpenGL, and OpenGL ES Conformance Tests
https://www.khronos.org/
Apache License 2.0
Multiple test cases run together get stuck, but the last test case run alone succeeds #437

Open Truthonlyone opened 8 months ago

Truthonlyone commented 8 months ago

Hardware environment: arm64
Operating system: Ubuntu 20.04
Vulkan driver: PowerVR

Hey bro,

When I run the Vulkan Conformance Test Suite (CTS), multiple test cases run together get stuck, but the last test case succeeds when run alone. (I reran only the last one because it is the case most likely to have caused the hang.)

Is it possible that some of the test cases share state or resources and interfere with each other when run together, but not when run alone? For example, some test cases may use the same buffer, image, or device memory, and cause conflicts or errors when these are accessed concurrently or sequentially by different tests. This is only my guess. Is it correct?

I'll be very happy if I get any reply.

awalters-vk-img commented 8 months ago

Could you give me a bit more info about the platform and driver version you are using? It's unlikely to be a problem with VK-GL-CTS and is more likely an issue on the driver side.

Truthonlyone commented 8 months ago

Hi, sorry to reply late!

The Vulkan driver is the one in the Mesa repository on GitHub, on the 22.1 branch. I modified it to make it compatible with our own GPUs and to practise Vulkan driver development.

So you don't think multiple test cases in VK-GL-CTS will affect each other when run together, right?

Then how do you explain the fact that multiple test cases run together get stuck, but the last test case succeeds when run alone?

awalters-vk-img commented 8 months ago

When running multiple VK-GL-CTS processes in parallel, the work from each process needs to be properly scheduled on the GPU so that they can all make forward progress and complete. In this case it is likely that there is a bug causing a stall in scheduling the work, or an issue with completion fences not being signalled. I'll forward this issue to our relevant engineering team, though our open source driver has so far only been developed for a small set of GPU cores (https://docs.mesa3d.org/drivers/powervr.html), so it could be that support for your particular GPU core isn't present or complete. It is probably worth concentrating on single-process support first and foremost.
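Whether the cases run in one process or several, a stall like this can be narrowed down by finding the shortest prefix of the case list that still hangs. The following is only a minimal sketch in Python, not part of VK-GL-CTS. It assumes the hang is reproducible and that any list containing a hanging prefix also hangs. The names `shortest_hanging_prefix` and `make_hang_check` are hypothetical. `--deqp-caselist-file` is the deqp option for running the cases listed in a file, assumed here to take one full case name per line. The timeout has to be generous enough that a slow but progressing run is not mistaken for a hang.

```python
import os
import subprocess
import tempfile


def shortest_hanging_prefix(cases, hangs):
    """Return the smallest n for which hangs(cases[:n]) is True.

    Returns None when even the full list does not hang. Assumes that
    once a prefix hangs, every longer prefix hangs too.
    """
    if not cases or not hangs(cases):
        return None
    lo, hi = 1, len(cases)  # invariant: cases[:hi] is known to hang
    while lo < hi:
        mid = (lo + hi) // 2
        if hangs(cases[:mid]):
            hi = mid
        else:
            lo = mid + 1
    return lo


def make_hang_check(base_cmd, timeout_s):
    """Build a hangs(prefix) predicate that runs the prefix in one process.

    base_cmd is the command as a list, e.g. ["./deqp-vk"] (an assumed path).
    A run that exceeds timeout_s seconds is treated as a hang.
    """
    def hangs(prefix):
        with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
            f.write("\n".join(prefix) + "\n")
        try:
            subprocess.run(
                list(base_cmd) + [f"--deqp-caselist-file={f.name}"],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,  # the child is killed when this expires
            )
        except subprocess.TimeoutExpired:
            return True
        finally:
            os.unlink(f.name)
        return False

    return hangs
```

If the result is `n`, then `cases[n - 1]` is the first case whose addition makes the run hang. That case is not necessarily the faulty one, since an earlier case may have left the driver in a bad state.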

frankbinns commented 8 months ago

The Vulkan driver in Mesa 22.1 was still very early on in its development and was only a partial implementation of the Vulkan 1.0 API. The only application we'd run at that point was the Sascha Willems triangle demo on an Acer Chromebook, which contains a PowerVR GX6250.

We started looking at Vulkan CTS early last year and the driver has changed significantly since then. It's now at a point where we can successfully complete Vulkan CTS 1.3.4.1 in our local runs on a TI AM62 SK (IMG AXE-1-16M).

You can find the latest Mesa code here, which we're in the process of upstreaming: https://gitlab.freedesktop.org/imagination/mesa/-/tree/powervr-mesa-next

The Linux kernel driver is expected to appear in the 6.8 kernel. You can find it in the drm-misc tree while it makes its way upstream: https://cgit.freedesktop.org/drm/drm-misc/tree/

The firmware for the AXE-1-16M can be found in the linux-firmware repository. Firmware for additional GPUs can be found here: https://gitlab.freedesktop.org/imagination/linux-firmware

As @awalters-vk-img mentioned, the list of currently supported GPUs can be found in the Mesa PowerVR documentation (https://docs.mesa3d.org/drivers/powervr.html). We're also releasing firmware upon request for GPUs we don't support yet, for those who want to experiment with and contribute to the driver. You can request firmware by opening an issue on the linux-firmware repository above.

Truthonlyone commented 8 months ago

Thank you for your hint as to the cause of this error!

To make sure we are talking about the same thing, I should explain that I never ran multiple VK-GL-CTS processes at once. What I meant by "multiple test cases run together" is that VK-GL-CTS itself uses multiple threads and the GPU inherently runs work in parallel.

Over the last two days I wrote a script that forces deqp-vk to run the test cases one at a time, and it runs smoothly with no errors or hangs. The results show that some cases pass, some fail, and some are unsupported. The reason may be that, as @frankbinns said, the Vulkan driver I'm using only partially implements the Vulkan 1.0 API.
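The script itself is not shown in this thread. A minimal sketch of the same idea in Python could look like the following. It starts a fresh process per case through the `--deqp-case` option and enforces a per-case timeout so that a hang cannot block the whole run. The function names, the 600-second default timeout, and the `./deqp-vk` path are assumptions for illustration. The sketch only records how each process ended. The per-case pass, fail, or unsupported status still has to be read from the deqp log, which the sketch does not parse.

```python
import subprocess
from collections import Counter


def run_case(base_cmd, case, timeout_s):
    """Run a single case in a fresh process and classify how the process ended."""
    cmd = list(base_cmd) + [f"--deqp-case={case}"]
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "exit_zero" if proc.returncode == 0 else "exit_nonzero"


def run_all(base_cmd, cases, timeout_s=600):
    """Run every case on its own, in order, and map case name to outcome."""
    return {case: run_case(base_cmd, case, timeout_s) for case in cases}


def summarize(results):
    """Count how many cases ended with each outcome."""
    return Counter(results.values())
```

With a plain-text file of case names, one per line (here a hypothetical `cases.txt`), a run would be `results = run_all(["./deqp-vk"], open("cases.txt").read().split())` followed by `print(summarize(results))`. Starting one process per case is slow, but it gives every case a freshly created instance and device, which is why it can hide problems that only appear when state accumulates over a long run.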

By the way, the error I mentioned above is "device out of memory", which seems to relate to video memory. My guess is that either GPU synchronization is not handled well, or a device memory management problem leads to conflicting accesses to memory resources.

Anyway, maybe it's time to go with the latest Vulkan driver.

Truthonlyone commented 8 months ago

Thank you for kindly listing this useful information! It saved me a lot of time, but perhaps the next step is the more challenging job haha...