SaschaWillems / vulkan.gpuinfo.org

Front-End and Back-End for the Vulkan Hardware Database
https://vulkan.gpuinfo.org
GNU Affero General Public License v3.0
Organize and de-duplicate devices using RADV #61

Open Venemo opened 1 year ago

Venemo commented 1 year ago

We discussed this with @christophe-lunarg at Vulkanised 2023.

If you look at the list of reports from RADV, there are a bunch of duplicate device names. This is unfortunately because we changed how the device name is reported in RADV. The old scheme was: AMD RADV <codename> and the new scheme is: AMD <marketing name> (RADV <codename>) - additionally the codenames also change sometimes.

As a result, old driver versions show up as distinct devices in the list, which is misleading because the reader may (mistakenly) assume that the latest version of RADV is 1-2 years old, which is not true. Furthermore, Null device and OVERRIDDEN are not real devices and are also misleading; I believe these should not show up at all.
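As a rough illustration of the two naming schemes, here is a minimal Python sketch that extracts the codename from either form and rejects the two non-device entries. The function name, the regular expressions, and the sample strings in the usage note are assumptions made for illustration; real reports may carry extra suffixes, and this does not handle codenames that changed between driver versions.

```python
import re

# Patterns for the two RADV naming schemes described above.
# Assumption: real driver strings may contain extra suffixes that these
# illustrative patterns do not cover.
NEW_SCHEME = re.compile(r"^AMD (?P<marketing>.+?) \(RADV (?P<codename>[^)]+)\)$")
OLD_SCHEME = re.compile(r"^AMD RADV (?P<codename>\S+)")

# Entries the issue describes as not being real devices.
NON_DEVICES = ("null device", "overridden")


def parse_radv_name(device_name):
    """Return (marketing_name, codename), or None if this is not a usable RADV name.

    marketing_name is None for the old scheme, which did not include it.
    """
    name = device_name.strip()
    if any(marker in name.lower() for marker in NON_DEVICES):
        return None
    match = NEW_SCHEME.match(name)
    if match:
        return match.group("marketing"), match.group("codename").upper()
    match = OLD_SCHEME.match(name)
    if match:
        return None, match.group("codename").upper()
    return None
```

With this sketch, a hypothetical old-scheme name such as `AMD RADV POLARIS10` and a new-scheme name such as `AMD Radeon RX 580 Series (RADV POLARIS10)` both yield the codename `POLARIS10`, so the two could be grouped together.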

Potential suggestions to improve:

Venemo commented 1 year ago

The site has the same problem with NVidia devices too:

christophe-lunarg commented 1 year ago

We also looked for a possible solution to this issue; sometimes IHVs are really into device renaming!

What about using, when it's available, https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkPhysicalDeviceProperties.html and the deviceID, which should be unique? (That probably needs more investigation.)

This way we could track the history of a device more easily.

What do you think about this, Sascha?

It was great to meet you Timur, glad you exposed this issue as I came across it in the past but never took the time to really understand what’s going on!

On Thu 9. Feb 2023 at 14:28, Timur Kristóf @.***> wrote:

The site has the same problem with NVidia devices too:

  • These are the same:
    • GTX 1070
    • GeForce GTX 1070
    • NVidia GeForce GTX 1070


Venemo commented 1 year ago

Unfortunately, using VkPhysicalDeviceProperties::deviceID is probably still not a complete solution without extra hacks. In the case of RADV, the deviceID is set to the PCI ID of the GPU, but sometimes the same GPU may have more than one PCI ID; for example, here is the list for AMD GPUs. Unfortunately, this file can't always be used for de-duplicating these entries, because sometimes they sell completely different devices with the same generic marketing name (iGPUs especially seem to have the same name regardless of HW generation).

SaschaWillems commented 1 year ago

Thanks for bringing this up. It's actually one of my biggest pain points with the database, and I have wasted countless hours trying to fix this. While on desktop there is a chance to fix things, everything falls apart when looking at Android, where things like device IDs are seemingly used arbitrarily. After my last (failed) attempt I was thinking of dropping all old reports and maybe starting with a new database structure. But that again would be problematic on Android, where devices are rarely updated.

So to sum it up: It's sadly heavily complicated :/

SaschaWillems commented 1 year ago

Just to clarify how all over the place device IDs are: there are more than 360 (!) VkPhysicalDeviceProperties::deviceID values with more than one distinct device name. Some reports even report a zero for it. This makes it pretty much useless.
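For illustration, a check of this kind could be sketched in Python as below. The function name and the `(deviceID, deviceName)` tuple layout are assumptions; the actual database schema and query are not shown in this thread.

```python
from collections import defaultdict


def ambiguous_device_ids(reports):
    """Return {deviceID: sorted device names} for IDs seen with more than one name.

    `reports` is an iterable of (device_id, device_name) pairs, a
    hypothetical flattened view of the stored reports.
    """
    names_by_id = defaultdict(set)
    for device_id, device_name in reports:
        names_by_id[device_id].add(device_name)
    return {
        device_id: sorted(names)
        for device_id, names in names_by_id.items()
        if len(names) > 1
    }
```

Counting the keys of the returned dictionary gives the number of deviceID values that map to more than one name; a deviceID of zero would show up here like any other value.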

SaschaWillems commented 1 year ago

VkPhysicalDeviceDriverPropertiesKHR is also not an option as it is not available everywhere. Using multiple sources to distinguish devices would add a lot of complexity to the database and would come with performance implications.

SaschaWillems commented 1 year ago

My stance on this is that the Vulkan WG should force IHVs to add a proper way to distinguish devices. Everything else would move the problem into the hands of other people like me. I'm constantly fixing and cleaning up things that should be coming in a proper way from the driver.

Venemo commented 1 year ago

That's a respectable position and I see how that can easily cause a headache. I don't have a good suggestion how to deal with Android. It shouldn't be too hard to deal with it on the desktop but I completely understand if you don't want to deal with that either.

Specifically for the open source Mesa drivers, I think it would be perfectly fine to delete reports from versions older than 6 months, and not to save new reports from these versions. This can be easily determined from the version number: major version is year, minor version is quarter. Maybe also consider adding a warning to the GUI that tells users they are using an unsupported driver version.
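A minimal Python sketch of such a version check, assuming the year.quarter scheme described here and assuming that minor versions 0 to 3 map to the first to fourth quarter. The function names are hypothetical, and Mesa versions that do not follow this scheme would need separate handling.

```python
from datetime import date


def approx_mesa_release(version):
    """Approximate the first day of the quarter a Mesa version belongs to.

    Assumption: the version looks like "<yy>.<quarter index>.<patch>",
    with the quarter index in the range 0..3.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    if not 0 <= minor <= 3:
        raise ValueError(f"unexpected minor version in {version!r}")
    return date(2000 + major, 1 + 3 * minor, 1)


def is_older_than(version, today, months=6):
    """Return True if the version's approximate quarter started more than `months` ago."""
    released = approx_mesa_release(version)
    age_in_months = (today.year - released.year) * 12 + (today.month - released.month)
    return age_in_months > months
```

The result is only as precise as the quarter granularity of the version number, so it is better suited to flagging a report as old than to making an exact cut-off decision.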

christophe-lunarg commented 1 year ago

I think it would be perfectly fine to delete reports from versions older than 6 months

This proposal is really not reasonable. It's necessary for a Vulkan application developer to target a large amount of devices for the application to have any form of relevance. I understand it's messy and that there is a lot of figuring out to do with the history of devices, which may lead to huge problems, but not having information about what a driver and device support is a far greater problem for Vulkan application developers, specifically when evaluating the requirements that the application can or can't afford.

SaschaWillems commented 1 year ago

I agree with @christophe-lunarg. I don't think it's the responsibility of the database to let users know of unsupported drivers, especially as it's unclear what unsupported actually means. Take e.g. NVIDIA Vulkan developer drivers. They may be newer with more features than the production drivers, but with a lower version number. Would that count as a supported driver or not? And on Android it's (sadly once again) a complete mess.

I also don't want to delete anything from the database, as the historical data is IMO very valuable, e.g. to find out about extension or feature adoption. And once again, on Android drivers rarely get updated, so finding a cut-off point would be nearly impossible.

I may do some soft cut-off at some point though and maybe mark old reports as deprecated and then maybe add some setting to the database that'll only show the most recent reports. Similar to what I added for the Vulkan min. version selection.

But I'm glad about this discussion. Hopefully it'll lead to improvements to the database :)

christophe-lunarg commented 1 year ago

VkPhysicalDeviceDriverPropertiesKHR is also not an option as it is not available everywhere. Using multiple sources to distinguish devices would add a lot of complexity to the database and would come with performance implications.

I understand that this solution requires an extension or Vulkan 1.2 and has performance implications; however, I think it provides a way to trace the evolution of drivers... Personally, when I use GPUinfo.org, I browse the reports in multiple ways and fully leverage all the various angles from which the database can be queried. I don't expect the database will ever be really clean.

That said, I recognize the huge task, so maybe see it as a feature request: what about creating a dedicated page like the "Devices" page (listdevices.php) that would be the "Drivers" page and that would display and "stack" only the reports with VkPhysicalDeviceDriverPropertiesKHR, allowing us to follow the evolution of device names? All the devices without this structure would just not be on this page...

On desktop this solution is starting to be relevant for a lot of Vulkan developers because everything (TM) supports Vulkan 1.2.

Venemo commented 1 year ago

This proposal is really not reasonable. It's necessary for a Vulkan application developer to target a large amount of devices for the application to have any form of relevance.

The main issue here is these old versions sometimes appear as "the latest version" in the database, which would mislead app developers into thinking that we haven't released a new driver in the past few years. I also don't think anyone wants to target old versions of Linux drivers.

christophe-lunarg commented 1 year ago

If there are old driver reports being published, it shows that people are using machines with old drivers, so an application developer has to support those drivers or ask the user to update their system, which is not necessarily something they can or would do. At least, we can't decide for them.

Venemo commented 1 year ago

That's all fine, but I would prefer at least not to show those as the latest.

SaschaWillems commented 1 year ago

That said, I recognize the huge task, so maybe see it as a feature request: what about creating a dedicated page like the "Devices" page (listdevices.php) that would be the "Drivers" page and that would display and "stack" only the reports with VkPhysicalDeviceDriverPropertiesKHR, allowing us to follow the evolution of device names? All the devices without this structure would just not be on this page...

That is actually a good idea. I'll try to come up with a new listing that is based on data available from VkPhysicalDeviceDriverPropertiesKHR.

SaschaWillems commented 1 year ago

I just took a quick look at what data is available via VK_KHR_driver_properties and tbh it doesn't look helpful at all. The information provided by this ext does not include a device name or unique identifier, and only identifies the driver. So a listing with this would be pretty useless IMO.

AMD 7900 XTX:

conformanceVersion | 1.3.0.0
-- | --
driverID | AMD (Proprietary)
driverInfo | 23.2.2 (LLPC)
driverName | AMD proprietary driver

NV RTX 2060:

conformanceVersion | 1.3.5.0
-- | --
driverID | NVIDIA (Proprietary)
driverInfo | 528.75
driverName | NVIDIA

Looking at the whole situation I'm not sure at all how to fix this in any non-manual way. I checked e.g. deviceids for an RTX 2060 and just the base 2060 is reported with four different deviceIds.

So right now I think the only viable way would be to do some mapping table to unify at least some of the device names. That's some work though, but I'll try to take a shot at it.

christophe-lunarg commented 1 year ago

Ah, right. :/

Venemo commented 1 year ago

So right now I think the only viable way would be to do some mapping table to unify at least some of the device names. That's some work though, but I'll try to take a shot at it.

I can help with that work, at least the RADV parts of it.

SaschaWillems commented 1 year ago

Thanks for your offer. I'll get back to that once I've got the basic stuff done ;)

As a first step I added an aliasing table that'll be the base for all database queries that filter at device name level. You can check out that list here: https://vulkan.gpuinfo.org/listdevicealiases.php

Currently the listing only contains desktop device names, but already has more than 1,200 entries oO

I'm currently adding a few aliases so I can update the back-end to use these instead of the device names. From my initial tests, this aliasing table seems like a viable way of doing things. Sure, it requires manual work, but I'll adjust all database queries in such a way that they'll just display unaliased device names if no alias is present in that table. That takes away any pressure to update aliases as soon as new devices pop up.
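The fallback behaviour described above can be sketched with a left join, shown here in Python against an in-memory SQLite database purely for illustration. The table names (`reports`, `devicealiases`), the column names, and the sample rows are all assumptions; the site's actual schema and queries are not shown in this thread.

```python
import sqlite3

# Hypothetical, simplified schema: one row per report and one row per alias.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reports (id INTEGER PRIMARY KEY, devicename TEXT NOT NULL);
    CREATE TABLE devicealiases (devicename TEXT PRIMARY KEY, alias TEXT NOT NULL);
""")
conn.executemany(
    "INSERT INTO reports (devicename) VALUES (?)",
    [("GTX 1070",), ("GeForce GTX 1070",), ("NVidia GeForce GTX 1070",), ("Some New GPU",)],
)
conn.executemany(
    "INSERT INTO devicealiases (devicename, alias) VALUES (?, ?)",
    [("GTX 1070", "GeForce GTX 1070"), ("NVidia GeForce GTX 1070", "GeForce GTX 1070")],
)


def device_report_counts(connection):
    """Count reports per displayed device name, preferring the alias when one exists."""
    rows = connection.execute("""
        SELECT COALESCE(a.alias, r.devicename) AS displayname, COUNT(*) AS reportcount
        FROM reports r
        LEFT JOIN devicealiases a ON a.devicename = r.devicename
        GROUP BY COALESCE(a.alias, r.devicename)
        ORDER BY displayname
    """)
    return rows.fetchall()
```

Because of the `LEFT JOIN` plus `COALESCE`, a device with no row in the alias table is still listed under its reported name, so new devices appear immediately and can be aliased later.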

And I can even imagine using that alias table as some additional unique device name collection table in the future ;)