[meta] External data sets for testing

kvark commented 3 years ago

It would be useful to have some place where we can store the bigger sets of shaders (SPIR-V, WGSL, GLSL, whatever). We'd then have a Github Action to fetch them and parse/validate. Since this would be a heavy action, we'd run it either manually, or on tag creation (seems most practical).
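As a rough illustration of the "fetch, then parse/validate" flow, the first step after fetching could be a plain directory walk that gathers the shader files. This is only a sketch using the Rust standard library: the function name `collect_shaders` and the extension list are assumptions for illustration, not anything that exists in naga.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Extensions treated as shader files (an assumption; adjust to the corpus).
const SHADER_EXTENSIONS: &[&str] = &["spv", "wgsl", "glsl", "vert", "frag", "comp"];

/// Recursively collect every file under `root` whose extension is in
/// `SHADER_EXTENSIONS`. Hypothetical helper, not part of naga.
pub fn collect_shaders(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    // Explicit stack of directories still to visit, to avoid recursion.
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                pending.push(path);
            } else if path
                .extension()
                .and_then(|ext| ext.to_str())
                .map_or(false, |ext| SHADER_EXTENSIONS.contains(&ext))
            {
                found.push(path);
            }
        }
    }
    // Sort so that runs are reproducible regardless of directory order.
    found.sort();
    Ok(found)
}
```

The resulting list could then be handed to whatever parse/validate step the action runs.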

Here is some info about the SPIR-V corpus:

- Vulkan CTS has 750K of them
- lots of them in the SPIRV-Tools test suite, accumulated over years
- When you build Vulkan CTS, you can run external/vulkan/modules/vk-build-programs -v to build all the shaders and run validation on them. You can hack that flow to dump; I think there's a flow to dedup them and save them in a binary database of some kind but I never looked deeply at that.
- But they're not very diverse. About 99% of them are generated from Glslang, so there's a monoculture problem. The other 1% are generated from templated SPIR-V assembly text. And recently there are a few hundred harder cases found through spirv-fuzz, using tech evolved from the GraphicsFuzz folks.
- All the .amber scripts in Vulkan CTS are under https://github.com/KhronosGroup/VK-GL-CTS/tree/master/external/vulkancts/data/vulkan/amber with a subdir for graphicsfuzz

Gordon-F commented 3 years ago

Sascha Willems' Vulkan examples have a useful (and easier to integrate) shader test set. It should help to test spv-in and all naga backends. I don't think we should store any kind of snapshots in the naga repository for an external test set. We just need to be sure that the naga frontends and backends can handle them.

kvark commented 3 years ago

Dota2 shader set was added in 47ada8182b83d43f47435df76d68f7dd278ab57a

kvark commented 3 years ago

David's sample set - https://github.com/dneto0/spirv-samples

Gordon-F commented 3 years ago

> David's sample set - https://github.com/dneto0/spirv-samples

@kvark Since we already have 2 external data sets, adding one more seems like a pretty straightforward task. Should we just parse this external data set, or also produce valid WGSL code from it? And should CI fail if something goes wrong?

kvark commented 3 years ago

@Gordon-F I thought we just want to parse them into IR and validate. But generating WGSL is in line with our testing strategy, so it's also good.
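To make the "fail CI or not" question concrete, here is a minimal sketch of a corpus runner that applies a check to every input and collects the failures. The names `run_corpus` and `has_spirv_magic` are hypothetical. The check shown only tests for the SPIR-V magic number (0x07230203) as a stand-in; a real run would plug in naga's SPIR-V frontend and validator instead, which are not shown here.

```rust
/// The SPIR-V magic number, which is the first 32-bit word of every module.
const SPIRV_MAGIC: u32 = 0x0723_0203;

/// Stand-in check (hypothetical): accept a blob only if it starts with the
/// SPIR-V magic number in either byte order. A real run would parse the
/// module into IR and validate it instead.
pub fn has_spirv_magic(bytes: &[u8]) -> Result<(), String> {
    if bytes.len() < 4 {
        return Err("shorter than one 32-bit word".to_string());
    }
    let word = [bytes[0], bytes[1], bytes[2], bytes[3]];
    if u32::from_le_bytes(word) == SPIRV_MAGIC || u32::from_be_bytes(word) == SPIRV_MAGIC {
        Ok(())
    } else {
        Err(format!("bad magic: {:02x?}", word))
    }
}

/// Run `check` over every `(name, bytes)` input and return the failures as
/// `(name, reason)` pairs, in input order.
pub fn run_corpus<F>(inputs: &[(String, Vec<u8>)], check: F) -> Vec<(String, String)>
where
    F: Fn(&[u8]) -> Result<(), String>,
{
    let mut failures = Vec::new();
    for (name, bytes) in inputs {
        if let Err(reason) = check(bytes) {
            failures.push((name.clone(), reason));
        }
    }
    failures
}
```

A CI job could exit with a non-zero status whenever the returned list is non-empty, or only print it if the run is meant to be informational.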

Gordon-F commented 3 years ago

Tint has a lot of tests - https://dawn.googlesource.com/tint/+/refs/heads/main/test/

We can run them as a lazy task or copy them to the naga repository (Tint is under the Apache 2.0 license).

gfx-rs / wgpu

[meta] External data sets for testing #4326