Closed Lunderberg closed 1 month ago
@mbrookhart Regarding your comments that several of the failing unit tests had run correctly on vulkan in the past, the main breaking point was #8127, which reads the device parameters from the physical device when the target is "vulkan -from_device=0". Several of the unit tests had a hard-coded target of "vulkan", tried to run with the minimum vulkan capabilities, and failed at codegen because the capability requested (e.g. 64-bit float support) wasn't listed in the target. Those fixes came along for free by parametrizing the topi tests, since the default vulkan test target uses the device query.
That said, at some point I want to ensure all tests either run correctly or have an appropriate xfail for the minimum vulkan feature set, but that will be a different issue.
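The codegen failure described above for a hard-coded "vulkan" target can be illustrated with a small pure-Python sketch. Everything in it is hypothetical: `MINIMUM_CAPABILITIES`, `query_device`, `resolve_capabilities`, and `codegen` are made-up names and values, not TVM APIs. It only mimics the idea that a bare target carries the minimum feature set, while "-from_device=0" fills in whatever the device reports.

```python
# Hypothetical illustration only: none of these names are TVM APIs.

# Assumed minimum feature set used when the target is just "vulkan".
MINIMUM_CAPABILITIES = {"supports_float64": False, "supports_int64": False}


def query_device(device_id):
    """Stand-in for a physical-device query; the returned values are made up."""
    return {"supports_float64": True, "supports_int64": True}


def resolve_capabilities(target):
    """Return the capabilities that a target string would carry."""
    caps = dict(MINIMUM_CAPABILITIES)
    for token in target.split()[1:]:
        if token.startswith("-from_device="):
            device_id = int(token.split("=", 1)[1])
            caps.update(query_device(device_id))
    return caps


def codegen(target, uses_float64):
    """Fail if the kernel needs a capability the target does not list."""
    caps = resolve_capabilities(target)
    if uses_float64 and not caps["supports_float64"]:
        raise RuntimeError("float64 requested but not listed in target")
    return "ok"
```

In this sketch, `codegen("vulkan", uses_float64=True)` raises, while the same call with "vulkan -from_device=0" succeeds, which mirrors why tests with a hard-coded "vulkan" target failed once capabilities were read from the target.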
Is this result on an NV driver, or do they also fail on AMD?
Thank you for checking. All except test_conv1d_transpose_ncw occur on AMD as well. It's the only one that is a numerical failure, while the rest are errors that occur during codegen. I'll update the table with that information.
Following #8947, I added the failing relay tests to the tracking issue.
@Lunderberg Are these two test cases any different? One has pytest.xfail("Known failing test for vulkan") but the other does not.
Thank you for that catch. When refactoring the tests in #8947, I added the updated version of test_conv2d_run, but didn't remove the original. I have https://github.com/apache/tvm/pull/8993 open to remove the redundant test_conv2d_run, and have double-checked that there aren't any others that snuck in.
@Lunderberg The last three items in test_any.py are not specific to vulkan (they fail on cuda as well), so I think we should drop them from the list.
They don't work on gpu targets since we don't support dynamic height or width in conv2d, for example.
Summary
Currently, some unit tests fail when running on the Vulkan runtime. PRs https://github.com/apache/tvm/pull/8903 and https://github.com/apache/tvm/pull/8947 parametrized the tests that are currently failing, so that the vulkan target can be marked as xfail without impacting any other runtimes. The Vulkan runtime should be improved so that these unit tests can pass on vulkan as well.
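As a rough sketch of the approach described in the summary, the snippet below marks only the vulkan target as xfail in a parametrized test, using plain pytest. It does not use TVM's own test helpers; the target list, the "llvm" entry, and `run_conv2d` are illustrative assumptions rather than the code from the PRs.

```python
import pytest

# Plain-pytest sketch; TVM's own test helpers are not used here.
# The target list is illustrative, not taken from the PRs.
TARGETS = [
    pytest.param("llvm"),
    pytest.param("cuda"),
    pytest.param(
        "vulkan -from_device=0",
        marks=pytest.mark.xfail(reason="Known failing test for vulkan"),
    ),
]


def run_conv2d(target):
    """Hypothetical stand-in for building and running the operator."""
    if target.startswith("vulkan"):
        raise RuntimeError("codegen failure")
    return True


@pytest.mark.parametrize("target", TARGETS)
def test_conv2d_run(target):
    # Only the vulkan parameter is expected to fail; other targets
    # are reported as ordinary passes or failures.
    assert run_conv2d(target)
```

Because the xfail mark is attached to a single parameter, a failure on vulkan is reported as an expected failure, and the results for the other targets are unaffected.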
Status