ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.
MIT License

Restrict XCC mapping to gfx942 #1959

Closed. AlexBrownAMD closed this 1 month ago.

AlexBrownAMD commented 1 month ago

The feature currently applies only to gfx942 and is untested on older architectures. Add a reject case for other architectures until the feature has been fully tested on those devices.
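The following is a minimal sketch of what such a reject case could look like. It is written in Python because Tensile's generator is Python. The function name, the parameter name `StreamKXCCMapping`, and the ISA-tuple convention (gfx942 as `(9, 4, 2)`, gfx90a as `(9, 0, 10)`) are assumptions for illustration, not the code in this PR.

```python
# Hypothetical sketch: the function name, the StreamKXCCMapping parameter name and
# the ISA-tuple convention are illustrative assumptions, not Tensile's actual code.
from typing import Optional, Tuple

Isa = Tuple[int, int, int]

# gfx942 expressed as an ISA tuple (major, minor, stepping)
GFX942: Isa = (9, 4, 2)


def xcc_mapping_reject_reason(isa: Isa, xcc_mapping: int) -> Optional[str]:
    """Return a rejection message if XCC mapping is requested on an unverified ISA, else None."""
    # XCC mapping disabled (0): nothing to restrict on any architecture
    if xcc_mapping == 0:
        return None
    # XCC mapping enabled: only accept the one architecture it has been verified on
    if isa != GFX942:
        return f"StreamKXCCMapping={xcc_mapping} is only verified on gfx942, got ISA {isa}"
    return None


print(xcc_mapping_reject_reason((9, 4, 2), 8))   # -> None
print(xcc_mapping_reject_reason((9, 0, 10), 0))  # -> None
print(xcc_mapping_reject_reason((9, 0, 10), 8))
```

Because the check only fires when XCC mapping is enabled, configurations with `xccmapping=0` still produce kernels on every architecture; only the enabled cases are rejected outside gfx942.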

nakajee commented 1 month ago

Does sk test pass on 90a with this change? I thought it would generate an error if no kernel was generated.

TorreZuk commented 1 month ago

I think you were trying to fix extended, right? So I added the label.

ellosel commented 1 month ago

> Does sk test pass on 90a with this change? I thought it would generate an error if no kernel was generated.

90a tests are in progress.

nakajee commented 1 month ago

I am thinking it might be better to just skip the sk test cases for now. Otherwise we will need to change the code again when we have new architectures. We can add a comment in common.py to let users know this has been verified on gfx942 only.

AlexBrownAMD commented 1 month ago

> Does sk test pass on 90a with this change? I thought it would generate an error if no kernel was generated.

Yes, it passes on 90a with this change. The test files you mentioned include test cases for both xccmapping=0 and xccmapping=8. Tests with xccmapping enabled hang on gfx90a. Although it doesn't make sense to use the feature on 90a, the hang could also occur in other configurations, so I've created a ticket to debug the issue. For now, the feature works in standard use cases on gfx942, so this change restricts it to that architecture until more testing is done.

bstefanuk commented 1 month ago

Looks good, makes sense.

+1 Koji's comment, a config-level system to filter capabilities (e.g. Stream-K) for different architectures would be nice. It also speaks to our goal of extending support to a wider set of architectures.

AlexBrownAMD commented 1 month ago

> Looks good, makes sense.
>
> +1 Koji's comment, a config-level system to filter capabilities (e.g. Stream-K) for different architectures would be nice. It also speaks to our goal of extending support to a wider set of architectures.

There is already a system for this. We can filter features based on the "AsmCaps", which poll for the capabilities available on a given compiler and architecture. This system is preferred over checking for a specific architecture number, because some features are not architecture-specific but depend on the ROCm version. Also, some architectures have multiple variants with different capabilities.
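To illustrate the difference from an architecture-number check, here is a minimal sketch of capability-based filtering. The capability names, the hard-coded table, and `feature_supported` are all hypothetical. As described above, the real AsmCaps are obtained by polling what a given compiler and architecture support, not from a fixed table.

```python
# Hypothetical sketch of capability-based feature filtering. Capability names,
# the table contents and feature_supported() are illustrative only; a real system
# would populate the caps by querying the toolchain for each target.
from typing import Dict, Mapping, Tuple

Isa = Tuple[int, int, int]

# Illustrative capability table keyed by ISA tuple.
ASM_CAPS: Dict[Isa, Dict[str, bool]] = {
    (9, 0, 10): {"HasFeatureA": True, "HasFeatureB": False},
    (9, 4, 2): {"HasFeatureA": True, "HasFeatureB": True},
}

# Illustrative mapping from a kernel feature to the capabilities it requires.
FEATURE_REQUIREMENTS: Mapping[str, Tuple[str, ...]] = {
    "FeatureX": ("HasFeatureA", "HasFeatureB"),
}


def feature_supported(feature: str, caps: Mapping[str, bool]) -> bool:
    """Allow a feature only if every capability it requires is reported as present.

    A feature with no listed requirements is allowed everywhere.
    """
    return all(caps.get(cap, False) for cap in FEATURE_REQUIREMENTS.get(feature, ()))


print(feature_supported("FeatureX", ASM_CAPS[(9, 4, 2)]))   # -> True
print(feature_supported("FeatureX", ASM_CAPS[(9, 0, 10)]))  # -> False
```

Because the rule keys on capabilities rather than on an ISA number, a new architecture, variant, or ROCm version that reports the required caps is accepted without editing the rule itself.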

For this change, the goal is to have all Stream-K features working on all hardware, so this restriction is temporary (and will hopefully be improved soon). Also, I couldn't think of a specific AsmCap that related to this feature in a way that made sense, so I wrote it to check the ISA for now.