ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.
MIT License
208 stars 144 forks source link

Add architecture management functions to TensileCreateLibrary #1926

Closed bstefanuk closed 2 months ago

bstefanuk commented 2 months ago

Objectives:

Outcomes:

Testing:

Unit testing:

Integration testing:

Notable changes:

Docker environment info:

$ cat /etc/os-release | head -1
PRETTY_NAME="Ubuntu 22.04.3 LTS"
$ apt show rocm-libs -a | head -2 | tail +2
Version: 6.1.2.60102-119~22.04
$ uname -r
5.15.0-86-generic
cgmb commented 2 months ago
  • supportedArchitectures raises a ValueError if an unsupported architecture or malformed input is found.

My question is perhaps beyond the scope of this PR, but is this necessary? There are source kernels to fall back to, right? It is useful to be able to build Tensile for unsupported architectures, even if the performance is poor. It's much easier to get started with a new GPU architecture if you can build every library with a new target id without having to patch the library. That gets you through the first step of, "Make it work. Make it right. Make it fast," using only build flags.

ellosel commented 2 months ago
  • supportedArchitectures raises a ValueError if an unsupported architecture or malformed input is found.

My question is perhaps beyond the scope of this PR, but is this necessary? There are source kernels to fall back to, right? It is useful to be able to build Tensile for unsupported architectures, even if the performance is poor. It's much easier to get started with a new GPU architecture if you can build every library with a new target id without having to patch the library. That gets you through the first step of, "Make it work. Make it right. Make it fast," using only build flags.

Previously we would call printExit if an architecture was unsupported. This attempts to more idiomatically support that behavior and give users the opportunity to recover but you are suggesting maybe we don't want that behavior at all. @yoichiyoshida thoughts?

bstefanuk commented 2 months ago

Great point @cgmb. As @ellosel mentioned, this behaviour replaces a hard program exit, so the effective behaviour is unchanged, see L1212.

Perhaps the confusion here is actually in the usage of the term "supported architecture" which may be better replaced by "unrecognized architecture". These functions search a dictionary defined in Common.py which maps Gfx names to architecture names, e.g., aldebaran. The exception is thus only raised if a Gfx architecture is passed that isn't in the map, e.g., gfxXYZ.

I have updated the wording of the PR description.

cgmb commented 2 months ago

Previously we would call printExit if an architecture was unsupported. This attempts to more idiomatically support that behavior and give users the opportunity to recover but you are suggesting maybe we don't want that behavior at all.

That is correct.

These functions search a dictionary defined in Common.py which maps Gfx names to architecture names, e.g., aldebaran. The exception is thus only raised if a Gfx architecture is passed that isn't in the map, e.g., gfxXYZ.

Is gfxXYZ the gfxip or the ISA? While they've usually been the same over the past few years, these can be different and the distinction may be important.

The mapping seems a bit weird at a conceptual level, since there has never been a 1-to-1 mapping between GFX names and architecture names. On a practical level, I also wonder if it could just map gfxXYZ to gfxXYZ in the case that there's no architecture name hardcoded for that input.

this behaviour replaces a hard program exit, so the effective behaviour is unchanged

I understand this is not a regression, but I'd thought we could improve the behaviour if we're changing it anyway. Although, upon further reflection, it is perhaps out-of-scope for this PR as it is not really changing.

bstefanuk commented 2 months ago

The mapping seems a bit weird at a conceptual level, since there has never been a 1-to-1 mapping between GFX names and architecture names.

I see what you mean and think it will be important to continue to discuss these decisions. In terms of scope, our focus at the moment is to refactor TensileCreateLibrary so that we can form a holistic view of its design and make changes of the sort you propose, in the future, while understanding how these changes will impact the project.

From a programmatic view, the Gfx architectures used here are mapped to their arch names, which are then interpreted as substring matchers for filtering which logic files are included in the generated library. I see a potential future solution where we scrap the Gfx archs as CLI parameters and just use the intended names directly, e.g., aldebaran.

bstefanuk commented 2 months ago

@cgmb I have created a ticket to review the usage of the --architecture option for TensileCreateLibrary, and in Tensile at-large.