LLNL / axom

CS infrastructure components for HPC applications
BSD 3-Clause "New" or "Revised" License
158 stars 27 forks source link

Issue with ROCm 6.1.2/6.2.1 spack HIP link flags #1461

Open ebchin opened 3 weeks ago

ebchin commented 3 weeks ago

Hi axom team,

I'm updating Tribol's TPLs to build with ROCm 6.1.2 and 6.2.1, and I had a few questions/issues:

  1. https://github.com/LLNL/axom/blob/develop/scripts/spack/configs/toss_4_x86_64_ib_cray/spack.yaml#L163-L166: It looks like the hip subdirectory doesn't exist anymore. Should these directories be updated to e.g. /opt/rocm-6.#.#?
  2. https://github.com/LLNL/axom/blob/develop/scripts/spack/packages/axom/package.py#L328: On rzvernal, this was giving the directory /opt. It looks like rocm_root = spec["hip"].prefix might give the intended directory.
  3. https://github.com/LLNL/axom/blob/develop/scripts/spack/packages/axom/package.py#L358: The lib64 subdirectory doesn't exist in /opt/rocm-6.1.2 or /opt/rocm-6.2.1 (or older ROCm versions, for that matter). I think the intended directory here is lib?

When the above changes were made, the correct library paths were produced by hip_link_flags.

@chapman39

bmhan12 commented 3 weeks ago

Hi @ebchin ,

I'm updating Tribol's TPLs...

Interestingly enough, I was using Tribol's current TPL configuration as a reference to update Axom's TPLs 😅

1) You're right, hip subdirectory looks to be deprecated from rocm 6 onwards. I copied the path from the shared RADIUSS spack configs set up without much thought, which still uses the deprecated path: https://github.com/LLNL/radiuss-spack-configs/blob/522da3898fcc8942f8eb3270f01c8212937589f5/toss_4_x86_64_ib_cray/spack.yaml#L361-L386

Not sure if this means specifying hip is still necessary or not for spack, or what the new path should be for rocm 6 onwards.

2) I see why you were getting /opt. rocm_root = os.path.dirname(spec["hip"].prefix) gets the parent directory of the hip dependency. So in Axom currently, that means /opt/rocm6.2.1/hip --> /opt/rocm6.2.1, and with the suggested hip path you gave /opt/rocm6.2.1 --> /opt

3) Good catch! Didn't realize these were do-nothing flags. I will look to remove these.

bmhan12 commented 3 weeks ago

Leaning towards changing hip to be /opt/rocm-6.#.# as you suggested. Frontier looks to be using /opt/rocm-6.#.# as well: https://github.com/E4S-Project/facility-external-spack-configs/blob/f8a3f4159bec5d556d6eb866328724a5790401ae/frontier/rocm-6.2.0/packages.yaml#L136-L148

ebchin commented 3 weeks ago

Thanks for taking a look, @bmhan12! Yes, that all makes sense. Whatever you end up deciding will likely make its way back to Tribol eventually 😄.

The addition of these linker flags is pretty common in rocm-supporting packages. Would it make sense to move them to BLT or somewhere else @white238 @chapman39 ?

bmhan12 commented 3 weeks ago

No problem!

I do want to mention something that might trip others up, depending on if you are using the radiuss-spack-configs for certain spack recipes and want to move to using the correct /opt/rocm-6.#.# path right away.

The current radiuss-spack-configs logic assumes the old hip path /opt/rocm-#.#.#/hip to set the ROCM_ROOT_DIR: https://github.com/LLNL/radiuss-spack-configs/blob/efe7a45205f0c71563c909e26c2f0a54a4c9427a/packages/camp/package.py#L23-L35

So with the current radiuss-spack-configs, you would get the wrong path and possibly get an error when building the RAJA suite with /opt/rocm-6.#.#.

So will have to hold off on moving to /opt/rocm-6.#.# for hip until radiuss-spack-configs gets updated.