LLNL / ExaCMech

BSD 3-Clause "New" or "Revised" License

HIP Backend Support #5

Closed rcarson3 closed 11 months ago

rcarson3 commented 4 years ago

Draft proposal to add HIP backend support to ExaCMech.

rcarson3 commented 3 years ago

All of the examples should now pass when compiled with a HIP compiler. I'll work on porting the miniapp over next.

rcarson3 commented 3 years ago

As of https://github.com/LLNL/ExaCMech/pull/5/commits/0cfb7bfb2cc58a99c8ea120e42d257711b6e8c40, the miniapp should now work with HIP / ROCm 4.2 as built on Spock, using RAJA v0.14. I will note that a few additional compiler options were needed to get to this point. One of them is that we need to make use of C++14 in order to compile everything properly.
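For reference, the extra build settings amount to something like the following CMake fragment. This is a hedged sketch: the `ENABLE_HIP` option and target names here are hypothetical, and the actual ExaCMech build system may spell these differently.

```cmake
# Require C++14 so the HIP / ROCm 4.2 toolchain compiles everything properly.
set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Hypothetical toggle for the HIP backend; the real option name may differ.
option(ENABLE_HIP "Build ExaCMech with the HIP backend" ON)

if(ENABLE_HIP)
  find_package(hip REQUIRED)                       # provided by a ROCm install
  target_link_libraries(exacmech PUBLIC hip::device)
endif()
```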

rcarson3 commented 2 years ago

Also, based on the similar RAJA issue below, it appears that if we do a HIP build we will no longer be able to do an OpenMP + device build:

https://github.com/LLNL/RAJA/issues/976

rcarson3 commented 1 year ago

In order to drastically reduce the amount of repeated code we have for each forall implementation, I'm going to look into replacing all these forall loops with either the RAJA::launch or RAJA::expt::dynamic_forall feature (https://github.com/LLNL/RAJA/pull/1280). This will push the minimum RAJA version up to v2022.10.x, but I don't see an issue with that if it means we'll have less code to worry about. Moving toward the launch syntax might cost us the ability to use OpenMP alongside the host and GPU codes. However, given my experience running on GPU systems over the years, I don't see that as much of an issue: you almost always run on just the GPU or just the CPU. Outside of benchmarking, I can't remember the last time I used OpenMP on a GPU machine...
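To illustrate the deduplication, here is a minimal standalone C++14 sketch of the pattern: one templated forall wrapper dispatched on an execution-space tag, standing in for the per-backend copies. This is not RAJA's actual API; the tags, wrapper, and function names are hypothetical and only mirror the shape that launch-style code would take.

```cpp
#include <cstddef>

// Hypothetical execution-space tags standing in for RAJA policies.
struct seq_exec {};
struct gpu_exec {};  // in the real code this would map to HIP or CUDA

// One generic forall replaces a hand-written copy per backend.
template <typename ExecSpace, typename Body>
void forall(std::size_t n, Body&& body) {
  // A real implementation would branch on ExecSpace (e.g. launch a
  // GPU kernel for gpu_exec); here both paths run sequentially.
  for (std::size_t i = 0; i < n; ++i) {
    body(i);
  }
}

// Example usage: the same loop body works for any execution space.
double sum_squares(std::size_t n) {
  double* vals = new double[n];
  forall<seq_exec>(n, [=](std::size_t i) {
    vals[i] = double(i) * double(i);
  });
  double total = 0.0;
  for (std::size_t i = 0; i < n; ++i) total += vals[i];
  delete[] vals;
  return total;
}
```

The point of the indirection is that adding or dropping a backend touches only the wrapper, not every kernel call site.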

It might take just a bit to get this branch refactored to that as I need to keep this branch stable at least until I run some large runs on Frontier.

Also, before merging this, I need to simplify all of the CUDA vs. HIP logic, both in terms of execution strategy and the CUDA portability header, and just refer to things as GPU.
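One way to collapse the CUDA/HIP branching into a single GPU path is a small portability header along these lines. This is a hedged sketch: `ECM_GPU`, `ECM_HOST_DEVICE`, and the helper names are hypothetical and not the identifiers in ExaCMech's actual header.

```cpp
// Portability-header sketch: detect either GPU compiler once and expose
// a single ECM_GPU switch plus a unified host/device decoration.
#if defined(__CUDACC__) || defined(__HIPCC__)
  #define ECM_GPU 1
  #define ECM_HOST_DEVICE __host__ __device__
#else
  #define ECM_GPU 0
  #define ECM_HOST_DEVICE
#endif

// Downstream code then asks "is this a GPU build?" in one place,
// instead of branching separately on CUDA vs HIP at every call site.
ECM_HOST_DEVICE inline double scale(double x) { return 2.0 * x; }

inline bool built_for_gpu() { return ECM_GPU != 0; }
```

With this in place, the per-backend `#ifdef` forests in the execution-strategy code reduce to a single `ECM_GPU` check.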