We're having an issue where the GPU code slows down significantly when running more than 3 MPI tasks per GPU. The issue can be reproduced with the much smaller miniWeather code, so I'm putting that here. We have Nvidia looking into it; I just wanted it documented here. The workaround for now is to thread the physics and then run the CRM code on the GPU from the master thread.
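For anyone unfamiliar with the shape of that workaround, here is a rough sketch of the idea, not the actual model code. It assumes OpenMP is used for the threading; `run_physics_column`, `run_crm_on_gpu`, and `step` are hypothetical names made up for illustration, and the "CRM" routine is just a CPU placeholder where the real code would launch GPU kernels. The point is that a single process (one MPI task) spreads the physics over CPU threads and only the master thread touches the GPU, so fewer MPI tasks share each device.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-column physics, which stays on the CPU.
void run_physics_column(std::vector<double>& state, std::size_t col) {
    state[col] += 1.0;
}

// Hypothetical stand-in for the CRM step. In the real code this is where
// the GPU kernels would be launched; here it is a plain CPU loop.
void run_crm_on_gpu(std::vector<double>& state) {
    for (double& v : state) {
        v *= 2.0;
    }
}

// One model step inside a single MPI task.
void step(std::vector<double>& state) {
    const long ncol = static_cast<long>(state.size());

    // Physics columns are independent, so they are shared among the threads.
    // Without OpenMP enabled, the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (long col = 0; col < ncol; ++col) {
        run_physics_column(state, static_cast<std::size_t>(col));
    }

    // The parallel region has ended, so only the master thread gets here.
    // It is the only thread in this process that drives the GPU.
    run_crm_on_gpu(state);
}
```

Calling `step(state)` once per time step from each MPI task would then give one GPU client per task, with the thread count (for example via `OMP_NUM_THREADS`) making up for the reduced number of tasks on the CPU side.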