E3SM-Project / Omega

Next generation ocean model within E3SM
https://docs.e3sm.org/Omega/omega

Ctests on Perlmutter-GPU with nvidiagpu running much more slowly than on Perlmutter-CPU with nvidia #165

Open: altheaden opened this issue 1 week ago

altheaden commented 1 week ago

@xylar and I noticed that the ctests on pm-cpu with nvidia took about 90 seconds to run, but on pm-gpu with nvidiagpu they took 1538.76 seconds (about 25.6 minutes).

xylar commented 1 week ago

This seems qualitatively similar to the slow Intel performance we've seen in https://github.com/E3SM-Project/Omega/issues/119, but this case seems even worse: 25 minutes instead of 1.5 minutes is really quite bad!

xylar commented 1 week ago

It could be that Polaris or Omega itself isn't setting some needed environment variables, or something along those lines.
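One way to check this hypothesis would be to capture the runtime environment on pm-cpu and pm-gpu right before running ctest and compare the two. The sketch below is only illustrative: the prefixes (OpenMP, CUDA, Kokkos, Cray MPICH) are a guess at which variables might matter, not a list taken from Polaris or Omega, and the helper `relevant_env` is hypothetical.

```python
import os

# Hypothetical selection: prefixes of variables that OpenMP, CUDA, Kokkos and
# Cray MPICH read at runtime. Whether any of these affect Omega's ctests is an assumption.
PREFIXES = ("OMP_", "CUDA_", "KOKKOS_", "MPICH_")


def relevant_env(env=None, prefixes=PREFIXES):
    """Return sorted (name, value) pairs for variables whose names start with a prefix."""
    env = os.environ if env is None else env
    # str.startswith accepts a tuple, so one call checks every prefix.
    return sorted((k, v) for k, v in env.items() if k.startswith(prefixes))


if __name__ == "__main__":
    # Print in NAME=value form so the outputs from two machines can be diffed.
    for name, value in relevant_env():
        print(f"{name}={value}")
```

Saving the output on each machine and running `diff` on the two files would show whether a variable is set on one system but missing on the other.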

xylar commented 1 week ago

I ran the tests myself, and they didn't take that long. They're still quite slow, but they only took 577.79 seconds (about 10 minutes).
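To put the reported wall-clock times side by side, here is a small sketch that computes each run's slowdown relative to the pm-cpu run. The only inputs are the numbers quoted in this thread; because the pm-cpu time was given as "about 90 seconds", the ratios are approximate.

```python
# Total ctest wall-clock times (seconds) as reported in this thread.
# The pm-cpu value is approximate ("about 90 seconds").
timings = {
    "pm-cpu / nvidia": 90.0,
    "pm-gpu / nvidiagpu (altheaden)": 1538.76,
    "pm-gpu / nvidiagpu (xylar)": 577.79,
}

baseline = timings["pm-cpu / nvidia"]


def slowdown(seconds: float, reference: float = baseline) -> float:
    """Return how many times slower a run is than the reference run."""
    return seconds / reference


for label, seconds in timings.items():
    # Columns: label, seconds, minutes, slowdown factor vs. pm-cpu.
    print(f"{label:32s} {seconds:8.2f} s  {seconds / 60:5.1f} min  x{slowdown(seconds):.1f}")
```

By this measure the first pm-gpu run was roughly 17 times slower than pm-cpu and the second roughly 6.4 times slower, so the GPU runs are well behind pm-cpu even in the faster case.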