Closed gdicker1 closed 1 week ago
NOTE: this PR is currently based on ewm-2.3.006 and doesn't include MPAS-O GPU changes. I will try to test this soon. Comments about CHAOS2000dev results are based on testing ewm-2.3.006 with and without these changes.
Thanks @gdicker1 ! Have you looked at the output from the simulations that work? Do the results look ok (ie. winds, temp, precip, ... )?
I can try to "nice-ify" some plots I've been using for FKESSLER - but it's looking good for the 10 day shapes expected for PS and PRECL. Testing these changes on GPUs with ewm-2.3.006 as the reference, 16 of 46 fields in FKESSLER results had differences. These are really small, with reported RMS differences on the order of 1E-11 or less.
However, the bits of CHAOS2000dev output I got from what MPAS reports in the atm.log file were significantly different, again comparing GPU runs of ewm-2.3.006 before and after the changes. The first Dynamics timestep was close, but by the second timestep the values for w grew by roughly an order of magnitude. By the time the model crashed (near step 9 of 723 total) the wind speeds were up to order 1E+4 m/s!
Diff output of atm.log files for GPU CHAOS2000dev (- lines are before these changes, + lines are with these changes).
Dynamics timestep beginning at 0001-01-01_00:00:00
split dynamics-transport integration 3
- global min w: -0.532977342650665 k=20, -11.8727897681862 lat, 48.8477507551253 lon
- global max w: 0.936813054202280 k=12, -11.8727897681862 lat, 48.8477507551253 lon
+ global min w: -0.532977342650664 k=20, -11.8727897681862 lat, 48.8477507551253 lon
+ global max w: 0.936813054202281 k=12, -11.8727897681862 lat, 48.8477507551253 lon
global min u: -121.233376890228 k=31, 41.5343188711450 lat, 22.0229494786346 lon
global max u: 121.443401755564 k=31, 41.7109953744057 lat, 20.6364381525949 lon
global max wsp: 122.246987628045 k=32, 48.1141816080768 lat, -48.1174705715333 lon
Dynamics timestep beginning at 0001-01-01_00:10:00
split dynamics-transport integration 3
- global min w: -0.243261928818788 k=1, 62.5179304166098 lat, -41.8104465711868 lon
- global max w: 0.649617371375648 k=9, -23.3997038794449 lat, -49.6034465273612 lon
+ global min w: -2.96489971967807 k=16, 13.7005005787574 lat, -39.0800374693392 lon
+ global max w: 3.26741979552241 k=16, 13.8060457844544 lat, -40.2174552114431 lon
global min u: -121.367907039387 k=31, 41.5343188711450 lat, 22.0229494786346 lon
global max u: 120.505455476180 k=31, 41.7109953744057 lat, 20.6364381525949 lon
global max wsp: 121.396527039095 k=31, 41.5343188711450 lat, 22.0229494786346 lon
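For comparing two atm.log files without eyeballing the diff, something like the following could work. This is only a minimal sketch: it assumes the log lines keep the "Dynamics timestep beginning at ..." and "global min/max ..." layout shown above, and the helper names (parse_stats, relative_diffs) are made up for illustration, not part of MPAS or CAM.

```python
import re

# Matches lines such as:
#   global max w: 0.936813054202280 k=12, -11.87 lat, 48.84 lon
STAT_RE = re.compile(
    r"global\s+(min|max)\s+(\w+):\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)"
)
STEP_RE = re.compile(r"Dynamics timestep beginning at\s+(\S+)")


def parse_stats(log_text):
    """Return {timestamp: {"max w": value, ...}} from atm.log-style text."""
    stats = {}
    current = None
    for line in log_text.splitlines():
        step = STEP_RE.search(line)
        if step:
            current = stats.setdefault(step.group(1), {})
            continue
        stat = STAT_RE.search(line)
        if stat and current is not None:
            kind, field, value = stat.groups()
            current[f"{kind} {field}"] = float(value)
    return stats


def relative_diffs(ref, new):
    """Yield (timestamp, key, rel_diff) for stats present in both logs."""
    for stamp in ref:
        for key, ref_val in ref[stamp].items():
            if stamp in new and key in new[stamp] and ref_val != 0.0:
                yield stamp, key, abs(new[stamp][key] - ref_val) / abs(ref_val)


# Second CHAOS2000dev timestep, values copied from the diff above.
ref_log = """Dynamics timestep beginning at 0001-01-01_00:10:00
 global min w: -0.243261928818788 k=1, 62.5179304166098 lat, -41.8104465711868 lon
 global max w: 0.649617371375648 k=9, -23.3997038794449 lat, -49.6034465273612 lon
"""
new_log = """Dynamics timestep beginning at 0001-01-01_00:10:00
 global min w: -2.96489971967807 k=16, 13.7005005787574 lat, -39.0800374693392 lon
 global max w: 3.26741979552241 k=16, 13.8060457844544 lat, -40.2174552114431 lon
"""

diffs = {
    key: rel
    for _, key, rel in relative_diffs(parse_stats(ref_log), parse_stats(new_log))
}
for key, rel in diffs.items():
    print(f"{key}: relative difference {rel:.2f}")
```

A relative difference well above 1 on the second timestep, as for w here, is the kind of jump that would be worth flagging before the run reaches the NaN crash.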
Yeah, sounds windy :) Have you tried an aqua planet simulation?
@sherimickelson I haven't tried QPC6 on GPU with this yet, thanks for the recommendation. Let me check
@sherimickelson I have some QPC6 results now. The 5 day 120km runs are able to complete, but the results still differ between runs without and with these changes, though not nearly as badly as CHAOS2000dev. The GPU QPC6 run matches previous CPU runs of the same setup.
There are differences showing up in the first timestep, but they aren't too far off by the end of the simulation. The wind speeds stay at reasonable levels, but the vertical wind is ~30% different.
Here are the first two timesteps (- lines are before these changes, + lines are with these changes):
Dynamics timestep beginning at 0001-01-01_00:00:00
split dynamics-transport integration 3
- global min w: -0.569874695463337E-03 k=25, 61.1306780150452 lat, 89.9622110678005 lon
- global max w: 0.639162191689875E-03 k=31, 0.111597480343166 lat, 179.957029458699 lon
- global min u: -0.156954147392225E-01 k=32, 1.26478550892069 lat, 90.5455502040546 lon
- global max u: 0.154730689635762E-01 k=32, 2.31208605528055 lat, -90.4982479563288 lon
- global max wsp: 0.157147887978483E-01 k=32, 1.26478550892069 lat, 90.5455502040546 lon
+ global min w: -0.569874695464211E-03 k=25, 61.1306780150452 lat, 89.9622110678005 lon
+ global max w: 0.639162191690247E-03 k=31, 0.111597480343166 lat, 179.957029458699 lon
+ global min u: -0.156954108558693E-01 k=32, 1.26478550892069 lat, 90.5455502040546 lon
+ global max u: 0.154730689635821E-01 k=32, 2.31208605528055 lat, -90.4982479563288 lon
+ global max wsp: 0.157147850376395E-01 k=32, 1.26478550892069 lat, 90.5455502040546 lon
Dynamics timestep beginning at 0001-01-01_00:10:00
split dynamics-transport integration 3
- global min w: -0.748333814512287E-03 k=25, -57.0928853286925 lat, 89.9720397100161 lon
- global max w: 0.859615298680314E-03 k=31, 0.111597480343166 lat, 179.957029458699 lon
- global min u: -0.472806857801587E-01 k=32, 1.26478550892069 lat, 90.5455502040546 lon
- global max u: 0.468073679259547E-01 k=32, 2.31208605528055 lat, -90.4982479563288 lon
- global max wsp: 0.473599303599451E-01 k=32, 1.26478550892069 lat, 90.5455502040546 lon
+ global min w: -0.748333814510844E-03 k=25, -57.0928853286925 lat, 89.9720397100161 lon
+ global max w: 0.859615298680350E-03 k=31, 0.111597480343166 lat, 179.957029458699 lon
+ global min u: -0.472801395455997E-01 k=32, 1.26478550892069 lat, 90.5455502040546 lon
+ global max u: 0.468073679259454E-01 k=32, 2.31208605528055 lat, -90.4982479563288 lon
+ global max wsp: 0.473593913678954E-01 k=32, 1.26478550892069 lat, 90.5455502040546 lon
And the last two timesteps:
Dynamics timestep beginning at 0001-01-06_00:10:00
split dynamics-transport integration 3
- global min w: -0.826616854107040E-01 k=10, 0.348852685590768 lat, -152.471108786604 lon
- global max w: 0.425982877192859 k=10, -10.7269418650993 lat, -124.098325389453 lon
- global min u: -24.8533617966951 k=32, 55.1385052448688 lat, -67.9199921453411 lon
- global max u: 24.8534645904483 k=32, 55.1385052448688 lat, -65.9859011384469 lon
- global max wsp: 24.8724699647361 k=32, -55.4627445700531 lat, -67.8345328184459 lon
+ global min w: -0.853295582619985E-01 k=10, -0.525698102631314 lat, -151.929884596772 lon
+ global max w: 0.309764582742016 k=8, -0.178735631872707 lat, -87.1929554924195 lon
+ global min u: -24.8811212717138 k=32, 55.1385052448688 lat, -67.9199921453411 lon
+ global max u: 24.8826439680882 k=32, 55.1385052448688 lat, -65.9859011384469 lon
+ global max wsp: 24.9005471532306 k=32, -56.8317091401150 lat, -73.3539540452198 lon
Dynamics timestep beginning at 0001-01-06_00:20:00
split dynamics-transport integration 3
- global min w: -0.827560603561819E-01 k=10, 0.348852685590768 lat, -152.471108786604 lon
- global max w: 0.416842429643682 k=10, -10.7269418650993 lat, -124.098325389453 lon
- global min u: -24.8329565396156 k=32, 55.1385052448688 lat, -67.9199921453411 lon
- global max u: 24.8305536060895 k=32, 54.2000304227199 lat, -66.9529466418940 lon
- global max wsp: 24.9044677925270 k=32, -57.4393227686793 lat, -79.0639648705938 lon
+ global min w: -0.852569996992047E-01 k=10, -0.525698102631314 lat, -151.929884596772 lon
+ global max w: 0.303983176629856 k=8, -0.178735631872707 lat, -87.1929554924195 lon
+ global min u: -24.8593652223789 k=32, 55.1385052448688 lat, -67.9199921453411 lon
+ global max u: 24.8574043529990 k=32, 54.2000304227199 lat, -66.9529466418940 lon
+ global max wsp: 24.9411912575793 k=32, -57.4393227686793 lat, -79.0639648705938 lon
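The "~30% different" vertical wind estimate can be checked against the final QPC6 timestep (0001-01-06_00:20:00) with a few lines of arithmetic. This is just a sketch using the values printed above; rel_diff is a throwaway helper, not something from the model.

```python
def rel_diff(ref, new):
    """Relative difference of new against ref."""
    return abs(new - ref) / abs(ref)


# (before, after) pairs copied from the final timestep above.
final_step = {
    "min w": (-0.827560603561819E-01, -0.852569996992047E-01),
    "max w": (0.416842429643682, 0.303983176629856),
    "max wsp": (24.9044677925270, 24.9411912575793),
}

percent = {name: 100.0 * rel_diff(ref, new) for name, (ref, new) in final_step.items()}
for name, value in percent.items():
    print(f"{name}: {value:.2f}% difference")
```

With these numbers the global max w differs by about 27%, while the max horizontal wind speed differs by well under 1%, which is consistent with the description above.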
Test results comparing 5 day CHAOS2000dev and QPC6 GPU runs before and after these changes are in sub-dirs of "/glade/derecho/scratch/gdicker/ew-oacc_2024Nov07191000"
These runs are comparing ewm-2.3.008 and ewm-2.3.008 plus these changes
"*-ref-*"
subdirs are the "before" changes -> ewm-2.3.008
"*-fix-*"
subdirs are the "after" changes -> ewm-2.3.008 plus this PR.
Results from this comment above based on ewm-2.3.006 are in subdirs of "/glade/derecho/scratch/gdicker/ew-oacc_2024Nov06124000/"
This PR switches the mpas external in CAM to use an updated OpenACC version that works for simple physics compsets. This re-enables MPAS-A OpenACC that was removed in #57.
More complicated test cases on GPUs like F2000climo and CHAOS2000dev now fail to complete with these changes. Answers begin diverging from previous results and the runs either fail to finish within the allotted time or crash due to "NaN detected in the 'w' field". These compsets successfully complete on GPUs if the CPU-only ew-develop-noacc MPAS-A branch is used instead.
CPU results are unaffected in CAM-MPAS with these changes.