CICE-Consortium / CICE

Development repository for the CICE sea-ice model

EAP efficiency #265

Open eclare108213 opened 5 years ago

eclare108213 commented 5 years ago

EAP is running significantly slower than EVP in v6, even after the efficiency improvements in #257 (originally #255). In v5.1.2, the EAP TimeLoop was slower than EVP by roughly a factor of 2; now it is a factor of 3. The Dynamics timer itself shows nearly a factor of 10. This appears to be due to boundary updates, but it looks like both codes have the same calls. Based on the timers unrelated to EAP, the variation due to machine load is around +/-5%. These are 5-year gx1 QC configurations with the halo masks turned on.

EVP conrad_intel_smoke_gx1_64x1_medium_qc.qc_evp

Timer 1: Total 5838.90 seconds
Timer 2: TimeLoop 5837.01 seconds
Timer 3: Dynamics 1501.73 seconds
Timer 4: Advection 1205.95 seconds
Timer 5: Column 2458.13 seconds
Timer 6: Thermo 2133.05 seconds
Timer 7: Shortwave 590.82 seconds
Timer 8: Ridging 325.01 seconds
Timer 9: Coupling 867.73 seconds
Timer 10: ReadWrite 647.31 seconds
Timer 11: Diags 744.80 seconds
Timer 12: History 467.37 seconds
Timer 13: Bound 3368.04 seconds
Timer 14: BGC 0.00 seconds

EAP conrad_intel_smoke_gx1_64x1_medium_qc.qc_eap

Timer 1: Total 15796.37 seconds
Timer 2: TimeLoop 15788.08 seconds
Timer 3: Dynamics 11479.76 seconds
Timer 4: Advection 1171.17 seconds
Timer 5: Column 2402.69 seconds
Timer 6: Thermo 2136.43 seconds
Timer 7: Shortwave 583.50 seconds
Timer 8: Ridging 274.81 seconds
Timer 9: Coupling 913.23 seconds
Timer 10: ReadWrite 708.97 seconds
Timer 11: Diags 746.42 seconds
Timer 12: History 510.62 seconds
Timer 13: Bound 12242.84 seconds
Timer 14: BGC 0.00 seconds

TillRasmussen commented 5 years ago

We can probably make improvements similar to those that were made for the EVP scheme. Currently this is not a top priority at DMI, but the task should not be too big.

apcraig commented 5 years ago

I have looked at RASM, comparing the current "colpkg" version we're running in production with the latest CICE6 version, and I see a significant slowdown there as well.

PFS.a93_a93_rx1.GNC2all 6.7 s/day => 10.7 s/day
PFS.a93_a93_rx1.GNC2skl 11.8 s/day => 16.2 s/day
PFS.w5a_a94.RBR 14.7 s/day => 24.4 s/day

Ice timer output for PFS.a93_a93_rx1.GNC2all is as follows for the colpkg and cice6 versions,

colpkg:
Timer 1: Total 46.10 seconds
Timer 2: TimeLoop 31.17 seconds
Timer 3: Dynamics 15.70 seconds
Timer 4: Advection 4.54 seconds
Timer 5: dEdd 0.00 seconds
Timer 6: Column 34.17 seconds
Timer 7: Thermo 9.91 seconds
Timer 8: Shortwave 62.39 seconds
Timer 9: Meltponds 0.00 seconds
Timer 10: Ridging 57.49 seconds
Timer 11: Cat Conv 0.00 seconds
Timer 12: Coupling 29.01 seconds
Timer 13: ReadWrite 0.00 seconds
Timer 14: Diags 1.21 seconds
Timer 15: History 0.85 seconds
Timer 16: Bound 27.78 seconds

cice6:
Timer 1: Total 74.40 seconds
Timer 2: TimeLoop 51.26 seconds
Timer 3: Dynamics 27.34 seconds
Timer 4: Advection 12.09 seconds
Timer 5: Column 11.49 seconds
Timer 6: Thermo 10.86 seconds
Timer 7: Shortwave 5.54 seconds
Timer 8: Ridging 0.78 seconds
Timer 9: Coupling 0.10 seconds
Timer 10: ReadWrite 0.00 seconds
Timer 11: Diags 1.80 seconds
Timer 12: History 0.81 seconds
Timer 13: Bound 47.23 seconds

The timer output in the two cases might not be apples to apples, and there are obviously issues with internal load imbalance when it comes to understanding where time is really spent, but it's the data we have at this point.

It is odd that the standalone model did not show much performance degradation as we updated the implementation, but in aggregate, maybe there is some.

eclare108213 commented 5 years ago

I don't think the timers in the colpkg run here are correct. For instance, Timer 6 (Column) should include all the times in Timers 7-11 (with overlap), and yet shortwave and ridging are each larger than the total column time. I think this problem was fixed in #98, so we'd need to compare timings after that or else go all the way back to v5.1.2. It would be good to get a handle on how the timings have changed since v5, and why.

TillRasmussen commented 3 years ago

Is there a reason why stepa is only called on every 10th iteration? Is it too computationally expensive, or is there a physical reason?

eclare108213 commented 3 years ago

I don't know. I'll ping Danny F's group and see if he/they can track it down.

hheorton commented 2 years ago

Hi everyone. I worked on updating the EAP rheology, which was originally developed/implemented by @mimi1981.

I believe that the decision to update the structure tensor only once every 10 dynamics iterations was made by David and Michel.

I believe this decision was made for two reasons.

1. The structure tensor changes at a much slower rate than the dynamics terms within the elastic solver, because of the constraints set on the rate of change of the structure tensor components (hard-set parameters). Therefore the changes to the structure tensor need not be solved at such high time resolution.
2. They believed that holding the structure tensor constant over every 10 iterations would allow easier convergence for the rest of the ice dynamics.

I never investigated this part of the solver. However, I did read all the code many times, and the dynamics part of the code is much more computationally heavy than the structure tensor, so solving both at every time step would not add much computational time, relatively speaking. Alternatively, the time spent on the tensor solution could be reduced even further (every 20 dynamics steps, for example).
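As a rough sketch (not the actual CICE code), the scheme described above amounts to gating the structure-tensor update inside the dynamics subcycling loop; `stepa_every`, `update_structure_tensor`, and `solve_dynamics` below are illustrative placeholders for the real EAP routines, and raising `stepa_every` (e.g. to 20) would be the further reduction suggested above:

```fortran
program eap_subcycle_sketch
  implicit none
  integer, parameter :: ndte = 120        ! number of dynamics subcycles (illustrative value)
  integer, parameter :: stepa_every = 10  ! hypothetical interval between structure-tensor updates
  integer :: ksub

  do ksub = 1, ndte
     ! The structure tensor evolves slowly, so update it only every stepa_every
     ! subcycles; the elastic stress/momentum solve runs every subcycle.
     if (mod(ksub, stepa_every) == 1) then
        call update_structure_tensor()
     end if
     call solve_dynamics()
  end do

contains

  subroutine update_structure_tensor()
    ! placeholder for the EAP structure-tensor update (stepa in ice_dyn_eap.F90)
  end subroutine update_structure_tensor

  subroutine solve_dynamics()
    ! placeholder for the per-subcycle stress and momentum updates
  end subroutine solve_dynamics

end program eap_subcycle_sketch
```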

It is of course possible that this difference in time steps may add strange numerical oscillations to the solutions, though I never noticed anything like that.

TillRasmussen commented 2 years ago

Within update_stress_rdg there seem to be many if statements. Some of these can be avoided by removing dead code. Will we ever use interpolate_stress_rdg? x, y, Tany_1 and Tany_2 are not used unless it is true. Removing that would eliminate some if statements.

hheorton commented 2 years ago

Can you direct me to the code and branch you're reading?

TillRasmussen commented 1 year ago

@hheorton Sorry, this slipped my attention. In the main branch of the current version of CICE, in CICE/cicecore/cicedyn/dynamics/ice_dyn_eap.F90, subroutine update_stress_rdg, around line 1706, there is if (interpolate_stress_rdg) then, and interpolate_stress_rdg is hard-coded to false.

hheorton commented 1 year ago

@TillRasmussen, thank you for the update. Do you need me to check anything here? Or are you happy to go ahead, undo the hard-coding, and allow the interpolate option?

TillRasmussen commented 1 year ago

@hheorton I think my main question is whether the interpolation is ever needed. If not, that part of the code should be removed. I don't mind adding the namelist parameter.
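A minimal sketch of what a namelist-controlled flag could look like, assuming a hypothetical namelist group `dyn_eap_nml` and input file name rather than CICE's actual namelist machinery:

```fortran
program namelist_flag_sketch
  implicit none
  logical :: interpolate_stress_rdg = .false.   ! default matches the current hard-coded value
  integer :: nu_nml, ios
  namelist /dyn_eap_nml/ interpolate_stress_rdg

  ! Read the flag from a namelist file instead of hard-coding it in the source.
  open(newunit=nu_nml, file='ice_in_sketch', status='old', action='read', iostat=ios)
  if (ios == 0) then
     read(nu_nml, nml=dyn_eap_nml, iostat=ios)
     close(nu_nml)
  end if

  if (interpolate_stress_rdg) then
     print *, 'interpolated stress_rdg branch (x, y, Tany_1, Tany_2 needed)'
  else
     print *, 'non-interpolated branch'
  end if
end program namelist_flag_sketch
```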

hheorton commented 1 year ago

@TillRasmussen, well, I always used it, as it is much more accurate, particularly in high-stress situations. Though I'm probably biased, as I wrote the code! I'm not sure who else is actively using the EAP rheology at the moment, so I can't comment on current needs. It might be of particular use when troubleshooting a model implementation, since interpolating, and thus getting smoother solutions for the stress tensor, removes one possible source of numerical error.

TillRasmussen commented 1 year ago

The EAP is being used here and there. I will make a namelist parameter. One motive for me is that a colleague and I are doing some refactoring of the EVP, and I would like to do the same for the EAP. This includes removal of if statements. I may be back with more questions about other if statements.

hheorton commented 1 year ago

Happy to have a look at the if statements. From memory, many of them are there to avoid 1/0 errors and atan2(0,0) situations, in order to make the code work with various different compilers. The same protection may be better achieved using different methods.
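As one branch-free alternative (a sketch assuming a small CICE-style constant named `puny`, not code taken from the repository), the denominator can be padded away from zero so atan2 and the division stay well defined without a per-point if test:

```fortran
program safe_atan2_sketch
  implicit none
  integer, parameter :: dbl_kind = selected_real_kind(13)
  real(kind=dbl_kind), parameter :: puny = 1.0e-11_dbl_kind  ! assumed small constant
  real(kind=dbl_kind) :: num, den, angle, ratio

  num = 0.0_dbl_kind
  den = 0.0_dbl_kind

  ! Instead of guarding with "if (abs(den) > puny) then ... else ... endif",
  ! pad the denominator away from zero while preserving its sign.  This
  ! perturbs the result by O(puny), which is the usual trade-off for
  ! removing the branch.
  angle = atan2(num, den + sign(puny, den))
  ratio = num / sign(max(abs(den), puny), den)

  print *, 'angle =', angle, '  ratio =', ratio
end program safe_atan2_sketch
```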