CICE-Consortium / Icepack

Development repository for sea-ice column physics
Other
25 stars 131 forks source link

FSD bug fixes #424

Closed dabail10 closed 1 year ago

dabail10 commented 1 year ago

For detailed information about submitting Pull Requests (PRs) to the CICE-Consortium, please refer to: https://github.com/CICE-Consortium/About-Us/wiki/Resource-Index#information-for-developers

PR checklist

eclare108213 commented 1 year ago

@cmbitz @lettie-roach This is great. Before I approve the PR, I'd like to see some sort of documentation of exactly what this does, both in terms of the code/physics and in terms of the simulation. For the latter, do you have some before-and-after model output that could be posted here? In particular, do you have a sense of how big the differences are, e.g. are they "climate changing"? Maybe we should run a QC test in CICE.

Regarding the nature of the code/physics changes, please, would one of you put a comment here in this PR (or maybe in the template at the top) that describes what you did? It looks like an important piece is using the lateral melting rate directly rather than fside, which is somehow incorrect (how?). Are there other changes beyond that, which contribute to the non-bit-for-bit-ness? E.g. I'm not sure what's happening with the welding. No need for a book, just a few sentences to clarify what the problem is that you're fixing. Thank you!

cmbitz commented 1 year ago

I hope I've attached a figure showing the consequences to the IcePack ice thickness distribution (aicen) for a 5 category simulation, SST, and fhocn. These simulations have a slab ocean. I can't upload netcdf files (85MB) here but am happy to provide them, or can provide a log file if you like.

I will attempt to test in CICE next. It will take me a few days though.

FSD_bugfixes_vs_original

cmbitz commented 1 year ago

The changes only affect the FSD (when tr_fsd = .true.). The gist of it is that the original code attempted to estimate the fside (the lateral heat flux) in subroutine frzmlt_bottom_lateral, just like is done for the non FSD code. However, there was an error, where fside was computed as a function of the enthalpy sum across all layers and categories rather than the mean. The variable fside computed in frzmlt_bottom_lateral is unique for each thickness category, but in fact fside should be unique for each floe bin too. So even if the bug were fixed, it would be an imprecise solution. The variables fside eventually makes its way to subroutine lateral_melt where fside is used to reduce the sea ice concentration.

Because frzmlt_bottom_lateral doesn't know about the FSD, it made more sense to me to redesign this code to ONLY compute wlat (lateral melt rate) and send wlat to lateral_melt, where we can compute rside and fside uniquely for each FSD bin and thickness category in the FSD.

There is also a very minor bug fix to the welding scheme, where a factor of 1/2 was present for a very minor part of the calculation, where it should have been a 1. Finally there was a line out of order a bit later in the weld scheme that effectively left out the intended "FSD clean-up". These two tiny modifications had a pretty minimal impact on Icepack. We recommend them though since the code now matches the equations, and it naturally has total loss = total gain from the numerics, while previously this constraint had to be imposed.

eclare108213 commented 1 year ago

Thank you for the explanation - this makes a lot of sense. If you have trouble setting up and running the QC test, let us know. It's not obvious to me whether this is a climate-changing alteration, so I'm curious to see the QC outcome (comparing before and after the code mods, both cases with FSD turned on). The SST difference looks big!

dabail10 commented 1 year ago

I have added the icepack warnings calls to replace the print statements. Let me know if you see problems.

lettie-roach commented 1 year ago

Does this mean that fside no longer needs to be sent out from frzmlt_bottom_lateral (maybe for non-FSD runs?) and/or from lateral_melt (maybe for history)?

I think that's right, we could remove it, unless we want to add it to the history file? It's not currently in the history file, but could be useful to have in there

apcraig commented 1 year ago

Are we still waiting for a QC test? Could I help with that?

dabail10 commented 1 year ago

CC is still working on the CICE changes and then I am happy to run the QC test.

dabail10 commented 1 year ago

Why is this a conflict?

apcraig commented 1 year ago

Your recent changes didn't create the conflict, they were there before. Must be associated with a recent PR to main.

dabail10 commented 1 year ago

Here are the plots and results from the QC test:

(npl) [dbailey@cheyenne3 CICE_CC]> ./cice.t-test.py /glade/scratch/dbailey/CICE_RUNS/cheyenne_intel_smoke_gx3_40x1_fsd12ww3_medium_qc.fsdmods /glade/scratch/dbailey/CICE_RUNS/cheyenne_intel_smoke_gx3_40x1_fsd12ww3_medium_qc.fsdbase INFO:main:Running QC test on the following directories: INFO:main: /glade/scratch/dbailey/CICE_RUNS/cheyenne_intel_smoke_gx3_40x1_fsd12ww3_medium_qc.fsdmods INFO:main: /glade/scratch/dbailey/CICE_RUNS/cheyenne_intel_smoke_gx3_40x1_fsd12ww3_medium_qc.fsdbase INFO:main:Number of files: 1825 INFO:main:2 Stage Test Passed INFO:main:Quadratic Skill Test Failed for Northern Hemisphere INFO:main:Quadratic Skill Test Passed for Southern Hemisphere QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/glade/scratch/dbailey/runtime-dbailey' INFO:main:Creating map of the data (ice_thickness_cheyenne_intel_smoke_gx3_40x1_fsd12ww3_medium_qc.fsdmod.png) INFO:main:Creating map of the data (ice_thickness_cheyenne_intel_smoke_gx3_40x1_fsd12ww3_medium_qc.fsdbase.png) INFO:main:Creating map of the data (ice_thickness_cheyenne_intel_smoke_gx3_40x1_fsd12ww3_medium_qc.fsdmod_minus_cheyenne_intel_smoke_gx3_40x1_fsd12ww3_medium_qc.fsdbase.png) INFO:main: ERROR:main:Quality Control Test FAILED

ice_thickness_cheyenne_intel_smoke_gx3_40x1_fsd12ww3_medium_qc fsdbase ice_thickness_cheyenne_intel_smoke_gx3_40x1_fsd12ww3_medium_qc fsdmod ice_thickness_cheyenne_intel_smoke_gx3_40x1_fsd12ww3_medium_qc fsdmod_minus_cheyenne_intel_smoke_gx3_40x1_fsd12ww3_medium_qc fsdbase

dabail10 commented 1 year ago

Not sure why that was a conflict, but I have fixed it now.

dabail10 commented 1 year ago

There was some spacing changes in main I guess. I think I have resolved this?

eclare108213 commented 1 year ago

I think the plan here is to implement these changes (or some of them) so that they are backward compatible, since the new approach isn't completely nailed down yet. Does that make sense in this case, or is this PR a bug that needs to be fixed regardless?

dabail10 commented 1 year ago

I just updated this so that if tr_fsd is .false. then it is backwardly compatible.

apcraig commented 1 year ago

@dabail10, I think more needs to be done. For instance, sss and wlat both need to become optional. And all the old code associated with sss needs to come back. And there needs to be a check if sss or wlat is passed (only one should be), etc. That should trigger the calculation for now. In general, that is NOT how we want this to work. If we want to support both sss and wlat longer term, we need a flag to choose the option. But for now, I think we can allow a check of arguments to see which calculation is done.

eclare108213 commented 1 year ago

Okay, I see there's some confusion here about what needs to be backwards compatible. If tr_fsd is false, then definitely it needs to be backwards compatible for that case, but if tr_fsd is true, should these changes be considered a bug fix or should we keep the previous approach around for the time being, until the new one is confirmed as working? The confusion may be all mine -- I'd like to better understand what is changing and why. I think CC said there were some changes that are BFB, others not, and bug fixes in the mix. Not sure how all that maps out in Icepack and CICE.

dabail10 commented 1 year ago

I am definitely confused as well! I can certainly make it completely backwards compatible, but I was thinking that it was going to be answer changing when tr_fsd = .true.

apcraig commented 1 year ago

I think the only answer changes are when tr_fsd=T, correct?

I thought the concern was that this changed the public icepack interfaces (sss->wlat). But upon closer inspection, that doesn't seem to be the case. sss->wlat is changed in lateral_melt (internal subroutine). And wlat is ADDED to icepack_step_therm[1,2] (public subroutines) and frzmlt_bottom_lateral (internal subroutine). So I believe this is fully backwards compatible except when fsd is turned on.

This following block checks the fsd changes for the new wlat argument in icepack_step_therm[1,2], so that's great. And it's in the icepack_chkoptargflag if-block so that's perfect.

          if (tr_fsd) then
             if (.not.(present(wlat))) then
                call icepack_warnings_add(subname//' error in FSD arguments, tr_fsd=T')
                call icepack_warnings_setabort(.true.,__FILE__,__LINE__)
                return
             endif
          endif

I think this should be backwards compatible except for fsd runs. But we need to change CICE to be able to test fsd there. But users in general don't need to change their code unless they want to run with fsd.

I guess the slightly bigger question is whether fsd will need to change again. We do have other options including

The CICE update, https://github.com/CICE-Consortium/CICE/pull/813, adds wlat to the icepack_step_therm[1,2] call. And that looks fine. It's intent(out) in therm1 and intent(in) in therm2. wlat is unset when passed into therm1 the first time, I wonder if we should set wlat(:)=c0 in CICE at initialization? I'll add that comment in https://github.com/CICE-Consortium/CICE/pull/813.

I think what we have here is quite reasonable and that we should merge this PR. Was testing done on the most recent version?

apcraig commented 1 year ago

I ran a full test suite of this version of Icepack and https://github.com/CICE-Consortium/CICE/pull/813 plus this version of Icepack. All tests pass. The fsd results change answers in both Icepack and CICE, as expected. However, the snwgrain tests also changed answer in CICE. You can actually see this in the original tests done here, see "cheyenne intel restart gx3 8x2 icdefault snwgrain snwitdrdg" in https://github.com/CICE-Consortium/Test-Results/wiki/8d8a2d7db8.cheyenne.intel.23-02-15.162343.0.

I think we need to understand why snwgrain is changing answers in CICE with these mods and document that as well.

apcraig commented 1 year ago

Here are the full test results,

https://github.com/CICE-Consortium/Test-Results/wiki/icepack_by_hash_forks#e1af2cd1f83aa6ff2cfb3587962600d928d59768

https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks#0dab0da7353aa25d3fe4b882f94f07408d83b2db

You can ignore the gx1prod tests, those were still pending, but are not important at this point.

dabail10 commented 1 year ago

wlat is initialized to c0 in frzmlt_bottom_lateral where it is intent(out). I think this covers it. I will let @cmbitz chime in terms of future potential changes here.

dabail10 commented 1 year ago

I'm not sure I understand this. Are the answers different with tr_fsd = .true. and snwgrain = .true.? Or just when snwgrain = .true.? Is it possible that I reverted some of the snwgrain changes from the recent commit?

apcraig commented 1 year ago

@dabail10, The snwgrain results change answers without fsd. I also was wondering if snwgrain changes were reverted but I don't think they were. That's the first thing I checked, but maybe we should look closer. I think we need to understand this before we merge.

dabail10 commented 1 year ago

There was a commit with the change in the optional arguments with rsnow. Did this change answers? If so, this is not in the FSDmods version.

dabail10 commented 1 year ago

Maybe I should fast-forward the FSDmods branch?

apcraig commented 1 year ago

I don't think it's the most recent merges either. I tested before that and the original test results from 2 months ago also show the same. If you look at the failures in the original test suite, it shows the snwgrain changes answers.

apcraig commented 1 year ago

Just to clarify, not 2 months ago, 2 weeks ago, these test results, https://github.com/CICE-Consortium/Test-Results/wiki/8d8a2d7db8.cheyenne.intel.23-02-15.162343.0.

apcraig commented 1 year ago

I think I understand why the snow results change answers, the Consortium CICE version we're comparing to is a bit behind with respect to Icepack. I will put in a PR to update Icepack in CICE then restest these changes again against the latest CICE main.

apcraig commented 1 year ago

I ran a full test suite of these changes in CICE against the current CICE version with Icepack fully updated and everything is bit-for-bit except the fsd changes. So now we are getting the results we expected.

7641 measured results of 7641 total results
7530 of 7641 tests PASSED
9 of 7641 tests PENDING
0 of 7641 tests MISSING data
102 of 7641 tests FAILED

I think everything is fine and this is ready to merge. Unless anyone disagrees, I'll merge this today or tomorrow. Then we'll want to update the CICE fsd PR to include the latest Icepack and merge that PR as well. @eclare108213 @dabail10 @cmbitz, just speak up if we should delay further. Thanks.

dabail10 commented 1 year ago

This is great news. I would like to see this go in asap.

Dave