CICE-Consortium / Icepack

Development repository for sea-ice column physics
Other
25 stars 131 forks source link

Bug in temperature_change_mushy #233

Closed njeffery closed 3 years ago

njeffery commented 5 years ago

@eclare108213 This is a bugfix for a very infrequent failure in the mushy layer thermodynamics. The mushy temperature change iteration is currently set to a tolerance of ferrmax which is the same as the final energy conservation tolerance. Every so often, when the temperature change iteration error is very close to ferrmax, the total energy conservation will fail because of round-off level errors in the thickness routine. We corrected this in mpas_seaice by lowering the tolerance for the iteration to 0.9*ferrmax. See below. Lines with + indicated added changes and - indicates deletions. This is a non BFB change.

We also corrected the diagnostics in the conservation check to properly indicate in this special case if the energy error is due to the temperature change or the thickness change. This is BFB.

8 src/core_seaice/column/ice_therm_mushy.F90 @@ -523,7 +523,7 @@ subroutine two_stage_solver_snow(nilyr, nslyr, & ! check if solution is consistent ! surface conductive heat flux should be less than ! incoming surface heat flux

dabail10 commented 5 years ago

What does the code abort with (before your fix)? Is this a flux convergence error? How often did you see this error and does this fix eliminate all of the convergence errors? I'm assuming this is also present in CICE5.

njeffery commented 5 years ago

Dave,

This an extremely infrequent error (we saw this once in a fully coupled run after 500 years) that aborts as a column_vertical_thermodynamics conservation error. We've been getting past these infrequent failures by flipping the BFB PE flag which, I'm told, changes the order of operations in the coupler. Even small changes in forcing will alter the temperature change iteration and avoid the failure. I assume it's in cice5 too.

Some more explanation:

The mushy temperature change routine currently iterates until fcondtopn-fsurfn < ferrmax though there is an intermediate check in the mushy routine that places an energy error limit of 0.9*ferrmax. There's an inconsistency in the mushy energy conservation check that misses this one particular case. If fsurfn > fcondtopn, the energy difference is used to melt the ice top in the thickness routine and all works as it should in both the final conservation test and the intermediate mushy conservation test. If, however, fcondtopn > fsurfn by ferrmax-puny, then the energy (fcondtopn-fsurfn) isn't used in the thickness change routine and should be accounted for in the mushy conservation test. It then appears as a large imbalance in the final conservation check that can be pushed over the limit by round off errors in the thickness change routine. I didn't change the intermediate mushy conservation test to include the fcondtopn-fsurfn, though this could be done for consistency. Instead I changed the diagnostics in the final thermodynamics check to better reflect which routine is contributing to the errors.

nicole


From: dabail10 notifications@github.com Sent: Wednesday, November 28, 2018 1:05:46 PM To: CICE-Consortium/Icepack Cc: Jeffery, Nicole; Author Subject: Re: [CICE-Consortium/Icepack] Bug in temperature_change_mushy (#233)

What does the code abort with (before your fix)? Is this a flux convergence error? How often did you see this error and does this fix eliminate all of the convergence errors? I'm assuming this is also present in CICE5.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/CICE-Consortium/Icepack/issues/233#issuecomment-442586216, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AH1K6nXyxqf9wtO4DVxdXdFPZ2sqO0CJks5uzuyagaJpZM4Y4MJf.

eclare108213 commented 5 years ago

My initial inclination is to postpone fixing this until after the release, because it seems to be extremely rare and the change is not BFB. MPAS-seaice is running with a VERY old version of the column package (pre Icepack), which @proteanplanet plans to upgrade soon. I was thinking that he could test this change during that process; there will probably be other changes needed to make Icepack compatible with MPAS-seaice. However, if this is the source of CESM headaches, then maybe we should get it in sooner. @dabail10 is there a way to quickly determine whether this helps?

duvivier commented 5 years ago

@eclare108213 We are also running with an older version of the code in CESM, so fixing this bug for the Icepack release will not affect our runs. But it's good to know about so we can fix it in our older CICE code used in CESM2. So I'm okay with having @proteanplanet do the tests during the MPAS updating process.

apcraig commented 5 years ago

@eclare108213 I agree that it makes sense to defer until after the release and further testing.

dabail10 commented 5 years ago

I am fine with deferring this. I would like to do a longer CESM run with this fix to assess if it changes climate. Marika and I discussed it and don't believe it would be climate changing, but it is good to check none-the-less.

njeffery commented 5 years ago

I also suspect that it is not climate changing. Another possibility, though, is that with the stricter tolerance, we start to see new convergence failures in the temperature change routine. We're starting our E3SM coupled bgc simulations with this fix. I'll let everyone know if we run into problems.


From: dabail10 notifications@github.com Sent: Wednesday, November 28, 2018 6:25:51 PM To: CICE-Consortium/Icepack Cc: Jeffery, Nicole; Author Subject: Re: [CICE-Consortium/Icepack] Bug in temperature_change_mushy (#233)

I am fine with deferring this. I would like to do a longer CESM run with this fix to assess if it changes climate. Marika and I discussed it and don't believe it would be climate changing, but it is good to check none-the-less.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/CICE-Consortium/Icepack/issues/233#issuecomment-442670324, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AH1K6uVksY5kY_SUnGpvFEjwUfrhjmLkks5uzzefgaJpZM4Y4MJf.

njeffery commented 5 years ago

Another solution to the problem that is BFB (except when the code fails) is to leave the convergence test tolerance in temperature_change_mushy as is (ferrmax) and increase the final check by 1.1*ferrmax.


From: Jeffery, Nicole Sent: Thursday, November 29, 2018 10:22:16 AM To: CICE-Consortium/Icepack; CICE-Consortium/Icepack Cc: Author Subject: Re: [CICE-Consortium/Icepack] Bug in temperature_change_mushy (#233)

I also suspect that it is not climate changing. Another possibility, though, is that with the stricter tolerance, we start to see new convergence failures in the temperature change routine. We're starting our E3SM coupled bgc simulations with this fix. I'll let everyone know if we run into problems.


From: dabail10 notifications@github.com Sent: Wednesday, November 28, 2018 6:25:51 PM To: CICE-Consortium/Icepack Cc: Jeffery, Nicole; Author Subject: Re: [CICE-Consortium/Icepack] Bug in temperature_change_mushy (#233)

I am fine with deferring this. I would like to do a longer CESM run with this fix to assess if it changes climate. Marika and I discussed it and don't believe it would be climate changing, but it is good to check none-the-less.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/CICE-Consortium/Icepack/issues/233#issuecomment-442670324, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AH1K6uVksY5kY_SUnGpvFEjwUfrhjmLkks5uzzefgaJpZM4Y4MJf.

dabail10 commented 5 years ago

Funny. Marika and I were just discussing exactly this change. I will be doing a CESM test with this after AGU.

proteanplanet commented 5 years ago

I'm coming in late on this because I've been away:

RASM has been having intermittent non-convergence errors using CICE6-alpha. Up to now, these had not been diagnosed, and had been suspected of being a separate issue with ocean coupling. But it is possible the problem in RASM and this issue are the same. The errors are much more frequent than one in 500 years, but that could well be a result of resolution - RASM tends to be good at detecting problems, perhaps due to its resolution. I appreciate there are two opinions on this - one to include the bug-fix in Icepack right now, the other to wait and make sure it is well tested. I suggest that the right path is to test this in both E3SM and RASM before releasing it as an update to Icepack in spring 2019.

I will take responsibility for making this happen.

eclare108213 commented 4 years ago

@proteanplanet @dabail10 I'm just pinging this issue again. It was last commented on in December 2018! I think we should go ahead and fix it, and certainly we will need to do so for the MPAS/Icepack updates that are planned. @dabail10 did you test this in CESM, either the 0.9 or 1.1 factor? @apcraig was there any RASM testing of it?

dabail10 commented 4 years ago

We have implemented the 1.1 factor in CESM/CICE5. I thought this was already fixed in CICE6?

apcraig commented 4 years ago

There was no testing in RASM as far as I know. If it's fixed in CICE6, then it's in RASM, but I don't think it's been fixed in CICE6. A grep of ferrmax in icepack_therm_mushy.F90 suggests it has not.

proteanplanet commented 4 years ago

The merge of Icepack with E3SM provides an opportunity to check for differences, because the issue does not occur in E3SM to tbe best of my knowledge.

eclare108213 commented 4 years ago

I think @njeffery found this and fixed it in E3SM to begin with, but it never made its way back to Icepack. So you're right, the issue does not occur in E3SM any more.

njeffery commented 4 years ago

@eclare108213 : That's correct.

eclare108213 commented 3 years ago

@dabail10 Your comment above says that you implemented the 1.1 factor in CESM/CICE5 and tested it. Was this one of the changes in your port of CESM mods to Icepack? If not, can we get it in there?

dabail10 commented 3 years ago

We do have this in CESM/CICE5. I think the alternate solution in CICE6 was used where the convergence is restricted more in the iteration:

ferr = (efinal - einit) / dt - (fcondtop - fcondbot + fswint - fadvheat)

lconverged = (dTsf  < dTemp_errmax .and. &
              dzTsn < dTemp_errmax .and. &
              dzTin < dTemp_errmax .and. &
              abs(ferr) < 0.9_dbl_kind*ferrmax)

So, the factor of 0.9 here.

eclare108213 commented 3 years ago

@njeffery The 0.9*ferrmax factor for lconverged (in the comment immediately above) is in Icepack now. Is this sufficient to fix the original error, or do we need the factor somewhere else too? It looks like you might have added it in a lot of places.

I prefer the fix using the 1.1 factor -- @dabail10 where in the (CESM) code did you put that? Thx, e

dabail10 commented 3 years ago

This fix goes in the final convergence check in icepack_therm_vertical where ferrmax is used.

njeffery commented 3 years ago

In the MPAS-O version, I added the 0.9*ferrmax condition to the mushy two_stage_solver_snow and two_stage_solver_nosnow following the picard solve. This is done twice in each routine because of the if statement separating the two surface types.


From: Elizabeth Hunke @.***> Sent: Monday, March 15, 2021 10:47:45 AM To: CICE-Consortium/Icepack Cc: Jeffery, Nicole; Mention Subject: [EXTERNAL] Re: [CICE-Consortium/Icepack] Bug in temperature_change_mushy (#233)

@njefferyhttps://github.com/njeffery The 0.9*ferrmax factor for lconverged (in the comment immediately above) is in Icepack now. Is this sufficient to fix the original error, or do we need the factor somewhere else too? It looks like you might have added it in a lot of places.

I prefer the fix using the 1.1 factor -- @dabail10https://github.com/dabail10 where in the (CESM) code did you put that? Thx, e

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/CICE-Consortium/Icepack/issues/233#issuecomment-799572368, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AB6UV2UBZEHKAAA4RX27RX3TDY23DANCNFSM4GHAYJPQ.