CICE-Consortium / CICE

Development repository for the CICE sea-ice model
Other
57 stars 131 forks source link

shortwave in restart files #59

Closed eclare108213 closed 5 years ago

eclare108213 commented 6 years ago

Store shortwave radiation for all blocks in the restart files, not just the active blocks, if possible. This should not change the answers when restarting from identical configurations, but could fix inexact restart problems when restarting with different block configurations. (Issue noted by DMI collaborators)

eclare108213 commented 6 years ago

@apcraig Was this issue fixed in PR #16 ?

apcraig commented 6 years ago

I don't believe so, but I could be wrong. PR #16 addressed the values of tracers in eliminated blocks and just sets values to zero, which is easy and reasonable. I believe the shortwave radiation is a different issue. What does "shortwave radiation" mean in this case, is that a forcing field or something related to an ice albedo computation. If we need to "store" the shortwave radiation for blocks where it's not being computed, that is going to be difficult.

eclare108213 commented 6 years ago

Here is a screen shot of the figure from Jacob's paper, showing the problem. swidr is derived from the input shortwave forcing field(s).

SW.pdf

apcraig commented 6 years ago

So, this is a problem. There is no memory and no decomposition associated with blocks that are eliminated. In v4 in the plot, land blocks are eliminated and that's why there is no data over those blocks. Land block elimination should not create any inexact restart problems. If it is, we need to look into that further.

There have always been problems with ice history files with different decompositions because of this issue. For instance, the lon/lat/mask output field is incomplete due to land block elimination even though the science is bit for bit. Any field that is sent to CICE and save to the history file will be incomplete if any blocks are eliminated. At the present time, there is now way to work around this problem. We could implement decompositions that allocate every block to a pe and then limit computation in smart ways to blocks with only active ocean. In theory, this would just add a bit of memory and have no impact on computation cost or decomposition and that would allow all fields to be complete on history files.

dabail10 commented 6 years ago

The radiation code requires the incoming shortwave components for a restart. This is because we scale the absorbed shortwave from the last step by the incoming for the current step. Why do we need these values over land though? If it is an eliminated land block, then there are never computations in these blocks. Even if we change the decomposition, the values are still not used over land. I don't understand how this causes inexact restart.

apcraig commented 6 years ago

I am on the same page as @dabail10. I don't think this is an inexact restart issue. If it is, we need to understand that better. What it does do is make history files not bit-for-bit over land, even though the results are actually bit-for-bit. So if we are trying to validate bit-for-bit, some fields in the history file cannot be compared "blindly" with different decompositions. This includes fields that are based on input or coupling and fields that are associated with the grid (lon, lat, mask, etc). This has been known for a while and makes life a bit tricky but does not affect science or restart.

As I suggested, I believe the only way to fix this is to modify the land block elimination so no blocks are actually eliminated from the decomposition, they are just removed from the computation. It should not be too hard to do this in practice if that's what we want.

eclare108213 commented 6 years ago

All of this is true, it's not the source of the restart issue and I don't think we want to stop eliminating blocks. There's something different about the how the shortwave is treated, in particular. Here's the text that goes with the figure, with my apologies for not being clear about the issue.

The CICE code supports a clever and very general blocking scheme which can be used to improve cache utilization in single process context and which serve as the general foundation for the parallelism in multiple process context. Now, it caught us by surprise that we did not manage to get binary identical results across single block and multiple block runs. However, diving deeper we found that the restart file contains one field, namely the shortwave radiation, for which only the content in the set of active blocks is stored in the restart file. Figure 3 shows the difference between a single block run and a multiple block run. It puzzles us that the storage format is not consistent across fields and self- contained so that a restart file written using one blocking can be read by another run using another blocking. Moreover, it puzzles us that a non-prognostic field like shortwave radiation is stored in the restart file in the first place. Our work-around was simply to exclude the shortwave radiation field from the restart file whereby we obtained binary identical results across different block settings.

I have explained to them the reason for shortwave being in the restart file. This problem could be related to a different kind of bug, like errors in the padding or haloes that I'm not sure they have fixed in their version of the code, but again, I don't know what would make the shortwave special as opposed to all the other fields in the restart files.

apcraig commented 6 years ago

My guess is that this is the only field on the restart file that has non zero values where there is land because those values are generated outside CICE. I assume that's what makes the field unique on the file. The values over land play do not role in the results. So while this field is technically not identical on restart files with different decompositions and different land block elimination, that difference does not matter from a science perspective. I am pretty sure this is not related to padding or haloes, just the fact that this field is computed outside CICE and is copied onto all non-eliminated blocks without regard to the actual mask. Another way to fix this is to just copy the field onto gridcells where the mask is ocean. That should make the field identical regardless of the decomposition or land block elimination.

eclare108213 commented 6 years ago

That would be an easy fix! Or just multiply it by hm.

TillRasmussen commented 5 years ago

I went back and checked the history. I agree with most that is being said in this thread. The issue appears when changing decomposition and is only present for shortwave related fields as all other fields are zero when land is present (based on a restart of CICEv6). The reason (as mentioned) for this is that the blocks do not cover the same area and the result is that land points will be calculated in some decompositions and not others.

As Tony hints the report has been written with a pure technical focus and this will change the restart file. The influence of this will in almost all (if not all) cases be zero.

The fix should be to only calculate these fields when ocean is present.

eclare108213 commented 5 years ago

Which configuration are you testing? In particular, I'm wondering what variety of forcing you are using. This should not be a problem for gx1 using the LYq atmo forcing option because fsw is multiplied by the land mask after it is calculated. On the gx3 grid, it is not masked. We could do that, but I wonder whether it's really necessary. Can we leave this as it is and close this issue, or should we multiply all of the forcing fields by the land mask? The latter would be more consistent across runs and perhaps less confusing to someone analyzing those fields, but my understanding is this is only in the restarts, not the history output. Suggestions welcome.

TillRasmussen commented 5 years ago

We read in data using the hycom_data function on a ~10km grid. This do not really influence the results, thus it is not that important to fix. Therefore yes I think that the issue can be closed, however it might be nice to change at some point.

eclare108213 commented 5 years ago

I will close the issue. We can reopen it if it shows up again in restart or other tests. If that happens, we will land-mask all of the forcing variables and see whether that takes care of the problem. Thank you @TillRasmussen.