E3SM-Project / E3SM

Energy Exascale Earth System Model source code. NOTE: use "maint" branches for your work. Head of master is not validated.
https://docs.e3sm.org/E3SM
Other
352 stars 363 forks source link

Current EC30to60 ocean/ice initial condition should have bumped revision number #5516

Open vanroekel opened 1 year ago

vanroekel commented 1 year ago

The current ocean ice condition changed in #4091 had a change in bathymetry from ETOPO to GEBCO and should have not been placed in the EC30to60E2r2 directory of the input data folder. The revision number was bumped here https://github.com/MPAS-Dev/MPAS-Model/pull/669 but I did not propagate that to E3SM. Mixing bathymetry in the same directory leads to confusion and issues in analysis scripts.

The ocean and ice meshes date stamped 210210 should be moved to an 'r3' folder and compsets that used the EC30to60E2r2 mesh with that date should be updated to the r3 version after moving the input file.

vanroekel commented 1 year ago

@jonbob @xylar @milenaveneziani Here is the issue we discussed on slack. After thinking about this I'm thinking we should make the EC30to60E2r3 folder and make this as clean as possible but would appreciate your input.

@njeffery @maltrud Given the date stamp of your BGC file I wanted to make you aware of this as well as you likely have GEBCO bathymetry as well

@xylar I'm also thinking the attributes of the 210210 IC should be modified to specify the bathymetry used as done in COMPASS. Do you think I could possibly use the files_for_e3sm portion of compass to add the attributes to this? I'm guessing I could also just use ncatted as well potentially.

vanroekel commented 1 year ago

Also pinging @golaz for any input on the plan to move this specified initial condition to a new EC30to60E2r3 directory.

vanroekel commented 1 year ago

Also just confirming here that the v2 and v2.1 LR simulation campaigns used the mesh referenced here.

xylar commented 1 year ago

@xylar I'm also thinking the attributes of the 210210 IC should be modified to specify the bathymetry used as done in COMPASS. Do you think I could possibly use the files_for_e3sm portion of compass to add the attributes to this? I'm guessing I could also just use ncatted as well potentially.

I need to check if files_for_e3sm from compass can take care of this. It will be necessary to rerun files_for_e3sm regardless to produce the full set of diagnotics and analysis member files for the new mesh name.

milenaveneziani commented 1 year ago

Also just confirming here that the v2 and v2.1 LR simulation campaigns used the mesh referenced here.

Luke, just to clarify: the v2 and v2.1 LR simulation campaigns did not use the mesh and bathymetry references here, since the default initial condition file for EC30to60E2r2 has the 201001 time stamp on it.

xylar commented 1 year ago

@milenaveneziani and @vanroekel, I think the reason that the LR simulation campaigns did not use this mesh is because they are likely starting from the "spun up" initial condition, which was not updated in #4091.

milenaveneziani commented 1 year ago

yes, indeed: that was what I meant by default initial condition. @xylar: what is the rationale in using the r# in our mesh names, can you please remind me? it refers to any revision in mesh details, including bathymetry? In that case, then creating the EC30to60E2r3 folder seems like the perfect the solution here.

xylar commented 1 year ago

Yes, the revision r# is for any change in the mesh or initial condition. So we created different revisions for different bathymetry, different vertical grid, but also we would have had a different revision for the EN4 1900 initial condition (which we never actually used).

vanroekel commented 1 year ago

Now I'm confused about what used which file. The very first input to the v2/v2.1 sim campaign -- https://acme-climate.atlassian.net/wiki/spaces/EWCG/pages/2781544502/20210518.v2rc1a.Frazil3.ne30pg2+EC30to60E2r2.chrysalis -- should have used the EC30to60E2r2 non spun up, well spun up very briefly, and given it has a tag after #4091 I would have guessed it used the new bathymetry. But I can check the restart files to confirm.

milenaveneziani commented 1 year ago

For v2.1: v2_1.LR.piControl is a hybrid from 20221012.submeso.piControl.ne30pg2_EC30to60E2r2.chrysalis, which in turn restarted from your summer simulation 20220715.submeso.piControl.ne30pg2_EC30to60E2r2.chrysalis. And that last simulation started from EC30to60E2r2/mpaso.EC30to60E2r2.rstFromG-anvil.201001.nc. So v2.1 for sure used the old bathymetry.

vanroekel commented 1 year ago

You are right. I tracked down the wrong path. I also just confirmed the v2.1 and v2 bathymetry is consistent with the old files. That makes this change simpler (no need for PRs to maint2 and maint2.1). Thanks for the explanation!

rljacob commented 1 year ago

Should v3 sims use this new bathymetry?

xylar commented 1 year ago

@rljacob, yes, or actually I think we should generate a new set of meshes for E3SM v3 (which will have the new bathymetry). The E2 in the mesh name explicitly indicates that it's for E3SM v2.

vanroekel commented 1 year ago

I agree with Xylar completely. We definitely want the bathymetry to be updated but with the E3 tag indicating a new series of meshes.

xylar commented 1 year ago

@vanroekel, what is the situation with the update initial conditions for WC14 in https://github.com/E3SM-Project/E3SM/pull/4091? Do you think that could have the same issue? If so, we should make a separate E3SM issue for that as well.

xylar commented 1 year ago

To add to the detective work, it seems that @vanroekel updated the spun-up initial conditions to the new version of the mesh in #4091 but then the spun-up initial conditions (but not the ones without spin up) were set back to the earlier version in #4127.

vanroekel commented 1 year ago

@xylar thanks for the continued detective work on sorting this out. I've dug around a bit as well and here is my summary of what is updated and what is not

EC30to60E2r2 -- both of the ICG files in the input data server use the ETOPO bathymetry, 
mpaso.EC30to60E2r2-1900.rstFromG-anvil.200910.nc
 and 
mpaso.EC30to60E2r2-1900.rstFromG-anvil.201001.nc

have the same bathymetry

but the cold start ocean ICs 

ocean.EC30to60E2r2.210210.nc
and
ocean.EC30to60E2r2.200908.nc

have different bathymetry (but the same horizontal mesh)  This means EC30to60E2r2 G-cases run after #4091 have the different bathymetry 

For the WC14

The non spun up ICs
ocean.WC14to60E2r3.200714.nc
and
ocean.WC14to60E2r3.210210.nc

use different bathymetry

there are 3 spun up conditions
mpaso.WC14to60E2r3.rstFromG-anvil.201002.nc - pre #4091
mpaso.WC14to60E2r3.rstFromG-anvil.210210.nc -- post #4091
mpaso.WC14to60E2r3.rstFromG-anvil.210226.nc -- post #4127

for these the last two use different bathymetry than the first.  I'll check if the production WC14 fully coupled RRMs used post #4091, but will also make a separate issue and report that finding there.
vanroekel commented 1 year ago

Based on these findings I think for the EC30to60E2r2 we need to move the 210210 datestamp to an 'r3' folder and generate a new spun up condition and regenerate mask/transect files. Although I think the transect files should be okay?

milenaveneziani commented 1 year ago

If coastline hasn't changed, then the transect files are all good.

I have a question related to the Gebco bathymetry version: is there some metadata that indicates it? I wonder because we have updated the Gebco bathymetry last summer (the previous version was weird in the Chukchi Sea), but from the mentioned time stamps, it seems to me that the initial conditions that we are discussing here use the old(er) Gebco (pre-July 2022)?

vanroekel commented 1 year ago

Thanks for that clarification @milenaveneziani that makes sense.

Unfortunately no on the metadata. It appears I neglected to use the files_for_e3sm script in making the IC so the metadata is not there. I made the update to that file here https://github.com/MPAS-Dev/MPAS-Model/pull/669 but it did not get used for the E3SM ICs. But given the dates I would agree it is the older Gebco data.

milenaveneziani commented 1 year ago

The dates will be sufficient. Maybe we can add a note somewhere in all $meshversion/ directories that use this bathymetry, just to make sure we remember.

vanroekel commented 1 year ago

Pre move of the affected IC's I'm going to try run the files through the files_for_e3sm part of compass to get the right metadata in the files and we could note this bathymetry there.

xylar commented 1 year ago

Pre move of the affected IC's I'm going to try run the files through the files_for_e3sm part of compass to get the right metadata in the files and we could note this bathymetry there.

I have already done this but I can rerun (it takes about half an hour). I can add a note about the version of GEBCO and the Chukchi Sea.

xylar commented 1 year ago

I will go ahead and make the EC30to60E2r3 directory and populate it so that we can make the corresponding PR.

xylar commented 1 year ago

I believe I have all the ocean and sea ice files in place in the following locations: https://web.lcrc.anl.gov/public/e3sm/inputdata/ocn/mpas-o/EC30to60E2r3/ https://web.lcrc.anl.gov/public/e3sm/inputdata/ice/mpas-seaice/EC30to60E2r3/ https://web.lcrc.anl.gov/public/e3sm/diagnostics/maps/ https://web.lcrc.anl.gov/public/e3sm/diagnostics/mpas_analysis/maps/ https://web.lcrc.anl.gov/public/e3sm/diagnostics/mpas_analysis/region_masks/

I believe permissions are also correct. @vanroekel, would you be willing to make a PR to update E3SM itself accordingly? I believe we will also need @jonbob to copy some domain and mapping files.

Notes:

xylar commented 1 year ago

One more note: It seems that files_for_e3sm produces substantially fewer partition files than @jonbob has made in the past. I have asked on Slack for @jonbob to provide me with the logic he uses to determine this list of core counts so I can make files_for_e3sm match. I will rerun after that to get the missing partition files and add them.

The same for #5520.

xylar commented 1 year ago

Shoot, LCRC has maintenance today so we're going to have to put this work on hold until tomorrow.

rljacob commented 1 year ago

Been sort of following this. What is the purpose of the new EC30to60E2r3 directory?

xylar commented 1 year ago

@rljacob, some of the files included in the current EC30to60E2r2 directory are actually for a different bathymetry (therefore a different ocean mesh by the naming conventions that we adopted for E3SMv2 and beyond). @vanroekel accidentally gave them the same name as the existing EC30to60E2r2 but they should have been a new revision, EC30to60E2r3. We are working to correct this now.

rljacob commented 1 year ago

But the "E2" stands for E3SMv2 right? What v2 run would ever use this? 2.1 and 2.0 have to keep using what they've been using.

xylar commented 1 year ago

There will need to be PRs to patch those maintenance branches, I believe. It's massively confusing to have different runs that appear to have the same mesh but don't actually. So we'd definitely want to fix this for any new runs that folks start. There's very little we can do about runs already started but fortunately most of them seem to have started with an initial condition that was correctly named.

rljacob commented 1 year ago

Do the answers change if I try to rerun v2.0 and v2.1 with these new files?

xylar commented 1 year ago

No, they are just new names (and new metadata) for existing files.

rljacob commented 1 year ago

Ok good. For the files with misleading or incorrect metadata, it would be ok to use ncks to add a global attribute that explains things (without changing the filename). I like that better than a separate README file that can easily be ignored.

xylar commented 1 year ago

We will change the metadata but the filenames themselves are terribly misleading. So I'm not okay with keeping them as they are. They don't belong in the directory where they currently are.

rljacob commented 1 year ago

I would at least leave a softlink from the old filenames to the new one. Otherwise won't runs with the v2.0.0 and v2.1.0 tags break?

xylar commented 1 year ago

Isn't the point of the maintenance branches that folks should be running with those instead of the tags themselves? We could keep symlinks but that somewhat defeats the purpose of fixing the issue, since the problem arose because folks were trying to use one initial condition as the mesh file for analyzing simulation results that used the other initial condition. We might hope that folks would notice that the there's a symlink but it would be very easy to repeat the same mistake as long as the incorrect file is there.

The problem partly arises because the initial condition is not linked to the simulation itself. If it were, it would be very easy to find the right initial condition (and therefore mpas mesh) for a given simulation. As it is, we have to either find a restart file (sometimes easy, sometimes not) or we have to guess which initial condition was used. In the latter case, if we get it wrong and we have the wrong mesh, that's a big problem because the analysis can be badly wrong as a result.

vanroekel commented 1 year ago

I fully agree with @xylar that leaving the files or a putting in a link defeats the purpose of the fix. I believe both the fix to this issue and #5520 will require PRs to master, maint2.0 and maint2.1 where the latter will point the appropriate comp sets back to the files in the new EC30to60E2r3 directory. With those PRs I don't see the need to keep the soft link, new runs would point to the new directory AND be BFB with previous runs.

But perhaps I misunderstand though as the v2.0 and 2.1 tags would break, but I thought the practice was to run simulations of v2.0 and v2.1 from maint2.0 and 2.1 respectively. I did not know we guaranteed v2.0/2.1 tags will continue to work.

rljacob commented 1 year ago

Its true that recommended practice is to use maint branches for new 2.0 or 2.1 runs. And the tagged versions will all eventually break as their machine files go out-of-date. But I think we want to not purposefully break tagged versions this early after making them since we might need to run them (and versions on the maint branch prior to this change) in a bisect for example.

How about this, replace each bad file with "filename.README" and explain there why the file was removed and what could be used in its place (by renaming) for a run with an old version.

xylar commented 1 year ago

@rljacob, I think what you suggest is right. We could add an ocean.EC30to60E2r2.210210.nc.README in place of ocean.EC30to60E2r2.210210.nc that explains the problem (and hopefully someone using one of the tags could find it). I'm a little leary of the recommended solution being renaming the file back to the bad name. We certainly don't want anyone doing that on an E3SM supported machine. Would it be reasonable to tell them to use the maint corresponding maint branch instead of the tag they're using?

xylar commented 1 year ago

I'm going to rerun the ocean and sea-ice partition generation tomorrow using https://github.com/MPAS-Dev/compass/pull/563. I looked carefully at all of the EC30to60E2r2 partition files for both ocean and sea ice that we currently have, and I have an algorithm that generates most of them (except the ones with large prime factors) as well as many others that could feasibly be useful -- 377 in total. I'll do the same for the WC14to60E2r5 directories.

xylar commented 1 year ago

Thanks to @amametjanov for his very helpful feedback on what partition sizes are needed.

rljacob commented 1 year ago

Mention the maint option but also is there something they can edit to use the new file while keeping the rest of the tagged code?

xylar commented 1 year ago

@rljacob, yes, I think we can make a suggestion on how to edit the code that would be equivalent to the changes we will make to point to the new file locations.

xylar commented 1 year ago

I realized that the name of the mesh changing means the name of the tests and grids that use that mesh will change. I believe this would qualify as a non-bit-for-bit change for purposes of testing.

vanroekel commented 1 year ago

@xylar and @rljacob just so I understand what the proposal is here. I think that to address this issue, the following should be done

  1. Create the *****.README file for the files that moved to the r3 directory and place in current mesh directories. In this it will explain how to patch the tagged code
  2. Issue a nonBFB PR to master for changing the mesh paths (creating the EC30to60E2r3 grid) and updating the tests
  3. PRs should be made to maint2.0 and maint2.1 with the code corrected to the new file locations

I assume the same should be done for #5520?

Please let me know if I’m not tracking the discussion correctly.

xylar commented 1 year ago

@vanroekel, that fits my understanding.

rljacob commented 1 year ago

The grid alias doesn't have to be updated. Why don't you leave it alone for one or 2 testing cycles so we can verify bfb. Then change the alias (which would change the test name).

xylar commented 1 year ago

I think that sounds reasonable. There will be some logistical questions about how to do that. I guess we can figure it out more easily in the PR.

xylar commented 1 year ago

@vanroekel, I have all of the graph partition files (and other compass-produced support files) for the EC30to60E2r3 and WC14to60E2r5 meshes is place. As far as I'm concerned, you can proceed with PRs when you're ready. We'll need to follow up with @jonbob and @rljacob to make sure all the necessary files for coupling are symlinked.