Open vanroekel opened 1 year ago
@jonbob @xylar @milenaveneziani Here is the issue we discussed on slack. After thinking about this I'm thinking we should make the EC30to60E2r3 folder and make this as clean as possible but would appreciate your input.
@njeffery @maltrud Given the date stamp of your BGC file I wanted to make you aware of this as well as you likely have GEBCO bathymetry as well
@xylar I'm also thinking the attributes of the 210210 IC should be modified to specify the bathymetry used as done in COMPASS. Do you think I could possibly use the files_for_e3sm portion of compass to add the attributes to this? I'm guessing I could also just use ncatted as well potentially.
Also pinging @golaz for any input on the plan to move this specified initial condition to a new EC30to60E2r3 directory.
Also just confirming here that the v2 and v2.1 LR simulation campaigns used the mesh referenced here.
@xylar I'm also thinking the attributes of the 210210 IC should be modified to specify the bathymetry used as done in COMPASS. Do you think I could possibly use the files_for_e3sm portion of compass to add the attributes to this? I'm guessing I could also just use ncatted as well potentially.
I need to check if files_for_e3sm
from compass can take care of this. It will be necessary to rerun files_for_e3sm
regardless to produce the full set of diagnotics and analysis member files for the new mesh name.
Also just confirming here that the v2 and v2.1 LR simulation campaigns used the mesh referenced here.
Luke, just to clarify: the v2 and v2.1 LR simulation campaigns did not use the mesh and bathymetry references here, since the default initial condition file for EC30to60E2r2 has the 201001
time stamp on it.
@milenaveneziani and @vanroekel, I think the reason that the LR simulation campaigns did not use this mesh is because they are likely starting from the "spun up" initial condition, which was not updated in #4091.
yes, indeed: that was what I meant by default initial condition.
@xylar: what is the rationale in using the r#
in our mesh names, can you please remind me? it refers to any revision in mesh details, including bathymetry? In that case, then creating the EC30to60E2r3 folder seems like the perfect the solution here.
Yes, the revision r#
is for any change in the mesh or initial condition. So we created different revisions for different bathymetry, different vertical grid, but also we would have had a different revision for the EN4 1900 initial condition (which we never actually used).
Now I'm confused about what used which file. The very first input to the v2/v2.1 sim campaign -- https://acme-climate.atlassian.net/wiki/spaces/EWCG/pages/2781544502/20210518.v2rc1a.Frazil3.ne30pg2+EC30to60E2r2.chrysalis -- should have used the EC30to60E2r2 non spun up, well spun up very briefly, and given it has a tag after #4091 I would have guessed it used the new bathymetry. But I can check the restart files to confirm.
For v2.1: v2_1.LR.piControl
is a hybrid from 20221012.submeso.piControl.ne30pg2_EC30to60E2r2.chrysalis
, which in turn restarted from your summer simulation 20220715.submeso.piControl.ne30pg2_EC30to60E2r2.chrysalis
. And that last simulation started from EC30to60E2r2/mpaso.EC30to60E2r2.rstFromG-anvil.201001.nc
. So v2.1 for sure used the old bathymetry.
You are right. I tracked down the wrong path. I also just confirmed the v2.1 and v2 bathymetry is consistent with the old files. That makes this change simpler (no need for PRs to maint2 and maint2.1). Thanks for the explanation!
Should v3 sims use this new bathymetry?
@rljacob, yes, or actually I think we should generate a new set of meshes for E3SM v3 (which will have the new bathymetry). The E2
in the mesh name explicitly indicates that it's for E3SM v2.
I agree with Xylar completely. We definitely want the bathymetry to be updated but with the E3 tag indicating a new series of meshes.
@vanroekel, what is the situation with the update initial conditions for WC14 in https://github.com/E3SM-Project/E3SM/pull/4091? Do you think that could have the same issue? If so, we should make a separate E3SM issue for that as well.
To add to the detective work, it seems that @vanroekel updated the spun-up initial conditions to the new version of the mesh in #4091 but then the spun-up initial conditions (but not the ones without spin up) were set back to the earlier version in #4127.
@xylar thanks for the continued detective work on sorting this out. I've dug around a bit as well and here is my summary of what is updated and what is not
EC30to60E2r2 -- both of the ICG files in the input data server use the ETOPO bathymetry,
mpaso.EC30to60E2r2-1900.rstFromG-anvil.200910.nc
and
mpaso.EC30to60E2r2-1900.rstFromG-anvil.201001.nc
have the same bathymetry
but the cold start ocean ICs
ocean.EC30to60E2r2.210210.nc
and
ocean.EC30to60E2r2.200908.nc
have different bathymetry (but the same horizontal mesh) This means EC30to60E2r2 G-cases run after #4091 have the different bathymetry
For the WC14
The non spun up ICs
ocean.WC14to60E2r3.200714.nc
and
ocean.WC14to60E2r3.210210.nc
use different bathymetry
there are 3 spun up conditions
mpaso.WC14to60E2r3.rstFromG-anvil.201002.nc - pre #4091
mpaso.WC14to60E2r3.rstFromG-anvil.210210.nc -- post #4091
mpaso.WC14to60E2r3.rstFromG-anvil.210226.nc -- post #4127
for these the last two use different bathymetry than the first. I'll check if the production WC14 fully coupled RRMs used post #4091, but will also make a separate issue and report that finding there.
Based on these findings I think for the EC30to60E2r2 we need to move the 210210 datestamp to an 'r3' folder and generate a new spun up condition and regenerate mask/transect files. Although I think the transect files should be okay?
If coastline hasn't changed, then the transect files are all good.
I have a question related to the Gebco bathymetry version: is there some metadata that indicates it? I wonder because we have updated the Gebco bathymetry last summer (the previous version was weird in the Chukchi Sea), but from the mentioned time stamps, it seems to me that the initial conditions that we are discussing here use the old(er) Gebco (pre-July 2022)?
Thanks for that clarification @milenaveneziani that makes sense.
Unfortunately no on the metadata. It appears I neglected to use the files_for_e3sm script in making the IC so the metadata is not there. I made the update to that file here https://github.com/MPAS-Dev/MPAS-Model/pull/669 but it did not get used for the E3SM ICs. But given the dates I would agree it is the older Gebco data.
The dates will be sufficient. Maybe we can add a note somewhere in all $meshversion/ directories that use this bathymetry, just to make sure we remember.
Pre move of the affected IC's I'm going to try run the files through the files_for_e3sm part of compass to get the right metadata in the files and we could note this bathymetry there.
Pre move of the affected IC's I'm going to try run the files through the files_for_e3sm part of compass to get the right metadata in the files and we could note this bathymetry there.
I have already done this but I can rerun (it takes about half an hour). I can add a note about the version of GEBCO and the Chukchi Sea.
I will go ahead and make the EC30to60E2r3 directory and populate it so that we can make the corresponding PR.
I believe I have all the ocean and sea ice files in place in the following locations: https://web.lcrc.anl.gov/public/e3sm/inputdata/ocn/mpas-o/EC30to60E2r3/ https://web.lcrc.anl.gov/public/e3sm/inputdata/ice/mpas-seaice/EC30to60E2r3/ https://web.lcrc.anl.gov/public/e3sm/diagnostics/maps/ https://web.lcrc.anl.gov/public/e3sm/diagnostics/mpas_analysis/maps/ https://web.lcrc.anl.gov/public/e3sm/diagnostics/mpas_analysis/region_masks/
I believe permissions are also correct. @vanroekel, would you be willing to make a PR to update E3SM itself accordingly? I believe we will also need @jonbob to copy some domain and mapping files.
Notes:
mpaso
and mpassi
as produced by files_for_e3sm
. If we want these to be ocean.
and seaice
, we should change these in files_for_e3sm
so we don't have to manually rename, but I personally think the more specific names are better.One more note: It seems that files_for_e3sm
produces substantially fewer partition files than @jonbob has made in the past. I have asked on Slack for @jonbob to provide me with the logic he uses to determine this list of core counts so I can make files_for_e3sm
match. I will rerun after that to get the missing partition files and add them.
The same for #5520.
Shoot, LCRC has maintenance today so we're going to have to put this work on hold until tomorrow.
Been sort of following this. What is the purpose of the new EC30to60E2r3 directory?
@rljacob, some of the files included in the current EC30to60E2r2 directory are actually for a different bathymetry (therefore a different ocean mesh by the naming conventions that we adopted for E3SMv2 and beyond). @vanroekel accidentally gave them the same name as the existing EC30to60E2r2 but they should have been a new revision, EC30to60E2r3. We are working to correct this now.
But the "E2" stands for E3SMv2 right? What v2 run would ever use this? 2.1 and 2.0 have to keep using what they've been using.
There will need to be PRs to patch those maintenance branches, I believe. It's massively confusing to have different runs that appear to have the same mesh but don't actually. So we'd definitely want to fix this for any new runs that folks start. There's very little we can do about runs already started but fortunately most of them seem to have started with an initial condition that was correctly named.
Do the answers change if I try to rerun v2.0 and v2.1 with these new files?
No, they are just new names (and new metadata) for existing files.
Ok good. For the files with misleading or incorrect metadata, it would be ok to use ncks to add a global attribute that explains things (without changing the filename). I like that better than a separate README file that can easily be ignored.
We will change the metadata but the filenames themselves are terribly misleading. So I'm not okay with keeping them as they are. They don't belong in the directory where they currently are.
I would at least leave a softlink from the old filenames to the new one. Otherwise won't runs with the v2.0.0 and v2.1.0 tags break?
Isn't the point of the maintenance branches that folks should be running with those instead of the tags themselves? We could keep symlinks but that somewhat defeats the purpose of fixing the issue, since the problem arose because folks were trying to use one initial condition as the mesh file for analyzing simulation results that used the other initial condition. We might hope that folks would notice that the there's a symlink but it would be very easy to repeat the same mistake as long as the incorrect file is there.
The problem partly arises because the initial condition is not linked to the simulation itself. If it were, it would be very easy to find the right initial condition (and therefore mpas mesh) for a given simulation. As it is, we have to either find a restart file (sometimes easy, sometimes not) or we have to guess which initial condition was used. In the latter case, if we get it wrong and we have the wrong mesh, that's a big problem because the analysis can be badly wrong as a result.
I fully agree with @xylar that leaving the files or a putting in a link defeats the purpose of the fix. I believe both the fix to this issue and #5520 will require PRs to master, maint2.0 and maint2.1 where the latter will point the appropriate comp sets back to the files in the new EC30to60E2r3 directory. With those PRs I don't see the need to keep the soft link, new runs would point to the new directory AND be BFB with previous runs.
But perhaps I misunderstand though as the v2.0 and 2.1 tags would break, but I thought the practice was to run simulations of v2.0 and v2.1 from maint2.0 and 2.1 respectively. I did not know we guaranteed v2.0/2.1 tags will continue to work.
Its true that recommended practice is to use maint branches for new 2.0 or 2.1 runs. And the tagged versions will all eventually break as their machine files go out-of-date. But I think we want to not purposefully break tagged versions this early after making them since we might need to run them (and versions on the maint branch prior to this change) in a bisect for example.
How about this, replace each bad file with "filename.README" and explain there why the file was removed and what could be used in its place (by renaming) for a run with an old version.
@rljacob, I think what you suggest is right. We could add an ocean.EC30to60E2r2.210210.nc.README
in place of ocean.EC30to60E2r2.210210.nc
that explains the problem (and hopefully someone using one of the tags could find it). I'm a little leary of the recommended solution being renaming the file back to the bad name. We certainly don't want anyone doing that on an E3SM supported machine. Would it be reasonable to tell them to use the maint corresponding maint
branch instead of the tag they're using?
I'm going to rerun the ocean and sea-ice partition generation tomorrow using https://github.com/MPAS-Dev/compass/pull/563. I looked carefully at all of the EC30to60E2r2 partition files for both ocean and sea ice that we currently have, and I have an algorithm that generates most of them (except the ones with large prime factors) as well as many others that could feasibly be useful -- 377 in total. I'll do the same for the WC14to60E2r5 directories.
Thanks to @amametjanov for his very helpful feedback on what partition sizes are needed.
Mention the maint option but also is there something they can edit to use the new file while keeping the rest of the tagged code?
@rljacob, yes, I think we can make a suggestion on how to edit the code that would be equivalent to the changes we will make to point to the new file locations.
I realized that the name of the mesh changing means the name of the tests and grids that use that mesh will change. I believe this would qualify as a non-bit-for-bit change for purposes of testing.
@xylar and @rljacob just so I understand what the proposal is here. I think that to address this issue, the following should be done
I assume the same should be done for #5520?
Please let me know if I’m not tracking the discussion correctly.
@vanroekel, that fits my understanding.
The grid alias doesn't have to be updated. Why don't you leave it alone for one or 2 testing cycles so we can verify bfb. Then change the alias (which would change the test name).
I think that sounds reasonable. There will be some logistical questions about how to do that. I guess we can figure it out more easily in the PR.
@vanroekel, I have all of the graph partition files (and other compass-produced support files) for the EC30to60E2r3
and WC14to60E2r5
meshes is place. As far as I'm concerned, you can proceed with PRs when you're ready. We'll need to follow up with @jonbob and @rljacob to make sure all the necessary files for coupling are symlinked.
The current ocean ice condition changed in #4091 had a change in bathymetry from ETOPO to GEBCO and should have not been placed in the EC30to60E2r2 directory of the input data folder. The revision number was bumped here https://github.com/MPAS-Dev/MPAS-Model/pull/669 but I did not propagate that to E3SM. Mixing bathymetry in the same directory leads to confusion and issues in analysis scripts.
The ocean and ice meshes date stamped 210210 should be moved to an 'r3' folder and compsets that used the EC30to60E2r2 mesh with that date should be updated to the r3 version after moving the input file.