E3SM-Project / CMIP6-Metadata


Add v2 template #8

Closed chengzhuzhang closed 2 years ago

chengzhuzhang commented 2 years ago

Hey Tony @TonyB9000, I just added a template for v2. Would you please follow it to populate the metadata definition files for the other v2 simulations? The experiment_id and activity mapping can be found at https://wcrp-cmip.github.io/CMIP6_CVs/docs/CMIP6_experiment_id.html

TonyB9000 commented 2 years ago

I'm trying to figure out where I should "pull" this material. I guess I can pull to the e3sm/staging/resource location. Should there be a separate metadata file per experiment? Per ensemble? Or per dataset (one for atmos mon, one for atmos day, one for ocean mon, etc.)?

chengzhuzhang commented 2 years ago

Hey Tony, I think you already have a directory set up for this. Metadata files are per experiment and per ensemble, covering datasets from all realms.

TonyB9000 commented 2 years ago

Well, I'm sitting in /p/user_pub/e3sm/staging/resource/CMIP6-metadata/. It contains:

drwxrwxr-x. 2 bartoletti1 publishers 4096 May 20 10:14 E3SM-1-0
drwxrwxr-x. 2 bartoletti1 publishers 4096 Apr 19 13:38 E3SM-1-1
drwxrwxr-x. 2 bartoletti1 publishers 4096 Apr 19 13:38 E3SM-1-1-ECA
-rw-rw-r--. 1 bartoletti1 publishers  234 Apr 19 13:38 README.md
-rw-rw-r--. 1 bartoletti1 publishers 2705 Apr 19 13:38 template.json
drwxrwxr-x. 2 bartoletti1 publishers 4096 Apr 19 13:38 test

I did a "git pull" and it said

remote: Enumerating objects: 11, done.
remote: Counting objects: 100% (11/11), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 5 (delta 2), reused 5 (delta 2), pack-reused 0
Unpacking objects: 100% (5/5), 747 bytes | 19.00 KiB/s, done.
From https://github.com/E3SM-Project/CMIP6-Metadata

But I would expect to see an "E3SM-2-0" directory. Perhaps I should blow it all away and do a "git clone"?

Also: (base) -bash-4.2$ git branch

chengzhuzhang commented 2 years ago

My bad.. I failed to add the new file. Please pull again..

TonyB9000 commented 2 years ago

What should I do when "git pull" says

From https://github.com/E3SM-Project/CMIP6-Metadata
   5eb5a6f..eb586c2  add_v2 -> origin/add_v2
There is no tracking information for the current branch.
Please specify which branch you want to merge with.
See git-pull(1) for details.

git pull <remote> <branch>

What does it want for "remote" and for "branch"? I assume "add_v2" is the branch.

chengzhuzhang commented 2 years ago

Not sure what happened. Try git fetch again?

TonyB9000 commented 2 years ago

This is doing something:

git pull https://github.com/E3SM-Project/CMIP6-Metadata add_v2

says

From https://github.com/E3SM-Project/CMIP6-Metadata

 * branch add_v2 -> FETCH_HEAD
Updating 618f8de..eb586c2

TonyB9000 commented 2 years ago

Finally - there is E3SM-2-0/historical_r1i1p1f1.json

TonyB9000 commented 2 years ago

When I list the 368 "phase 1" v2 dataset_ids, cut to just the "experiment + ensemble" fields (and sort/uniq), I get 34 results:

1pctCO2.ens1
abrupt-4xCO2.ens1
abrupt-4xCO2.ens2
amip.ens1
amip.ens2
amip.ens3
hist-aer.ens1
hist-aer.ens2
hist-aer.ens3
hist-aer.ens4
hist-aer.ens5
hist-all-xGHG-xaer.ens1
hist-all-xGHG-xaer.ens2
hist-all-xGHG-xaer.ens3
hist-all-xGHG-xaer.ens4
hist-all-xGHG-xaer.ens5
hist-GHG.ens1
hist-GHG.ens2
hist-GHG.ens3
hist-GHG.ens4
hist-GHG.ens5
historical.ens1
historical.ens2
historical.ens3
historical.ens4
historical.ens5
piClim-control.ens1
piClim-histaer.ens1
piClim-histaer.ens2
piClim-histaer.ens3
piClim-histall.ens1
piClim-histall.ens2
piClim-histall.ens3
piControl.ens1

I assume I need to convert (say) "piClim-histaer.ens3" to "piClim-histaer_r3i1p1f1", etc., to name the 34 metadata files.
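The renaming described above could be sketched roughly as follows. This is only an illustration: the function name `metadata_filename` is hypothetical, and fixing the `i1p1f1` part for every ensemble member is an assumption based on the single example in this thread, not something the repository confirms.

```python
import re


def metadata_filename(exp_ens: str) -> str:
    """Map '<experiment>.ens<N>' to '<experiment>_r<N>i1p1f1.json'.

    Assumption: only the realization index varies; i1p1f1 is held fixed.
    """
    match = re.fullmatch(r"(?P<exp>.+)\.ens(?P<n>\d+)", exp_ens)
    if match is None:
        raise ValueError(f"unexpected experiment/ensemble label: {exp_ens!r}")
    return f"{match['exp']}_r{int(match['n'])}i1p1f1.json"


if __name__ == "__main__":
    for label in ["piClim-histaer.ens3", "historical.ens1", "piControl.ens1"]:
        print(label, "->", metadata_filename(label))
```

Labels that do not end in `.ens<N>` raise an error rather than being silently passed through, so an unexpected dataset_id would be noticed early.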

chengzhuzhang commented 2 years ago

I guess it is not necessary to rename, as long as the correct name is fed to e3sm_to_cmip. hist-all-xGHG-xaer.* can be left out since we won't publish it.

TonyB9000 commented 2 years ago

Forgot about hist-all-xGHG-xaer, thanks. Renaming is not the issue - I need to discover what elements of the metadata are specific to each experiment/ensemble. I'll get there.

TonyB9000 commented 2 years ago

OK - we now have 29 differently-named v2 metadata files. I checked on the v1 historical files to see how they differ across ensembles:

(base) -bash-4.2$ diff historical_r1i1p1f1.json historical_r2i1p1f1.json
14c14
< "realization_index": "1",
---
> "realization_index": "2",
40c40
< "branch_time_in_parent": 36500.0,
---
> "branch_time_in_parent": 54750.0,

(base) -bash-4.2$ diff historical_r1i1p1f1.json historical_r3i1p1f1.json
14c14
< "realization_index": "1",
---
> "realization_index": "3",
40c40
< "branch_time_in_parent": 36500.0,
---
> "branch_time_in_parent": 73000.0,

(base) -bash-4.2$ diff historical_r1i1p1f1.json historical_r4i1p1f1.json
14c14
< "realization_index": "1",
---
> "realization_index": "4",
40c40
< "branch_time_in_parent": 36500.0,
---
> "branch_time_in_parent": 91250.0,

(base) -bash-4.2$ diff historical_r1i1p1f1.json historical_r5i1p1f1.json
14c14
< "realization_index": "1",
---
> "realization_index": "5",
40c40
< "branch_time_in_parent": 36500.0,
---
> "branch_time_in_parent": 109500.0,

So other than "realization_index", they have different (and increasing) "branch_time_in_parent" values. How should I accommodate this in the v2 sets?
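For reference, the five v1 values quoted above are evenly spaced by 18250, which equals 50 x 365; reading that as "50 years in a 365-day calendar" is an inference, not something stated in the files. A minimal sketch of stamping per-ensemble values onto a template is below. The helper `per_ensemble_metadata` is hypothetical, and the v2 branch times are not known from this thread, so they would have to be supplied from the actual simulation records.

```python
import copy

# Values copied from the v1 historical diffs quoted in this thread.
V1_HISTORICAL_BRANCH_TIMES = {
    1: 36500.0,
    2: 54750.0,
    3: 73000.0,
    4: 91250.0,
    5: 109500.0,
}


def per_ensemble_metadata(template: dict, branch_times: dict) -> dict:
    """Return {filename: metadata} with only the per-ensemble fields changed.

    Assumption: realization_index and branch_time_in_parent are the only
    fields that differ between ensemble members of one experiment.
    """
    files = {}
    for realization, branch_time in sorted(branch_times.items()):
        meta = copy.deepcopy(template)
        meta["realization_index"] = str(realization)  # stored as a string in v1
        meta["branch_time_in_parent"] = float(branch_time)
        name = f"{template['experiment_id']}_r{realization}i1p1f1.json"
        files[name] = meta
    return files


if __name__ == "__main__":
    template = {
        "experiment_id": "historical",
        "realization_index": "1",
        "branch_time_in_parent": 36500.0,
    }
    for name, meta in per_ensemble_metadata(template, V1_HISTORICAL_BRANCH_TIMES).items():
        print(name, meta["realization_index"], meta["branch_time_in_parent"])
```

Passing the branch times in explicitly, rather than computing them from a fixed spacing, avoids baking the v1 pattern into the v2 files.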

Next, I compared the v1 historical to the v1 piControl:

(base) -bash-4.2$ diff historical_r1i1p1f1.json piControl_r1i1p1f1.json
8c8
< "experiment_id": "historical",
---
> "experiment_id": "piControl",
26c26
< "parent_experiment_id": "piControl",
---
> "parent_experiment_id": "piControl-spinup",
40c40
< "branch_time_in_parent": 36500.0,
---
> "branch_time_in_parent": 0.0,
70c70
< "history": "",
---
> "history": "Output from 20180129.DECKv1b_piControl.ne30_oEC.edison. compset = A_WCYCL1850S_CMIP6",
72c72
< "comment": "",
---
> "comment": " piControl was configured to adhere as closely as possible with the CMIP6 DECK specifications (Eyring at al. 2016, GMD) with prescribed forcings appropriate for 1850 conditions. The simulation was run with time invariant forcings for a total of 500 years. To reduce spin-up time of the deep ocean, the simulation was initialized from a series of pre-existing control simulations performed with developmental versions of E3SM v1 as part of the final tuning phase (approximately 400 years). The final model tuning consisted of minor adjustments to cloud parameters with the objectives of achieving (1) near zero net top-of-atmosphere (TOA) radiation balance and (2) stable global mean surface air temperature.",

SOMEWHERE there needs to be a "metadata_guide" document that explains exactly which metadata fields vary ONLY with ensemble, which vary with experiment, which vary with simulation model, and which remain constant.
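Until such a guide exists, one rough way to derive it from the existing v1 files is to compare groups of metadata and list the keys whose values differ. The sketch below uses a hypothetical helper, `varying_fields`, and a few values taken from the diffs above (the "history" and "comment" fields are left out for brevity); it is not an existing tool in this repository.

```python
_MISSING = object()  # sentinel for keys absent from a file


def varying_fields(metas) -> list:
    """Return the sorted keys whose values are not identical across all dicts."""
    metas = list(metas)
    keys = set()
    for meta in metas:
        keys.update(meta)
    first = metas[0]
    return [
        key
        for key in sorted(keys)
        if any(m.get(key, _MISSING) != first.get(key, _MISSING) for m in metas[1:])
    ]


# Subsets of the v1 files, using values quoted in this thread.
hist_r1 = {"experiment_id": "historical", "realization_index": "1",
           "parent_experiment_id": "piControl", "branch_time_in_parent": 36500.0}
hist_r2 = {"experiment_id": "historical", "realization_index": "2",
           "parent_experiment_id": "piControl", "branch_time_in_parent": 54750.0}
pictl_r1 = {"experiment_id": "piControl", "realization_index": "1",
            "parent_experiment_id": "piControl-spinup", "branch_time_in_parent": 0.0}

across_ensembles = varying_fields([hist_r1, hist_r2])
across_experiments = varying_fields([hist_r1, pictl_r1])

if __name__ == "__main__":
    print("vary with ensemble:  ", across_ensembles)
    print("vary with experiment:", across_experiments)
```

Running the same comparison over files loaded with `json.load` from each model directory would give the "varies with model" column as well.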

TonyB9000 commented 2 years ago

Fixed all historical (realization_index and branch_time_in_parent, and history) and fixed experiment_id in all hist-aer.

Sorry for the oversight. I thought I'd caught everything. I wrote a small "check_values.sh" to make it easier to spot issues.
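The contents of "check_values.sh" are not shown in this thread. As an illustration only, a check of the same kind could compare each file name against the values stored inside the file. The function `check_metadata` and the file-name pattern below are assumptions, not the actual script.

```python
import re

# Assumed naming convention: <experiment>_r<N>i<N>p<N>f<N>.json
FILENAME_RE = re.compile(
    r"(?P<exp>.+)_r(?P<r>\d+)i(?P<i>\d+)p(?P<p>\d+)f(?P<f>\d+)\.json"
)


def check_metadata(filename: str, meta: dict) -> list:
    """Return human-readable problems; an empty list means no mismatch was found."""
    match = FILENAME_RE.fullmatch(filename)
    if match is None:
        return [f"{filename}: name does not match <experiment>_r<N>i<N>p<N>f<N>.json"]
    problems = []
    if meta.get("experiment_id") != match["exp"]:
        problems.append(
            f"{filename}: experiment_id is {meta.get('experiment_id')!r}, "
            f"expected {match['exp']!r}"
        )
    if str(meta.get("realization_index")) != match["r"]:
        problems.append(
            f"{filename}: realization_index is {meta.get('realization_index')!r}, "
            f"expected {match['r']!r}"
        )
    return problems


if __name__ == "__main__":
    # A file copied from the historical template without updating its values.
    stale = {"experiment_id": "historical", "realization_index": "1"}
    for line in check_metadata("hist-aer_r2i1p1f1.json", stale):
        print(line)
```

This catches the two kinds of oversight mentioned above (a stale experiment_id and a stale realization_index), but it cannot validate values such as branch_time_in_parent that are not encoded in the file name.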

TonyB9000 commented 2 years ago

Shall I merge?

chengzhuzhang commented 2 years ago

Not yet. Would you please revert the license to the old v1 license for the v1 material? That is, undo the changes in this commit: "update license info for all v1 user metadata".

TonyB9000 commented 2 years ago

This will be tricky - I need to revert the "add_v2" branch to a previous state (for v1 stuff) and not lose all the v2 edits.

I'll see what I can do - I need to revert to obtain the old license info.

(I'm sure there's a git-way to revert specific files.)

chengzhuzhang commented 2 years ago

I think you can just add a new commit that changes the license text, if that's easier.

TonyB9000 commented 2 years ago

If you have the old v1 files, that would be easier. If you can copy the "old" E3SM-1-* metadata files to their respective locations under e3sm/staging/resource/CMIP6-metadata, I can simply add them to a commit. Or, if you have an open repo with the old v1 files, it would be fine to re-commit them yourself and, upon merge, resolve the conflicts by taking your v1 files.

I see now that I should have created a separate branch for "update_v1_licenses", rather than mix that up with the "add_v2" (metadata).

Alternately, I can re-edit the v1 files, but I need a copy of the old license info.

TonyB9000 commented 2 years ago

My recent commit history:

commit 6ee6556443b5aa60a710223c55796cc145068b40 (HEAD -> add_v2, origin/add_v2)
Author: Tony Bartoletti <bartoletti1@llnl.gov>
Date: Fri Jul 29 11:37:56 2022 -0700

    fixed misfire on realization_index

commit 62a8d627b0d4566ec167b2bda212e22ea3182737
Author: Tony Bartoletti <bartoletti1@llnl.gov>
Date: Fri Jul 29 11:12:20 2022 -0700

    fix overlooked values in historical and hist-aer metadata

commit 7fc5f2ccdf3cfa18c92727fc84d240aa7976dfb1
Author: Tony Bartoletti <bartoletti1@llnl.gov>
Date: Thu Jul 28 14:41:39 2022 -0700

    update license info for all v1 user metadata

TonyB9000 commented 2 years ago

I guess I can always find the old license info in old published files.

I think there is a way to "git stash" all current state, revert everything to "pre-v1-license edits", and then "git stash pop" only selected files (the v2 files). I'll investigate.
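An alternative to stashing, sketched here under stated assumptions, is to check out only the v1 directories as they were just before the license commit; `git checkout <tree-ish> -- <paths>` restores those paths in the index and working tree and leaves everything else, including the v2 files, alone. The directory names come from the listing earlier in this thread and the abbreviated hash from the commit history above. Whether the two later commits also touched v1 files is not known from this thread; if they did, those edits under the v1 directories would be overwritten as well. The Python wrapper is only a convenience for building the command.

```python
import subprocess

V1_DIRS = ["E3SM-1-0", "E3SM-1-1", "E3SM-1-1-ECA"]  # from the directory listing
LICENSE_COMMIT = "7fc5f2c"  # "update license info for all v1 user metadata"


def restore_command(commit: str, paths: list) -> list:
    """Build a git command that restores `paths` to their state just before `commit`."""
    return ["git", "checkout", f"{commit}^", "--", *paths]


def restore(commit: str, paths: list, repo_dir: str = ".") -> None:
    """Run the restore inside repo_dir; raises CalledProcessError if git fails."""
    subprocess.run(restore_command(commit, paths), cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(restore_command(LICENSE_COMMIT, V1_DIRS)))
```

After the checkout, `git status` should show only v1 files as modified, and they can then be committed on add_v2 as a new commit that undoes the license change.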

chengzhuzhang commented 2 years ago

This PR is superseded by #9.