NCAR / LMWG_dev

Repository to track LMWG development simulations

ERA5-SP: ctsm51d142_f19_ERA5_[1850, hist] #21

Open wwieder opened 12 months ago

wwieder commented 12 months ago

Description:

Assess land model results from an SP case forced with ERA5 datm. I'm inclined to do this with a HIST compset that's been spun up for a bit first? I also think we could do this with a 1 or 2 degree FV grid (f09... or f19...)

Same as https://github.com/NCAR/LMWG_dev/issues/18 but using ERA5 instead of GSWP3V1

Spin-up:
./create_newcase --compset I1850Clm51Sp --res f19_f19 --case ~/cases_LMWG_dev/ctsm51d142_f19_GSWP3V1_1850 --run-unsupported

Hist:
./create_newcase --compset IHistClm51Sp --res f19_f19 --case ~/cases_LMWG_dev/ctsm51d142_f19_GSWP3V1_hist --run-unsupported


Case directory:
Spin-up: /glade/u/home/slevis/cases_LMWG_dev/ctsm51d142_f19_ERA5_1850
Hist part 1: /glade/u/home/slevis/cases_LMWG_dev/ctsm51d142_f19_ERA5_hist.1850-1919
Hist: /glade/u/home/slevis/cases_LMWG_dev/ctsm51d142_f19_ERA5_hist
The Hist directory is a copy of "hist part 1" with updates to user_nl_clm and env_run.


Sandbox: /glade/work/slevis/git/latest_master
git describe: ctsm5.1.dev142


usernl changes:
spin-up: NONE
hist part 1:

finidat = '/glade/scratch/slevis/archive/ctsm51d142_f19_ERA5_1850/rest/0021-01-01-00000/ctsm51d142_f19_ERA5_1850.clm2.r.0021-01-01-00000.nc'
use_init_interp = .true.

hist: back to NONE; change env_run to hybrid starting in 1920


SourceMods: NONE


Diagnostics: Diags (if available)

https://webext.cgd.ucar.edu/I1850/$CASE/lnd/


Output: Output (if still available):
/glade/scratch/slevis/archive/ctsm51d142_f19_GSWP3V1_1850/
/glade/scratch/slevis/archive/ctsm51d142_f19_GSWP3V1_hist/


Contacts: @olyson @slevis-lmwg


Extra details:
Spin-up phase: 20 years with DATM_YR_START 1940 and DATM_YR_END 1949.
Hist part 1: 1850-1949 with DATM_YR_START 1940, DATM_YR_END 1949, and DATM_YR_ALIGN 1940.
Hist: 1950-2014 with DATM_YR_START 1940, DATM_YR_END 2014, and DATM_YR_ALIGN 1940.
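
For reference, the year settings above can be read as a mapping from model year to forcing year. The following is a minimal sketch, assuming the usual DATM stream convention in which model year DATM_YR_ALIGN is paired with forcing year DATM_YR_START and the forcing then cycles over DATM_YR_START..DATM_YR_END. The function name `forcing_year` is hypothetical and is not part of CTSM or CDEPS.

```python
def forcing_year(model_year: int, yr_start: int, yr_end: int, yr_align: int) -> int:
    """Return the forcing year used for a given model year.

    Assumption: model year `yr_align` uses forcing year `yr_start`, and the
    forcing repeats with period (yr_end - yr_start + 1).
    """
    n_years = yr_end - yr_start + 1
    # Python's % returns a non-negative result for a positive modulus,
    # so model years before yr_align also map into the cycle.
    return yr_start + (model_year - yr_align) % n_years


# Hist part 1 settings: the 1940-1949 forcing repeats every 10 model years.
for year in (1850, 1851, 1859, 1860):
    print(year, "->", forcing_year(year, 1940, 1949, 1940))

# Hist settings: from 1950 on, each model year uses its own forcing year.
for year in (1950, 2014):
    print(year, "->", forcing_year(year, 1940, 2014, 1940))
```

Under this reading, hist part 1 recycles the 1940s forcing, and the hist segment from 1950 onward is forced with the matching calendar year.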

slevis-lmwg commented 11 months ago

I tried starting the spin-up with <entry id="DATM_MODE" value="ERA5"> but it failed with this error:
PET0000 src/addon/NUOPC/src/NUOPC_Base.F90:2103 Invalid argument - Sa_q2m is not a StandardName in the NUOPC_FieldDictionary

datm.streams.xml contains other Sa_ variables, but I don't see Sa_q2m. It also doesn't seem to be pointing to the 1940-1949 data. Instead it's pointing to 2019 data located in /glade/p/cesmdata/cseg/inputdata/atm/datm7/ERA5

Does that seem wrong? Should I be pointing to data in /glade/p/cgd/tss/CTSM_datm_forcing_data/atm_forcing.datm7.ERAI.0.5d.c141028? Note that in this directory the data range is 1979-2014 rather than starting in 1940.

I will put this aside until we discuss.

Update: Check with Sean S and/or Adam H.

slevis-lmwg commented 11 months ago

Sean's responses: I have used era5 data, but only for regional cases. I believe these data are 0.25 degree and hourly, which makes them something like an order of magnitude larger than our other datm datasets. When I tried creating global files, I found that my script to convert the raw data to ctsm three-stream data took too long, and I haven't tried it recently to see if I could speed things up. I think if the script used xarray and dask to chunk the data, it could be made faster, but I haven't used dask for anything.

...and

Normally, it doesn't take that long to create the files, as it is mostly a matter of creating netcdf files with the appropriate variable names, metadata, etc. So maybe it's just that my script is using too much memory, and it could be improved in that regard. It does prompt the question of how much space the global era5 data will take up.

Adam's and Isla's response for the record but less promising than Sean's, I think: We do have native model level data for ERA5 at /glade/collections/rda/data/ds633.6/e5.oper.an.ml/. We have q, t, u, v, w on model levels there but only going back to 1979. Then there are other single level fields here... /gpfs/fs1/collections/rda/data/ds633.0/e5.oper.an.sfc. I'm also not aware of this being used as DATM forcings to drive the land model.

wwieder commented 11 months ago

hmm maybe ask Steve Yeager? I think they're using ERA to initialize the ocean model for ESPWG simulations. Presumably they have the same input data needs that we do? I just dropped by his office, but he wasn't there.

slevis-lmwg commented 11 months ago

Adrien Damseaux (adamseau@awi.de) mentioned in his talk that he also used ERA5 1980(?)-2021(?)

wwieder commented 11 months ago

I noticed that too, but just over the Arctic. I'd also imagine this wasn't done on the NCAR machines, but we can ask?

slevis-lmwg commented 11 months ago

Sent emails to Steve and Adrien.

Update: Steve and his collaborators will likely focus on this starting late November 2023.

slevis-lmwg commented 11 months ago

Adrien's response:

Yes, I do have global ERA5 DATM on my cluster:

1.2T    Precip
1.2T    Solar
4.5T    TPHWL

It's quite big, how can I share it with you?

I did not ask him the year coverage, but this seems promising. What's the recommended method of transferring large amounts of data? Globus?

wwieder commented 11 months ago

I think so, maybe ask Gary Strand for recommendations if Globus isn't straightforward to figure out.

slevis-lmwg commented 11 months ago

Update:

slevis-lmwg commented 11 months ago

Notes for when I get back to this: Steve Yeager pointed me to the ERA5 data here:
/glade/campaign/collections/rda/data/ds633.0/e5.oper.an.sfc/194001
...to /202112
This looks already somewhat processed to me, so I may not be at the same starting point as Adrien, and his cdo manipulations may or may not be relevant. Sean emailed me the list of variables to convert.
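
A small sketch for enumerating those monthly directories when scripting the conversion. It assumes one YYYYMM subdirectory per month, continuous from 194001 through 202112, which is inferred only from the two endpoints quoted above and has not been checked against the file system. The helper name `monthly_dirs` is hypothetical.

```python
ERA5_SFC_ROOT = "/glade/campaign/collections/rda/data/ds633.0/e5.oper.an.sfc"


def monthly_dirs(first_year: int, last_year: int) -> list[str]:
    """Return YYYYMM directory names for every month in the inclusive year range."""
    return [
        f"{year:04d}{month:02d}"
        for year in range(first_year, last_year + 1)
        for month in range(1, 13)
    ]


dirs = monthly_dirs(1940, 2021)
# Assumed layout: <root>/<YYYYMM>
paths = [f"{ERA5_SFC_ROOT}/{name}" for name in dirs]
print(len(paths), paths[0], paths[-1])
```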

slevis-lmwg commented 8 months ago

Update

1st and 2nd attempts in late 2023: On cheyenne with dev142 using ERA5 test files from Taydra Low (UW-Madison). We found issues with the test files that we hope to resolve with a new test file from Taydra.

3rd attempt: Using dev159 that I know works on derecho, I get this error:

# of NaNs =            1
Which are NaNs =  F F T
NaN found in field Sl_t at gridcell index            3

Looking at the file, I have not spotted NaNs anywhere, but I pursued these possible culprits:
1) DID NOT HELP: Remove negative u values, which are intentional.
2) THIS appeared to get me past the error: changing _FillValue and missing_value from nan to 1e36 as in this script: /glade/campaign/cesm/cesmdata/inputdata/atm/datm7/atm_forcing.datm7.era5.c231130/modify_attributes.sh However, I ended up with a corrupted time variable, which raised a different error. After I fixed that, I got back to the original error, so now I don't trust this step of changing these attributes from nan to 1e36.
3) THIS could be a problem: Most variables have add_offset and scale_factor attributes. Does the CLM know what to do with those?
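
To make the add_offset / scale_factor question concrete, here is a minimal sketch of the standard netCDF packing convention (unpacked = packed * scale_factor + add_offset, with _FillValue compared against the packed value). It does not answer whether the model's stream reader applies these attributes; it only shows what a reader would have to do, and why ignoring them, or carrying a NaN fill value through, gives wrong or NaN values. The function names and the sample numbers are hypothetical.

```python
import math


def is_fill(value, fill_value):
    """True if `value` is the fill value, including the case where the fill value is NaN."""
    if fill_value is None:
        return False
    if isinstance(fill_value, float) and math.isnan(fill_value):
        return isinstance(value, float) and math.isnan(value)
    return value == fill_value


def unpack(packed, scale_factor=1.0, add_offset=0.0, fill_value=None):
    """Apply the packing attributes; fill values become NaN."""
    return [
        math.nan if is_fill(p, fill_value) else p * scale_factor + add_offset
        for p in packed
    ]


def nan_indices(values):
    """1-based indices of NaN entries, like the gridcell index in the error message."""
    return [i + 1 for i, v in enumerate(values) if math.isnan(v)]


# Made-up packed values: stored as integers, physically a temperature in K.
packed = [-32767, 0, 100]
unpacked = unpack(packed, scale_factor=0.01, add_offset=280.0, fill_value=-32767)
print(unpacked, nan_indices(unpacked))
```

If the reader skipped the unpacking step, the second and third points would be read as 0 and 100 instead of roughly 280 K and 281 K, so it may be worth writing the forcing files unpacked.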

TODO later: Change /glade/work/slevis/git/latest_master/components/cdeps/datm/cime_config/stream_definition_datm.xml:

@@ -517,7 +517,8 @@
     </stream_datafiles>
     <stream_datavars>
       <var>TBOT     Sa_tbot</var>
-      <var>WIND     Sa_wind</var>
+      <var>u        Sa_u_af</var>
+      <var>v        Sa_v_af</var>

so as to use both the u and v components of the wind. Currently this leads to an error. To move forward with testing, I pretend for now that the u component is all the wind. Revisit this later.
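
For when this is revisited: one possible interim workaround (not what was done above) would be to precompute a scalar wind speed from the two components and keep the existing WIND / Sa_wind mapping. A minimal sketch, with a hypothetical function name:

```python
import math


def wind_speed(u: float, v: float) -> float:
    """Scalar wind speed from the eastward (u) and northward (v) components."""
    return math.hypot(u, v)


# Negative components are valid (they encode direction); the speed is non-negative.
print(wind_speed(3.0, 4.0), wind_speed(-3.0, 4.0))
```

This also shows why treating u alone as the wind is only a placeholder: u can be negative and ignores the v contribution entirely.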