ESCOMP / CTSM

Community Terrestrial Systems Model (includes the Community Land Model of CESM)
http://www.cesm.ucar.edu/models/cesm2.0/land/

Update soils data used for surface dataset #1303

Closed wwieder closed 1 year ago

wwieder commented 3 years ago

It would be nice to update the soils data we're using to generate the surface dataset to something from this century. This will introduce a number of answer changes to the code, but it seems worth having a discussion about what we need here.

@dlawrenncar suggested using SoilGrids data, which just released a version 2.0 of their dataset https://doi.org/10.5194/soil-2020-65. SoilGrids2.0 contains information on soil texture, OC content, pH, bulk density, coarse fragments, CEC, and soil N at 250 m resolution for 6 soil layers (0-200 cm). This high resolution data also includes uncertainty estimates! According to the data providers, v2.0 has changed significantly from previous releases of the dataset, but is currently only available at 250m resolution.

Laura Poggio and Niels Batjes at ISRIC are interested in and willing to provide a coarser resolution data product for our purposes and wondered what we wanted. I've basically told them we'd like the whole dataset, but to prioritize texture and soil C information. Is a 5km data product adequate for NWP applications, but not too unwieldy for climate simulations? Do we need 1km resolution mapping files?

I also wondered if we should think about how to generate soil properties for the hillslope model? Does this happen in our own tool chain, or could it be generated in the mapping files from ISRIC? This is likely of secondary concern, but may be worth discussion?

wwieder commented 2 years ago

If it's easier, it could also be the next SCID (# 2 instead of # 1, assuming # 2 actually has soil data).

mvertens commented 2 years ago

@wwieder @ekluzek @slevisconsulting - I have now generated PCT_SAND and PCT_CLAY using the new soil texture data and using the following raw datasets:

     ! --- for a 5min input mapunit---
     mksrf_fsoitex_mesh = &
          '/glade/p/cesm/cseg/inputdata/lnd/clm2/mappingdata/grids/UNSTRUCTgrid_5x5min_nomask_c200129.nc'
     mksrf_fsoitex_mapunit = &
          '/glade/u/home/mvertens/src/ctsm.new_mksurfdata/tools/mksurfdata_esmf/soiltex_mapunits_4320x2160_c220329.nc'
     mksrf_fsoitex_lookup = &
          '/glade/u/home/mvertens/src/ctsm.new_mksurfdata/tools/mksurfdata_esmf/wise_30sec_v5_lookup.nc'

Can you please have a look at the fields mapunits, PCT_SAND and PCT_CLAY in /glade/work/mvertens/ctsm.new_mksurfdata/tools/mksurfdata_esmf/surfdata_1.9x2.5_hist_78pfts_CMIP6_2000_c220330.nc I did not to use the next SCID for points. Now there are no longer negative values of the PCT_SAND and PCT_CLAY values. Should I set up a meeting to go over this?

wwieder commented 2 years ago

Awesome, thanks @mvertens.

My phone died, so I can't DUO to get onto Cheyenne. For now, I'll assume that the soil texture information looks good. Are there features of your new workflow or the new dataset that we should know about?

If not, next steps may be to move onto the ORGANIC block of the mksurf code. Hopefully this can be simplified from what we're doing now, and I think @dlawrenncar has a better sense of what this can / should look like. Maybe we can have a meeting to debrief on soil texture and then make a plan for ORGANIC.

dlawrenncar commented 2 years ago

Happy to have a meeting. Do we want to run a quick test using the new soils dataset now? Just to make sure it runs and the results don't look bizarre or anything?

mvertens commented 2 years ago

@wwieder @dlawren - I compared the output relative to the original surface dataset at this resolution. I have a question regarding the values over Greenland and a few other points. They are now zero. In /glade/p/cesmdata/cseg/inputdata/lnd/clm2/surfdata_map/release-clm5.0.18/surfdata_1.9x2.5_hist_78pfts_CMIP6_simyr2000_c190304.nc they were not. I can easily add ORGANIC. I don't think I can today but I can try to set something up for tomorrow. I think the idea of a quick test is good - but I need to regenerate the surface dataset with all the other variables activated. In my initial testing I only made sure that the few variables mentioned above had reasonable values. So please hold off until I point you to a surface dataset with valid values for all other variables (as an example I did not want to spend time creating all the other fields when I was just interested in PCT_SAND, PCT_CLAY and mapunits).

dlawrenncar commented 2 years ago

I suggest that we do not meet to talk organics until we have established that the new soil texture data is working so as to prevent us getting started on something (organics) when the underlying code may still need to change. I don't propose a detailed investigation. Just a short run in SP mode and compare against a run with the old dataset. Once we have the surface dataset, this should only take a day or so. We could then meet about organics next week perhaps.

wwieder commented 2 years ago

@mvertens there seem to be a number of points where sand and clay are both 0 throughout the profile. These are mainly under the Greenland ice sheet, but also in parts of the Sahara, and a few other scattered points. I don't think the model will like these zero values.

To address this, I wonder if we should look for other SCIDs that are not missing (which may work for some grids dominated by sand dunes or lakes)? Alternatively, we can just fill the 0-profiles with constant, non-missing values as done for Antarctica and the whole ocean (18% clay & 43% sand)? I'd suspect the latter is easier. @dlawrenncar can you see any downsides to this approach?

After @mvertens makes a fully functional updated surface dataset, @olyson can you do the short SP experiment Dave mentioned above?

olyson commented 2 years ago

Yes.

dlawrenncar commented 2 years ago

I think using constant values under the ice sheets is fine. If the ice sheet melts and the soil is exposed, then we will live with whatever soil texture exists there. But, the gaps in Sahara and other non-ice sheet soils with 0 values probably should not be addressed the same way, unless it would be really challenging to come up with something different (like nearest neighbor).

mvertens commented 2 years ago

@dlawren @wwieder - I think it would definitely be challenging to do a nearest neighbor implementation. But I'm happy to talk about that. It would be good to check that the mapunit output in the surface dataset does indeed give rise to zero PCT_SAND regardless of the level or the SCID. I'll send you a pointer to a new surface dataset once it's created.

mvertens commented 2 years ago

So looking at the data some more - I think the problem is coming in from how we are handling negative values. From @wwieder I have the following:

What I am trying to do now is for any negative values (e.g. -4) I'm trying to find the first value that is >=0 (note the equal) and in many cases all other values are 0.

I would suggest that what we want is:

wwieder commented 2 years ago

I'm not sure I completely follow here. Currently all of these negative values in the lookup table are being set to _FillValue (-9999). Are the zeros coming in because of how _FillValues are being handled in mksrf or are these the real values in the lookup table?

Are you suggesting that instead of assigning _FillValues to all negative integers we arbitrarily generate soil properties for dunes and salt flats? (It seems we're already doing this for ocean and glaciers.)

Finally, I wonder if it would be helpful to look at a few of the mapunit values that are generating the zero soil texture grids so we can look at data in the lookup table more easily?

wwieder commented 2 years ago

OK, it seems like the minimum non-missing values for sand and clay in the lookup table being read in from WISE are 5 and 2%, respectively. This makes it seem like the zero texture values on the surface data are being generated by the re-mapping that's being done.

Here's my assumption about what's being done and an idea for moving forward:

Let me know if retaining information about negative soil texture for glaciers, dunes, etc would be helpful in working through this?

mvertens commented 2 years ago

@wwieder @ekluzek @slevisconsulting - I think a quick meeting to discuss this would be helpful. I've sent an invite for this afternoon.

mvertens commented 2 years ago

@wwieder - I could not use your /glade/scratch/wwieder/wise_30sec_v1/WISE30sec/Interchangeable_format/wise_30sec_v1_lookup.nc. The problem is that it's still not in the right format:

     $ nccopy -k cdf5 ./wise_30sec_v1_lookup.nc ./wise_30sec_v2_lookup.nc
     NetCDF: Not a valid data type or _FillValue type mismatch
     Location: file nccopy.c; line 1410

I had my own copy of the file that had a fill value of 0 for PCT_SAND and PCT_CLAY. Could you please update your lookup file so that it is cdf5 compatible?

jedwards4b commented 2 years ago

@wwieder I can help with that if you like.

wwieder commented 2 years ago

That would be awesome, @jedwards4b.

I think the issue is that I'm defining the fill value as -999, but then forcing some output to be written out as float and others as int formats.

The resulting .nc file has odd formats for the fill values of the variables that are floats /glade/scratch/wwieder/wise_30sec_v1/WISE30sec/Interchangeable_format/wise_30sec_v1_lookup.nc.

The code I'm using is here https://github.com/wwieder/ctsm_py/blob/master/notebooks/tiff2nc-WISE.ipynb

and on glade /glade/u/home/wwieder/python/ctsm_py/notebooks/tiff2nc-WISE.ipynb
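One possible way to avoid mixed-type fill values, sketched here under assumptions: build a per-variable encoding dictionary in which each `_FillValue` has the same type the variable is written with. The helper `build_encoding`, the variable names, and the choice of `float32`/`int32` are hypothetical and not taken from the notebook; `dtype` and `_FillValue` are the per-variable keys that xarray's `to_netcdf(encoding=...)` accepts.

```python
# Hypothetical helper: keep each variable's _FillValue in the same type
# as the data written to disk, so float variables get a float fill value
# and integer variables get an integer one.

FILL = -999  # fill value mentioned in the comment above


def build_encoding(var_types):
    """Map {variable name: "float" or "int"} to an xarray-style encoding dict."""
    encoding = {}
    for name, kind in var_types.items():
        if kind == "float":
            encoding[name] = {"dtype": "float32", "_FillValue": float(FILL)}
        elif kind == "int":
            encoding[name] = {"dtype": "int32", "_FillValue": int(FILL)}
        else:
            raise ValueError(f"unsupported type for {name}: {kind}")
    return encoding


# Illustrative variable names only; the real lookup file may differ.
encoding = build_encoding({"PCT_SAND": "float", "PCT_CLAY": "float", "SCID": "int"})
```

The resulting dictionary could then be passed along the lines of `ds.to_netcdf(path, encoding=encoding)`, where `ds` is the dataset being written.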

jedwards4b commented 2 years ago

Try using argument format="NETCDF3_64BIT" in your to_netcdf calls. If that doesn't work you may want to consider replacing h5netcdf with netcdf4.
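A quick way to see which on-disk format a written file actually ended up in is to look at its first bytes. The sketch below uses only the Python standard library; the function names are made up for illustration. Classic-model netCDF files begin with `CDF` followed by a version byte (1 = classic, 2 = 64-bit offset, 5 = CDF-5, the format `nccopy -k cdf5` produces), while netCDF-4 files carry the HDF5 signature. As far as I know, `NETCDF3_64BIT` selects the 64-bit offset variant, so checking the version byte may be worthwhile.

```python
# Identify the netCDF flavour of a file from its leading bytes.

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

CDF_VERSIONS = {
    1: "classic (CDF-1)",
    2: "64-bit offset (CDF-2)",
    5: "64-bit data (CDF-5)",
}


def kind_from_header(head):
    """Classify a file from its first 8 bytes."""
    if len(head) >= 4 and head[:3] == b"CDF":
        # Indexing bytes yields an int: the format version byte.
        return CDF_VERSIONS.get(head[3], "unknown CDF version")
    if head[:8] == HDF5_SIGNATURE:
        return "netCDF-4 / HDF5"
    return "not netCDF"


def netcdf_kind(path):
    """Read the first 8 bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return kind_from_header(f.read(8))
```

For example, `netcdf_kind("wise_30sec_v1_lookup2.nc")` would report which of these formats the regenerated lookup file uses.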

slevis-lmwg commented 2 years ago

Hi Mariana,

I was working on other things and didn't read this email until now. I haven't received an invite for this afternoon, yet.

Sam

mvertens commented 2 years ago

Hi Sam,

I was sure I invited you. You should get an invite now. Thanks for letting me know.

Mariana

slevis-lmwg commented 2 years ago

I did receive it now. Thank you and see you then.

wwieder commented 2 years ago

@mvertens see if this new file works.
/glade/scratch/wwieder/wise_30sec_v1/WISE30sec/Interchangeable_format/wise_30sec_v1_lookup2.nc

I think it should be the right format (or at least the following code worked)

nccopy -k cdf5 /glade/scratch/wwieder/wise_30sec_v1/WISE30sec/Interchangeable_format/wise_30sec_v1_lookup.nc /glade/scratch/wwieder/wise_30sec_v1/WISE30sec/Interchangeable_format/wise_30sec_v1_lookup2.nc

jedwards4b commented 2 years ago

@wwieder Did you try my suggestion above? You should be able to create this file in a format that works and not need the nccopy step.

wwieder commented 2 years ago

sorry, I was on another call. Yes, thank you. This seems to have worked, but I've written out a bunch of files I thought would work! The real test is when Mariana tries to use it :)

mvertens commented 2 years ago

@wwieder @dlawren @slevisconsulting @ekluzek - using the new lookup table that @wwieder generated, I have now created a new surface dataset that no longer has zeroes for PCT_SAND and PCT_CLAY. See /glade/work/mvertens/ctsm.new_mksurfdata/tools/mksurfdata_esmf/surfdata_1.9x2.5_hist_78pfts_CMIP6_2000_c220407.nc. The assumptions I made were the following:

  1. Determine the top soil layer sand_o and clay_o. If either is less than 0, search within the SCID array for the first index that gives a non-negative pct_sand or pct_clay. If the value is still less than 0, then if the value is -4 set pct_sand=99 and pct_clay=1, otherwise set pct_sand=43 and pct_clay=18.
  2. Then go through the entire column. If any value is less than 0 simply set the rest of the column to the first non-zero value encountered.
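One reading of the two rules above, as a minimal sketch rather than the actual mksurfdata_esmf code. All names are illustrative. It assumes that rule 1 tries each SCID in order and only falls back on constants when none has valid data, that the "-4" test applies to the first SCID's sand value, and that rule 2 repeats the last valid value from the first negative layer downward.

```python
# Sketch of the two fill rules; names and structure are hypothetical.

DUNES_CODE = -4                      # negative code singled out in rule 1
SAND_DUNES, CLAY_DUNES = 99.0, 1.0
SAND_DEFAULT, CLAY_DEFAULT = 43.0, 18.0


def top_layer_texture(sand_by_scid, clay_by_scid):
    """Rule 1: top-layer (sand, clay), trying each SCID in order."""
    for sand, clay in zip(sand_by_scid, clay_by_scid):
        if sand >= 0 and clay >= 0:
            return sand, clay
    # No SCID had valid data: fall back on constant values.
    if sand_by_scid and sand_by_scid[0] == DUNES_CODE:
        return SAND_DUNES, CLAY_DUNES
    return SAND_DEFAULT, CLAY_DEFAULT


def fill_column(column):
    """Rule 2: from the first negative layer down, repeat the last valid value.

    Assumes the top layer (index 0) is already valid after rule 1.
    """
    filled = list(column)
    for k in range(1, len(filled)):
        if filled[k] < 0:
            filled[k:] = [filled[k - 1]] * (len(filled) - k)
            break
    return filled
```

For instance, `fill_column([40.0, 35.0, -1, 20.0])` gives `[40.0, 35.0, 35.0, 35.0]` under this reading, because everything from the first negative layer down takes the value of the layer above it.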

If you look at the file /glade/work/mvertens/ctsm.new_mksurfdata/tools/mksurfdata_esmf/mksurfdata.o3759518 you will see the output for all of the values that were changed according to 1. above.

I think it's okay now to use this surface dataset in a simulation. What I am uncertain about is how to proceed from here. Will this new soil texture raw dataset always be used from now on? I have introduced a new routine mksoiltexnewMod.F90 that uses this new dataset, as well as keeping the original code mksoiltexMod.F90 around. Do we want to keep the ability to have both, or simply move to the new dataset if everything looks reasonable? I am also on a branch right now, so it would be good to determine how to move forward with integrating this into the PR we will be making for the new surface dataset generation code.

wwieder commented 2 years ago

This is great, thanks @mvertens! I think this will be replacing our older data on soil texture, but first we want to compare an SP simulation using this new surface dataset with one using the older dataset we've been using. @olyson are you able to do this? Assuming results are reasonable I don't think we need to keep the old dataset and code around, but what do others think?

Note, we'll also want to test out the new ORGANICS data we can also ingest from the WISE data. Should we get started on this, or wait to see how the texture data works out?

dlawrenncar commented 2 years ago

I'd agree that we don't need to maintain backwards compatibility if the results with updated data look reasonable. As before, I suggest we wait to tackle the organics stuff until after we have tested the soil texture data. Hopefully it looks good and we can start considering organics next week.

olyson commented 2 years ago

A 2000 SP simulation ran successfully for 10 years with ctsm5.1.dev090. I'm not quite sure what to compare it to. Seems like we would need a surface dataset generated from the same mksurfdata_esmf code, but using the old raw soil texture dataset and then do another 10 year simulation with that.

slevis-lmwg commented 2 years ago

Keith, I will go ahead and generate the file and send it, unless Mariana sends you one sooner...

Sam

slevis-lmwg commented 2 years ago

Keith, please try this file: /glade/work/slevis/git/mksurfdata_toolchain/tools/mksurfdata_esmf/surfdata_1.9x2.5_hist_78pfts_CMIP6_2000_c220413.nc

olyson commented 2 years ago

Thanks @slevisconsulting (and thanks @mvertens for providing the new soil texture file)!

olyson commented 2 years ago

@slevisconsulting , I got an error with the file you provided:

346: RSUB_TOP IS POSITIVE in Drainage!ERROR in SoilHydrologyMod.F90 at line 2069

One thing I see is that the STD_ELEV field is problematic (there are some negative values and some very large values....)

slevis-lmwg commented 2 years ago

Thanks @olyson I will look into it.

slevis-lmwg commented 2 years ago

Looks better now. Same file path/name.

One of the code changes that I happened to be trying when I created the previous copy was a failure :-)
Sorry about that.

mvertens commented 2 years ago

@slevisconsulting - thanks for generating this dataset for @olyson.

olyson commented 2 years ago

Thanks @slevisconsulting , that file worked. There is a 5-year climo comparison (10 year trends) here:

https://webext.cgd.ucar.edu/I2000/ctsm51sp_ctsm51d090_2deg_GSWP3V1_soiltex_hist/lnd/ctsm51sp_ctsm51d090_2deg_GSWP3V1_soiltex_hist.2005_2009-ctsm51sp_ctsm51d090_2deg_GSWP3V1_hist.2005_2009/setsIndex.html

I haven't looked in detail, but I don't see any major differences or problems.

dlawrenncar commented 2 years ago

I also looked through some plots and diagnostics. I agree that nothing seems unusual or unexpected. There is an impact, but for the most part the impact is modest. This is probably good to go, which means we could turn our attention to the organic matter portion. I have a full schedule today and am on PTO tomorrow. Perhaps we could start on this on Monday next week.

mvertens commented 2 years ago

@dlawrenncar @olyson - this is really encouraging news. @slevisconsulting - we should meet to discuss how to integrate this new dataset into the main branch for mksurfdata_esmf. I want to emphasize that what I have done is create an intermediate resolution mapunit dataset (soiltex_mapunits_4320x2160_c220329.nc) by mapping the 30 second original mapunit dataset provided by @wwieder to this grid. The original mapping to 2 degrees took over 2 hours, which was not feasible for most of our resolutions. Any model resolution higher than 4320x2160 should use the original 30 second dataset, and we should put this capability into the namelist generation. @slevisconsulting - it would be good to talk about how we want to proceed in the next day or two.
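The selection rule described above could look roughly like the following sketch. It is not the namelist generation code; the function name is hypothetical, the file names are placeholders taken from this thread, and it assumes the target grid is characterized by a single nominal spacing in degrees. The 4320x2160 grid corresponds to 360/4320 degrees, i.e. 5 arc-minutes.

```python
# Hypothetical selection of the raw mapunit dataset by target resolution.

FIVE_MIN_DEG = 360.0 / 4320.0   # spacing of the 4320x2160 grid, in degrees

MAPUNIT_5MIN = "soiltex_mapunits_4320x2160_c220329.nc"   # intermediate grid
MAPUNIT_30SEC = "wise_30sec_v1_grid2.nc"                 # original 30 second grid


def pick_mapunit_file(target_res_deg):
    """Return the mapunit file to use for a target grid spacing in degrees."""
    if target_res_deg < FIVE_MIN_DEG:
        # Target grid is finer than the intermediate grid: use the original data.
        return MAPUNIT_30SEC
    return MAPUNIT_5MIN
```

With this rule a 1.9x2.5 degree target would pick the 5 minute file, while a target finer than 5 arc-minutes would fall through to the 30 second file.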

wwieder commented 2 years ago

Next up: ORGANIC using the same mapping unit information being used for texture.

I'm used to thinking about calculating organic matter stocks (kg C/m2), but the model only cares about organic matter density (kg OM/m3).

Technically, this should just be calculated on the fine earth fraction (1-coarse fragments).

The WISE lookup table has all of this information:

Property   units            long_name
ORGC       gC kg^-1 soil    organic carbon content
BULK       g soil cm^-3     bulk density
CFRAG      volumetric, %    coarse fragment

Additionally we'll assume 1g OM = 0.58 gC. NOTE I have no idea where this conversion factor is from, but I'm assuming it was used for the old calculation of ORGANIC we've been using?

Thus:

ORGANIC = ORGC*BULK*(100-CFRAG)/100  *1/0.58 

This should provide ORGANIC (kg OM m^-3 soil). I think all the units, converting g to kg and cm3 to m3, cancel out.
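The unit bookkeeping can be checked with a small sketch. The function name is made up for illustration, and the 0.58 factor is simply the assumption stated above. Since 1 g cm^-3 = 1000 kg m^-3 and 1 gC = 0.001 kgC, the two factors of 1000 cancel, so ORGC*BULK is already in kgC m^-3; dividing by 0.58 gC per g OM then gives kg OM m^-3.

```python
# Organic matter density from WISE-style properties (illustrative sketch).

C_PER_OM = 0.58  # g C per g organic matter, as assumed in the comment above


def organic_density(orgc, bulk, cfrag, c_per_om=C_PER_OM):
    """Return organic matter density in kg OM per m^3 of soil.

    orgc  : organic carbon content, g C per kg soil
    bulk  : bulk density, g soil per cm^3
    cfrag : coarse fragments, volume percent
    """
    # (g C / kg soil) * (g soil / cm^3): the g->kg and cm^3->m^3 factors
    # of 1000 cancel, leaving kg C per m^3.
    carbon_density = orgc * bulk
    fine_earth_fraction = (100.0 - cfrag) / 100.0
    return carbon_density * fine_earth_fraction / c_per_om


# Made-up inputs: 100 gC/kg, 0.5 g/cm^3, 10 % coarse fragments.
example = organic_density(100.0, 0.5, 10.0)   # 45 / 0.58, about 77.6 kg OM/m^3
```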

Here are samples for two different profiles (not sure where they are?) image

@dlawrenncar can you check this matches your expectations (and unit conversions)? Also, I think we're more correct to just use the 'fine earth fraction' by removing the coarse fragment (rocks), but don't know what you think?

dlawrenncar commented 2 years ago

I think the equation makes sense, but I don't know if the values are reasonable or not. I know that basically we use this number (organic kg OM/m3) to calculate the fraction of organic matter in any soil layer. We use a prescribed value of 130 kg OM/m3 as the maximum organic matter density, based on standard density of peat soils (I think, need to look back at where I got that number from). When creating the original organic dataset, we constrain so that the values cannot be larger than 130 kg OM/m3. Across most of the Arctic, the top several layers are 130 kg OM/m3, reflecting the surface organic soils that are prevalent.
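As a rough illustration of how the 130 kg OM/m3 ceiling turns a density into a fraction, here is a minimal sketch. The function name is hypothetical and the capping-then-dividing form is an assumption based on the description above, not a copy of the model code.

```python
# Organic matter fraction of a soil layer from its organic matter density.

ORGANIC_MAX = 130.0  # kg OM per m^3, the prescribed maximum density above


def organic_fraction(organic, organic_max=ORGANIC_MAX):
    """Return the organic fraction (0 to 1) for a density in kg OM/m^3."""
    # Values above the ceiling are capped, so the fraction never exceeds 1.
    return min(organic, organic_max) / organic_max
```

Under this sketch a layer at 65 kg OM/m3 would be treated as half organic, and anything at or above 130 kg OM/m3 as fully organic.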

Anyway, with this said, I am starting to think that we may be better off calculating the %ORGANIC (analogous to %SAND, %CLAY) and putting that onto the surface dataset and then using %ORGANIC rather than doing this calculation in the code. (Note that %ORGANIC is considered independently of %SAND, %CLAY and the %SAND, %CLAY values are only used if %ORGANIC is not 100.) This probably would have been a better way to do this originally, but ... This would require a new piece of code that uses %ORGANIC if it is available on the surface dataset instead of using ORGANIC. Probably we should discuss.

mvertens commented 2 years ago

@wwieder @slevisconsulting I have the following two datasets in my own sandbox that I'd like to put into inputdata

  1. /glade/u/home/mvertens/src/ctsm.new_mksurfdata/tools/mksurfdata_esmf/soiltex_mapunits_4320x2160_c220329.nc
  2. /glade/u/home/mvertens/src/ctsm.new_mksurfdata/tools/mksurfdata_esmf/wise_30sec_v1_lookup2.nc which points to /glade/scratch/wwieder/wise_30sec_v1/WISE30sec/Interchangeable_format/wise_30sec_v1_lookup2.nc

I'd like to move both of these to inputdata and put them in the xml file so that we are not pointing into datasets in scratch. What should we call these files when I move them?

slevis-lmwg commented 2 years ago

Probably we should discuss.

I'm happy to set up a follow-up meeting, though @wwieder could you share your calendar with me so that I may pick a time that works for all? Or feel free to schedule this next meeting.

wwieder commented 2 years ago

I made a meeting for tomorrow. Hopefully 30 minutes is enough.

Before uploading files to inputdata, I also wondered if the metadata on these files is sufficient or if additional information is needed (e.g. the script used to generate the .nc files from the raw data we're getting from WISE)?

ekluzek commented 2 years ago

@wwieder we haven't always saved the scripts that create the raw data files. I do think it's always important to save some metadata that describes what was done though. And if the manipulations that were done were straightforward enough that someone could recreate the process later, it's not strictly necessary to save the script. In general we don't have to reconstruct these datasets, but for scientific reproducibility the instructions should be good enough that you could create it again. So if you do lots of complex manipulations that you can't describe easily it might be best to archive the script. The only other reason to archive the script is if we think we'll use it again fairly soon. That seems unlikely to me.

But, you've already archived the script in your repo in github, and that's sufficient to me. It doesn't look like you do anything that is unduly complex. The link above already connects the script to this issue, so it'll be straightforward to find it again.

ekluzek commented 2 years ago

In advance of our meeting and in terms of things I find important to have in filenames:

For mksurfdata we've prepended a "mksrf_" in front of our filenames. Since, almost all of the files in that directory have that prefix I don't think that's very helpful. So moving away from that would be reasonable now.

wwieder commented 2 years ago

Sorry for the delay on this, but here are 3 arctic map units, two from AK and one in Norway (image). Was this one concern you had about the data we're providing?

The other suggestion was that we just include some measure of soil organic carbon content (ORGC) in the WISE dataset. Maybe @mvertens can include both values on the surface dataset?

dlawrenncar commented 2 years ago

Those all look reasonable and it is comforting to see that they are all within the ballpark of the assumed 100% organic matter density of 130 kg OM/m3. But, with the code as it exists now, just using these values won't work, unfortunately, which is why I am recommending that we switch to calculating %ORGANIC content and then using that directly, if it is possible (I'm not sure how to calculate this). Not sure what the other options are. @wwieder This probably requires a chat.

wwieder commented 2 years ago

just taking the field ORGC*0.1 will give the units in %C. Are these values you'd expect for organic soils, @dlawrenncar? image
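The factor of 0.1 follows from the units: 1 gC per kg soil is 1 part in 1000 by mass, which is 0.1 %. A tiny sketch, with a made-up function name:

```python
# Convert organic carbon content from g C per kg soil to percent by mass.

def orgc_percent(orgc_g_per_kg):
    """Return carbon content in %C for a value given in g C per kg soil."""
    # 1 g/kg = 0.001 kg/kg = 0.1 %
    return orgc_g_per_kg * 0.1
```

So a hypothetical organic horizon with ORGC = 250 gC/kg would correspond to 25 %C.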

wwieder commented 2 years ago

@mvertens I hope these two files have the right information. Their global attributes are likely worth bringing over into the files you're archiving.

The lookup table, used for all resolutions /glade/scratch/wwieder/wise_30sec_v1/WISE30sec/Interchangeable_format/wise_30sec_v1_lookup2.nc

and the 30 second resolution of the mapUnits /glade/scratch/wwieder/wise_30sec_v1/WISE30sec/Interchangeable_format/wise_30sec_v1_grid2.nc

slevis-lmwg commented 1 year ago

I merged #1732 so closing this issue.