ESCOMP / CTSM

Community Terrestrial Systems Model (includes the Community Land Model of CESM)
http://www.cesm.ucar.edu/models/cesm2.0/land/

Update soils data used for surface dataset #1303

Closed wwieder closed 1 year ago

wwieder commented 3 years ago

It would be nice to update the soils data we're using to generate the surface dataset to something from this century. This will introduce a number of answer changes to the code, but it seems worth having a discussion about what we need here.

@dlawrenncar suggested using SoilGrids data; version 2.0 of their dataset was just released: https://doi.org/10.5194/soil-2020-65. SoilGrids 2.0 contains information on soil texture, OC content, pH, bulk density, coarse fragments, CEC, and soil N at 250 m resolution for 6 soil layers (0-200 cm). This high-resolution data also includes uncertainty estimates! According to the data providers, v2.0 has changed significantly from previous releases of the dataset, but is currently only available at 250 m resolution.

Laura Poggio and Niels Batjes at ISRIC are interested in and willing to provide a coarser-resolution data product for our purposes and wondered what we wanted. I've basically told them we'd like the whole dataset, but to prioritize texture and soil C information. Is a 5 km data product adequate for NWP applications, but not too unwieldy for climate simulations? Do we need 1 km resolution mapping files?

I also wondered whether we should think about how to generate soil properties for the hillslope model. Does this happen in our own toolchain, or could it be generated in the mapping files from ISRIC? This is likely of secondary concern, but may be worth discussing.

wwieder commented 3 years ago

ISRIC is suggesting they produce 1 km and 5 km SoilGrids products in (web-optimized) geotiff format. Is this something we can use in the toolchain, @slevisconsulting and @negin513? How hard should I push for a 3 arc-minute product in .nc format instead?

billsacks commented 3 years ago

I'm pretty sure we're going to want netcdf for our tool-chain: even if we could read geotiff directly, I'm not sure it's a good idea to have different raw data in different formats: I think that's going to cause pain long-term. That said, I don't have feelings on whether we ask them to produce netcdf or if we convert their geotiff file to netcdf as an initial one-time thing.

Regarding resolution: First, I realized that our existing 1km file may not actually be uniform 1km: looking at the file name and metadata, I'm remembering that @swensosc merged 10' data from some regions with 1km data from most of the globe; my sense (maybe wrong) is that the resulting dataset is therefore an unstructured mix of resolutions. Regarding 5km vs. 3 arc-minute: Maybe we need to discuss as a group how much to push for conformity to a few standard resolutions vs. accepting whatever we get. I suspect that, if we use 5km, it will be the only dataset on this exact grid, somewhat increasing the time it takes to go through the toolchain – though probably not too terrible for 5km (as opposed to 1km, which is worse).

dlawrenncar commented 3 years ago

Yes. We can discuss uniformity, but my guess is that the reality is that uniformity of resolutions is going to be challenging going forward. So I would probably rather not put the burden on data providers; if we really need something on a specific grid, we can do a one-time regridding to that specific grid when we get the data.

slevis-lmwg commented 3 years ago

And with the long term in mind, it's probably best to accept the highest resolution that they have to offer. Then, as @billsacks and @dlawrenncar said, we can spend the time once to get the data in the exact form that we can work with.

wwieder commented 3 years ago

While I agree, Sam, they have a 250 m product that's published and ready to go. This doesn't seem like where we want to start...

wwieder commented 2 years ago

New 1 km and 5 km resolution products are now available from SoilGrids.

You can find the data here: https://files.isric.org/soilgrids/latest/data_aggregated/ The metadata (including the DOI for citations) can be found here: https://data.isric.org/

The data producers have asked for input on these data products, which I am happy to provide. What should our workflow be to start testing these data in new surface datasets?

dlawrenncar commented 2 years ago

Is the data in a format that could be used directly by mksrfdata? If it is, then I think a straightforward test of SoilGrids vs. the existing data, where only soil texture is changed, would be the next step. Perhaps a good topic for discussion at the next software meeting.

wwieder commented 2 years ago

Files are in geotiff format.

I'm assuming we'll need to merge the datasets into a single .nc file first.

ekluzek commented 2 years ago

Yes, we'll need to convert to NetCDF and make sure the fields needed by mksurfdata_map are on them. So @wwieder, are there several data files for different global regions? If so, as you suggest, we'd need to merge them into one global dataset. (Update: all of the datasets are in one global file.)

wwieder commented 2 years ago

The variables of interest include the clay, sand, and soil organic C data that we need now, but also data on soil N, pH, CEC, bulk density, etc. that may be useful down the road. I'm somewhat inclined to include more fields than we need in generating our 'raw' dataset.

Each variable is provided as six tiff files (one for each soil layer: 0-5, 5-15, ..., 100-200 cm). These should be concatenated along a depth coordinate.
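To sketch that concatenation step: the six per-layer grids become one array with a depth coordinate. A minimal numpy illustration (the tiny grid and fill values are placeholders; the depth bounds are the standard SoilGrids intervals):

```python
import numpy as np

# Standard SoilGrids depth intervals (cm) for the six layers.
depth_tops = np.array([0.0, 5.0, 15.0, 30.0, 60.0, 100.0])
depth_bots = np.array([5.0, 15.0, 30.0, 60.0, 100.0, 200.0])
depth_mid = 0.5 * (depth_tops + depth_bots)  # midpoints to use as the coordinate

# Placeholder per-layer grids standing in for the six geotiffs of one variable.
nlat, nlon = 4, 5
layers = [np.full((nlat, nlon), float(k)) for k in range(6)]

# Stack into a single (depth, lat, lon) array -- the layout a netCDF variable
# like sand(depth, lat, lon) would use.
sand = np.stack(layers, axis=0)
```

In practice this would be done with xarray (concatenating along a new 'depth' dimension) so that coordinate and attribute metadata are carried through to the .nc file.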

We'll also have to maintain the metadata and adjust units as appropriate, because my recollection is that the units, especially for soil C, are kind of odd.

Translating the .tif files into .nc seems pretty trivial: https://nsidc.org/support/faq/how-can-i-convert-geotiff-netcdf

wwieder commented 2 years ago

This isn't a finished product, as I still need to bring in metadata somehow (it's listed elsewhere on the SoilGrids website) and handle a bunch of other details, but here's my first attempt at converting a geotiff into a .nc projection for sand, which seems reasonable: /glade/scratch/wwieder/SoilGrids/ncMerged/sand_0-300_mean_5000.nc

This projection is not wall to wall (lat != -90 to 90). Does this matter for mksrf? What other considerations need to be made?

ekluzek commented 2 years ago

Good to see! In principle I think it's OK for mksurfdata_map that it doesn't cover the entire globe; the mapping will be done for the part of the grid that it does cover. I thought it might be a problem that it doesn't cover Antarctica, but neither does the current file we use, so I guess that's OK.

Another thing that will need to be done is to create a SCRIP grid file that describes the grid and its vertices for each gridcell. The current file just has the cell-center coordinates, but since it's almost exactly a regular grid, we can calculate the vertices.

wwieder commented 2 years ago

OK, here's a full 5000m dataset with soil properties from SoilGrids.
/glade/scratch/wwieder/SoilGrids/ncMerged/SoilGrids_mean_5000_merged.nc

We can add additional metadata and talk about where to put my notebook that generated these plots.
It may be worth discussing implementation of certain fields as we generate surface datasets, but hopefully this is enough to get us started.

wwieder commented 2 years ago

The notebook with the code can be found here: https://github.com/wwieder/ctsm_py/blob/master/notebooks/tiff2nc.ipynb

wwieder commented 2 years ago

Sorry, I'm still struggling to understand what's needed here.

There are a bunch of ways to reproject the original tiff data (see this website), but I can't really find anything that would be better than what's already provided.

Moreover, the spacing for lon seems pretty regular, and the lats are identical. Below are two sample longitude spacings.

-0.04551960876054295

-0.04551960876057137

From here, can't we calculate the corners of each gridcell?
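Assuming the centers really are (nearly) regularly spaced, the corner/edge positions can be taken as midpoints between neighboring centers, extrapolating at the two ends. A small numpy sketch of that idea (not the actual SCRIP file writer; the longitude values are illustrative):

```python
import numpy as np

def centers_to_edges(centers):
    """Estimate cell edge positions from 1-D cell-center coordinates,
    assuming an (approximately) regular spacing."""
    centers = np.asarray(centers, dtype=float)
    interior = 0.5 * (centers[:-1] + centers[1:])     # midpoints between centers
    first = centers[0] - (interior[0] - centers[0])   # extrapolate the outer edges
    last = centers[-1] + (centers[-1] - interior[-1])
    return np.concatenate([[first], interior, [last]])

# Illustrative regularly spaced longitude centers.
lon_centers = np.array([0.5, 1.5, 2.5, 3.5])
lon_edges = centers_to_edges(lon_centers)
```

A SCRIP file then stores, for each gridcell, the four corner lon/lat pairs built from the 1-D edge arrays for longitude and latitude.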

wwieder commented 2 years ago

@swensosc can you have a look at the dataset below to see what we can do to calculate the corners of each gridcell in a way that can be read into mksurfdata_map?

/glade/scratch/wwieder/SoilGrids/ncMerged/SoilGrids_mean_5000_merged.nc

wwieder commented 2 years ago

@uturuncoglu, @mvertens mentioned that you have a tool that generates a mesh file from a raw dataset.
(sorry @kauff, this was supposed to go to Ufuk.)

I'm wondering if the dataset below has the information your script needs: /glade/scratch/wwieder/SoilGrids/ncMerged/SoilGrids_mean_5000_merged.nc

mvertens commented 2 years ago

@uturuncoglu - I was referring to the ncl/python code you have to take a lat/lon grid (or logically rectangular grid) and create a mesh file.

mvertens commented 2 years ago

@uturuncoglu - it would be great to make this available to the TSS group, even if it's not totally finished.

uturuncoglu commented 2 years ago

Hi All,

The Python tool is in my personal Gist repository. You can find it here:

https://gist.github.com/uturuncoglu/4fdf7d4253b250dcf3cad2335651f162

The NCL one is here:

https://gist.github.com/uturuncoglu/1da852ffe2e0247aa4bb0caf2e79df7a

BTW, just note that those don't work for all cases; let me know if you need anything.

uturuncoglu commented 2 years ago

We could try the tools with /glade/scratch/wwieder/SoilGrids/ncMerged/SoilGrids_mean_5000_merged.nc to see what happens. @wwieder, do you want to try it yourself, or I could try it for you?

wwieder commented 2 years ago

Thanks! If you can point your script at the file I provided to see if it works, that would be great, @uturuncoglu!

uturuncoglu commented 2 years ago

@wwieder JFYI, I tried with both tools. The Python one complains about the coordinate pairs, e.g. The size of unique coordinate pairs is 32633498 but expected size is 24106632!. I am not sure, but it may not be working at this resolution of the grid. Anyway, I also tried the NCL way (create a SCRIP definition file and use the ESMF offline tool to generate the mesh). The ESMF offline tool dies; I am not sure why, since there is no informative message in the output. It might be due to memory consumption. I'll try a couple of other things and get back to you.

wwieder commented 2 years ago

Thanks for looking into this @uturuncoglu. If the 'standard' tools I'm using aren't producing a dataset you can easily work with, I wonder if there's something I'm not doing correctly on my end in re-projecting this dataset from the original TIFF file. These are python libraries and dataset questions I'm not very familiar with.

wwieder commented 2 years ago

@uturuncoglu, @mvertens what are the next steps on this? Is there potentially just a memory issue with this high-resolution dataset?

mvertens commented 2 years ago

@wwieder - If there were no access to @uturuncoglu's new utility, how would you have created a SCRIP file? You would need one in any case for creating the necessary mapping files in the current mksurfdata_map generation utility. Maybe a quick meeting next week, when I get back from PTO, would help clarify the next steps.

uturuncoglu commented 2 years ago

@wwieder I am still stuck creating the mesh file. The only thing left to try is using the fat nodes to get more memory and see what happens.

wwieder commented 2 years ago

OK, if the fat nodes don't work, @uturuncoglu, we can regroup next week to make a plan.

negin513 commented 2 years ago

@wwieder I had an ESMF problem where it complained about mesh files not being created correctly:

20220222 110147.329 ERROR            PET63 /glade/p/cesmdata/cseg/PROGS/build/19294/esmf-8.1.0b23/src/Infrastructure/Mesh/src/ESMCI_Mesh_Glue.C:5551 ESMCI_meshcreateredistelems() Internal error: Bad condition  - /glade/p/cesmdata/cseg/PROGS/build/19294/esmf-8.1.0b23/src/Infrastructure/Mesh/src/Legacy/ESMCI_DDir.C, line:251:P:63 could not service request for gid=8911
20220222 110147.330 ERROR            PET63 ESMCI_MeshCap.C:1537 MeshCap::meshcreateredistelems() Internal error: Bad condition  - Internal subroutine call returned Error
20220222 110147.330 ERROR            PET63 ESMF_Mesh.F90:3477 ESMF_MeshCreateRedist() Internal error: Bad condition  - Internal subroutine call returned Error
20220222 110147.331 ERROR            PET63 ESMF_Mesh.F90:2089 ESMF_MeshCreateFromFile() Internal error: Bad condition  - Internal subroutine call returned Error
20220222 110147.331 ERROR            PET63 lnd_comp_esmf.F90:385 Internal error: Bad condition  - Passing error in return code

This was resolved when I increased the number of nodes dramatically.

This might be completely unrelated to your issue but I thought it might help.

wwieder commented 2 years ago

For what it's worth, I was able to regrid the 5 km dataset to the CLM 1-degree grid using xesmf in python. I grabbed 200 GB of memory on casper to create the weight file bilinear_3047x7908_192x288_peri.nc, but it never needed more than 70 GB from what I could see. Is this process at all similar to what your script does, @uturuncoglu?

uturuncoglu commented 2 years ago

@wwieder Currently testing. I'll let you know soon.

uturuncoglu commented 2 years ago

@wwieder I was able to create the mesh file using 36 bigmem nodes with a single MPI process each. You can find the mesh file at the following path on Cheyenne. Please try using it and let me know how it goes.

/glade/work/turuncu/HOME/UFS/Streams/SoilGrids_mean_5000_merged.ESMFmesh.170222.nc

wwieder commented 2 years ago

@uturuncoglu this is awesome. Thanks.

@ekluzek can you have a look to see if this is what we'll need to bring in a new soils dataset to mksurf?

wwieder commented 2 years ago

At the SE meeting we decided we have two challenges, one technical and one scientific.

On the technical side, we want to see if the merged 'raw data' and @uturuncoglu's mesh file are compatible with @mvertens' new online regridding tools. Relevant files are listed below.

@olyson is going to try to modify @mvertens' tools to bring in the new sand and clay data. The new raw data no longer have the mapping units used in the old soil dataset, so initially we can use area-conservative weighting, as used for other input data.

@dlawrenncar and I will reach out to other groups for suggestions on how best to regrid these datasets, which no longer have mapping units we can use to identify the 'dominant' soil type in a gridcell.
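For the simple case where a regular fine grid nests exactly inside the coarse grid, area-conservative weighting reduces to an area-weighted mean of the fine cells inside each coarse cell. A toy numpy sketch of that idea (ESMF/xesmf compute proper weights for general grids and cell areas; this assumes equal-area cells and an exact integer refinement factor):

```python
import numpy as np

def block_average(fine, factor):
    """Coarsen a 2-D field by averaging factor x factor blocks of cells.
    With equal-area cells this matches the conservative-regridding result."""
    ny, nx = fine.shape
    assert ny % factor == 0 and nx % factor == 0
    return fine.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))

# Illustrative 4x4 fine field coarsened to 2x2.
fine = np.arange(16.0).reshape(4, 4)
coarse = block_average(fine, 2)
```

Note that this preserves the global mean, which is the defining property of conservative weighting; a dominant-type scheme would instead pick the most common category in each block.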

olyson commented 2 years ago

I've been working on this a bit. However, the dataset has its own vertical grid (6 layers), and it doesn't look like the mksurfdata_esmf routines do any kind of vertical interpolation; they assume the data is on the nlevsoi=10 vertical grid. E.g., both the original soil texture and organic matter datasets are on the nlevsoi vertical grid. So some work needs to be done on the dataset (/glade/scratch/wwieder/SoilGrids/ncMerged/SoilGrids_mean_5000_merged.nc) first.

mvertens commented 2 years ago

@olyson - let's talk about this next week. We should be able to add vertical interpolation if that is a requirement. I think it would be good to have mksurfdata_esmf address future requirements, and if vertical interpolation is one of those, it should be able to do it.

wwieder commented 2 years ago

This is going to turn into a full-fledged research project. There could be some ways to handle this in the modify_surfdat work that @negin513 did for the NEON project, which would avoid vertical interpolation (which, again, may not be realistic for soils). It should likely be evaluated more carefully, however.

@olyson, for the purposes of this initial 'does it work' stage of the project, can you see if you can regrid a single layer of sand-silt-clay from the 5 km product to a standard CESM resolution without worrying about the vertical component, or will this take even more code modifications?

olyson commented 2 years ago

For now, I can probably just create a new input file that has a rough mapping from the 6 layers to the nlevsoi grid and see if the regridding works.
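A rough nearest-neighbor layer mapping like this can be sketched in a few lines of numpy. The CLM node depths below are illustrative approximations rather than the exact model values, and the source profile is made up:

```python
import numpy as np

# Midpoints (cm) of the six SoilGrids intervals
# (0-5, 5-15, 15-30, 30-60, 60-100, 100-200 cm).
src_mid = np.array([2.5, 10.0, 22.5, 45.0, 80.0, 150.0])

# Approximate CLM nlevsoi=10 node depths (cm); illustrative values only.
clm_nodes = np.array([0.7, 2.8, 6.2, 11.9, 21.2,
                      36.6, 62.0, 103.8, 172.8, 286.5])

# For each CLM node, take the source layer whose midpoint is closest.
nearest = np.abs(clm_nodes[:, None] - src_mid[None, :]).argmin(axis=1)

# Made-up 6-layer profile (e.g. % sand) mapped onto the 10-layer grid.
src_profile = np.array([40.0, 38.0, 35.0, 30.0, 28.0, 25.0])
mapped = src_profile[nearest]
```

Note that in this scheme the deepest CLM nodes simply reuse the bottom SoilGrids layer rather than extrapolating.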

dlawrenncar commented 2 years ago

It may be that we would want to do the mapping from 6 layers to 10 layers rather than interpolating for soils. This is what happens in the CLM code itself if there are not exactly 10 soil layers.

mvertens commented 2 years ago

@olyson @wwieder @dlawrenncar - Can I please meet with you this week to go over what the new surface dataset mapping code does? It's very easy to map a whole set of vertical layers in the horizontal using ungridded dimensions. I have not incorporated that capability in the new mksurfdata_esmf, but it would be fairly straightforward to do. That functionality is already in CDEPS and is used to map the multi-layer soil forcing data. What was never implemented was any type of vertical interpolation. But it would be very helpful to clarify what can be done easily now, so that a path forward is chosen in the most well-informed manner.

olyson commented 2 years ago

I did a rough nearest-neighbor mapping of the original 6-layer vertical grid to the 10-layer vertical grid. I was able to use that new file plus my modifications to mksoiltexMod.F90 to generate a year-2000 2deg surface dataset. We can fine-tune the mapping.

mvertens commented 2 years ago

@olyson - that's great to hear. Did you have any problems building and running the new code? To clarify: we are mapping a whole set of vertical layers at once in mksoiltexMod.F90, but not doing any vertical interpolation yet.

wwieder commented 2 years ago

Does it make sense to start discussing this as a smaller group some time before our Thursday CLM meeting?
If so, might I suggest Thursday at 9, in advance of the CTSM-SE meeting?

Sorry to miss the call last Friday.

mvertens commented 2 years ago

@wwieder - that sounds good to me. I'm happy to join at 9.

olyson commented 2 years ago

I'm available.

wwieder commented 2 years ago

Maybe the 3 of us can just join the SE meeting early? I think @dlawrenncar is in another meeting. Should others join?

mvertens commented 2 years ago

@wwieder - having the 3 of us just join the SE meeting early sounds good. I'm fine with that.

ekluzek commented 2 years ago

I'd like to join as well. I've done a lot of work on mksurfdata in the past, so I think it's good for me to remain in the loop about the new version.

wwieder commented 2 years ago

Is 9 on Thursday OK, @ekluzek?

ekluzek commented 2 years ago

Yes, that's fine.