Closed: @wwieder closed this issue 1 year ago.
ISRIC is suggesting they produce a 1 km and 5 km SoilGrids product as (web optimized) geotiff format. Is this something we can use in the toolchain @slevisconsulting and @negin513? How hard should I push for a 3 arc minute product instead and a .nc format?
I'm pretty sure we're going to want netcdf for our tool-chain: even if we could read geotiff directly, I'm not sure it's a good idea to have different raw data in different formats: I think that's going to cause pain long-term. That said, I don't have feelings on whether we ask them to produce netcdf or if we convert their geotiff file to netcdf as an initial one-time thing.
Regarding resolution: First, I realized that our existing 1km file may not actually be uniform 1km: looking at the file name and metadata, I'm remembering that @swensosc merged 10' data from some regions with 1km data from most of the globe; my sense (maybe wrong) is that the resulting dataset is therefore an unstructured mix of resolutions. Regarding 5km vs. 3 arc-minute: Maybe we need to discuss as a group how much to push for conformity to a few standard resolutions vs. accepting whatever we get. I suspect that, if we use 5km, it will be the only dataset on this exact grid, somewhat increasing the time it takes to go through the toolchain – though probably not too terribly for 5km (as opposed to 1km, which is worse).
Yes. We can discuss uniformity, but my guess is that uniformity of resolutions is going to be challenging going forward. So I would rather not put the burden on data providers; if we really need something on a specific grid, we can do a one-time regridding to that grid when we get the data.
And with the long term in mind, it's probably best to accept the highest resolution that they have to offer. Then, as @billsacks and @dlawrenncar said, we can spend the time once to get the data in the exact form that we can work with.
While I agree, Sam, they have a 250 m product that's published and ready to go. This doesn't seem like where we want to start...
New 1 km and 5 km resolution products are now available from SoilGrids.
You can find the data here: https://files.isric.org/soilgrids/latest/data_aggregated/ The metadata (including the DOI for citations) can be found here: https://data.isric.org/
The data producers have asked for input on these data products, which I am happy to provide. What should be our workflow to start testing these data in new surface datasets?
Is the data in a format that could be used directly by mksurfdata? If it is, then I think a straightforward test of SoilGrids vs. the existing data, where only soil texture is changed, would be the next step. Perhaps a good topic for discussion at the next software meeting.
Files are in geotiff format.
I'm assuming we'll need to merge the datasets into a single .nc file first.
Yes, we'll need to convert to NetCDF and make sure the fields needed by mksurfdata_map are on the file. So @wwieder, are there several data files for different global regions? If so, as you suggest, we'd need to merge them into one global dataset.

All of the datasets are in one global file.
The variables of interest include the clay, sand, and soil organic C data that we need now, but also data on soil N, pH, CEC, bulk density, etc. that may be useful down the road. I'm somewhat inclined to include more fields than we need when generating our 'raw' dataset.
Each variable comes as 6 tiff files (one for each soil layer: 0-5, 5-15, ..., 100-200 cm). These should be concatenated along a depth coordinate.
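A minimal sketch of the concatenation step with xarray, using synthetic stand-in arrays (in practice each layer would come from reading a tiff, e.g. with rioxarray); the variable and file names here are illustrative, not the actual ISRIC names:

```python
import numpy as np
import xarray as xr

# SoilGrids layer interval boundaries (cm).
layer_tops = [0, 5, 15, 30, 60, 100]
layer_bots = [5, 15, 30, 60, 100, 200]

# Stand-ins for the six per-layer rasters read from the tiff files.
lat = np.linspace(80.0, 75.0, 4)
lon = np.linspace(-180.0, -175.0, 5)
layers = [
    xr.DataArray(np.full((4, 5), float(i)),
                 coords={"lat": lat, "lon": lon}, dims=("lat", "lon"))
    for i in range(6)
]

# Concatenate along a new "depth" dimension, using the layer midpoint (cm)
# as the coordinate value, and keep the interval bounds as extra variables.
depth_mid = [(t + b) / 2.0 for t, b in zip(layer_tops, layer_bots)]
sand = xr.concat(layers, dim="depth").assign_coords(depth=depth_mid)
sand.depth.attrs["units"] = "cm"

ds = xr.Dataset({"sand": sand,
                 "depth_top": ("depth", layer_tops),
                 "depth_bot": ("depth", layer_bots)})
# Final step (needs a netCDF backend such as netCDF4 or scipy installed):
# ds.to_netcdf("sand_0-200_mean_5000.nc")
```

Keeping the explicit depth_top/depth_bot interval bounds (rather than only midpoints) should make any later layer mapping to the model grid unambiguous.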
We'll just have to maintain the metadata, and adjust units as appropriate, because my recollection is that the units, especially for soil C, are kind of odd.
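For reference, a sketch of the unit conversions as I understand them from the SoilGrids FAQ; the divisors below are my recollection and should be verified against the official ISRIC metadata before use:

```python
# SoilGrids 2.0 stores integer values; dividing by the factor below gives
# conventional units. These factors are from my reading of the SoilGrids
# FAQ -- verify against the official ISRIC metadata before relying on them.
SOILGRIDS_UNIT_CONVERSIONS = {
    #  var        (stored unit,  divisor, conventional unit)
    "sand":      ("g/kg",        10.0,  "%"),
    "silt":      ("g/kg",        10.0,  "%"),
    "clay":      ("g/kg",        10.0,  "%"),
    "soc":       ("dg/kg",       10.0,  "g/kg"),
    "bdod":      ("cg/cm3",      100.0, "kg/dm3"),
    "nitrogen":  ("cg/kg",       100.0, "g/kg"),
    "phh2o":     ("pH*10",       10.0,  "pH"),
    "cec":       ("mmol(c)/kg",  10.0,  "cmol(c)/kg"),
}

def to_conventional(var, value):
    """Convert a stored SoilGrids integer value to conventional units."""
    _, divisor, _ = SOILGRIDS_UNIT_CONVERSIONS[var]
    return value / divisor
```

Recording the conversion (and the resulting units attribute) in the merged .nc file would keep the metadata self-describing.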
Translating the .tif files into .nc seems pretty trivial. https://nsidc.org/support/faq/how-can-i-convert-geotiff-netcdf
This isn't a finished product, as I need to bring in metadata somehow (it's listed elsewhere on the SoilGrids website), plus a bunch of other details, but here's my first attempt at converting a geotiff into a .nc projection for sand that seems reasonable: /glade/scratch/wwieder/SoilGrids/ncMerged/sand_0-300_mean_5000.nc
This projection is not wall to wall (lat != -90 to 90). Does this matter for mksurfdata? What other considerations need to be made?
Good to see! In principle I think it's OK for mksurfdata_map that it doesn't cover the entire globe; the mapping will be done for the part of the grid that it does cover. I thought it might be a problem that it doesn't cover Antarctica, but neither does the current file we use, so I guess that's OK.
Another thing that will need to be done is to create a SCRIP grid file that describes the grid and the vertices of each gridcell. The current file just has the grid-center coordinates. Since it's almost exactly a regular grid, we can calculate the vertices.
OK, here's a full 5000m dataset with soil properties from SoilGrids.
/glade/scratch/wwieder/SoilGrids/ncMerged/SoilGrids_mean_5000_merged.nc
We can add additional metadata and talk about where to put my notebook that generated these plots.
It may be worth discussing implementation of certain fields as we generate surface datasets, but hopefully this is enough to get us started.
Notebook with code can be found here https://github.com/wwieder/ctsm_py/blob/master/notebooks/tiff2nc.ipynb
Sorry, I'm still struggling to understand what's needed here?
There are a bunch of ways to reproject the original tiff data, see this website, but I can't really find anything that would be better than what's already provided.
Moreover, the spacing for lon seems pretty regular, and the lats are identical. Below are two sample longitude spacings.
-0.04551960876054295
-0.04551960876057137
From here, can't we calculate the corners of each gridcell?
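Yes, in principle. A sketch of computing per-cell corner arrays (in the SCRIP grid_corner_lat/lon layout) from 1-D center coordinates, assuming the grid is regular enough that midpoints between neighboring centers serve as cell edges; the function name is just illustrative:

```python
import numpy as np

def corners_from_centers(lat_c, lon_c):
    """Build SCRIP-style (nlat, nlon, 4) corner arrays from 1-D centers,
    taking midpoints between neighboring centers as cell edges and
    extrapolating half a spacing at the ends."""
    def edges(c):
        c = np.asarray(c, dtype=float)
        mid = 0.5 * (c[:-1] + c[1:])
        first = c[0] - (mid[0] - c[0])
        last = c[-1] + (c[-1] - mid[-1])
        return np.concatenate([[first], mid, [last]])

    lat_e, lon_e = edges(lat_c), edges(lon_c)
    nlat, nlon = len(lat_c), len(lon_c)
    corner_lat = np.empty((nlat, nlon, 4))
    corner_lon = np.empty((nlat, nlon, 4))
    for j in range(nlat):
        for i in range(nlon):
            # Corner order: counterclockwise starting at the (j, i) edge.
            corner_lat[j, i] = [lat_e[j], lat_e[j], lat_e[j + 1], lat_e[j + 1]]
            corner_lon[j, i] = [lon_e[i], lon_e[i + 1], lon_e[i + 1], lon_e[i]]
    return corner_lat, corner_lon
```

These arrays, flattened to (grid_size, 4) along with grid_center_lat/lon and grid_imask, are essentially what a SCRIP grid file contains.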
@swensosc can you have a look at the dataset below to see what we can do to calculate the corners of each gridcell in a way that can be read into mksurfdata_map?
/glade/scratch/wwieder/SoilGrids/ncMerged/SoilGrids_mean_5000_merged.nc
@uturuncoglu, @mvertens mentioned that you have a tool that generates a mesh file from a raw dataset.
(sorry @kauff, this was supposed to go to Ufuk.)
I'm wondering if the dataset below has the information for what your script needs? /glade/scratch/wwieder/SoilGrids/ncMerged/SoilGrids_mean_5000_merged.nc
@uturuncoglu - I was referring to the ncl/python code you have to take a lat/lon grid (or logically rectangular grid) and create a mesh file.
@uturuncoglu - it would be great to make this available to the TSS group - even if its not totally finished.
Hi All,
The Python tool is in my personal Gist repository. You can find it here:
https://gist.github.com/uturuncoglu/4fdf7d4253b250dcf3cad2335651f162
The NCL one is here:
https://gist.github.com/uturuncoglu/1da852ffe2e0247aa4bb0caf2e79df7a
BTW, just note that those don't work for all cases; let me know if you need anything.
We could try the tools with /glade/scratch/wwieder/SoilGrids/ncMerged/SoilGrids_mean_5000_merged.nc
to see what happens. @wwieder Do you want to try it yourself, or I can try it for you?
Thanks! If you can point your script at the file I provided to see if it works, that would be great, @uturuncoglu!
@wwieder JFYI, I tried both tools. The Python one complains about the coordinate pairs: The size of unique coordinate pairs is 32633498 but expected size is 24106632! I am not sure why, but it is not working at this grid resolution. Anyway, I also tried the NCL way (create a SCRIP definition file and use the ESMF offline tool to generate the mesh). The ESMF offline tool dies; I am not sure why, since there is no informative message in the output. It might be due to memory consumption. I'll try a couple of other things and get back to you.
Thanks for looking into this @uturuncoglu. If the 'standard' tools I'm using aren't producing a dataset you can easily work with, I wonder if there's something I'm not doing correctly on my end in reprojecting this dataset from the original TIFF file? These are python libraries and dataset questions I'm not very familiar with.
@uturuncoglu, @mvertens what are the next steps on this? Is there potentially just a memory issue with this high-resolution dataset?
@wwieder - If there were no access to @uturuncoglu's new utility, how would you have created a SCRIP file? You would have needed one in any case for creating the necessary mapping files in the current mksurfdata_map generation utility. Maybe a quick meeting next week when I get back from PTO would help clarify the next steps.
@wwieder I am still stuck creating the mesh file. The only thing left to try is the fat nodes, to use more memory and see what happens.
OK, if the fat nodes don't work @uturuncoglu we can regroup next week to make a plan.
@wwieder I had an ESMF problem that complained about mesh files not being created correctly:
20220222 110147.329 ERROR PET63 /glade/p/cesmdata/cseg/PROGS/build/19294/esmf-8.1.0b23/src/Infrastructure/Mesh/src/ESMCI_Mesh_Glue.C:5551 ESMCI_meshcreateredistelems() Internal error: Bad condition - /glade/p/cesmdata/cseg/PROGS/build/19294/esmf-8.1.0b23/src/Infrastructure/Mesh/src/Legacy/ESMCI_DDir.C, line:251:P:63 could not service request for gid=8911
20220222 110147.330 ERROR PET63 ESMCI_MeshCap.C:1537 MeshCap::meshcreateredistelems() Internal error: Bad condition - Internal subroutine call returned Error
20220222 110147.330 ERROR PET63 ESMF_Mesh.F90:3477 ESMF_MeshCreateRedist() Internal error: Bad condition - Internal subroutine call returned Error
20220222 110147.331 ERROR PET63 ESMF_Mesh.F90:2089 ESMF_MeshCreateFromFile() Internal error: Bad condition - Internal subroutine call returned Error
20220222 110147.331 ERROR PET63 lnd_comp_esmf.F90:385 Internal error: Bad condition - Passing error in return code
This was resolved when I increased the number of nodes dramatically.
This might be completely unrelated to your issue but I thought it might help.
For what it's worth, I was able to regrid the 5km dataset to the CLM 1 degree grid using xesmf in python. I grabbed 200 GB of memory on casper to create the weight file bilinear_3047x7908_192x288_peri.nc, but it never needed more than 70 GB from what I could see. Is this process at all similar to what your script does, @uturuncoglu?
@wwieder Currently testing. I'll let you know soon.
@wwieder I was able to create the mesh file using 36 bigmem nodes with a single MPI process each. You can find the mesh file at the following path on Cheyenne. Please try to use it and let me know how it goes.
/glade/work/turuncu/HOME/UFS/Streams/SoilGrids_mean_5000_merged.ESMFmesh.170222.nc
@uturuncoglu this is awesome. Thanks.
@ekluzek can you have a look to see if this is what we'll need to bring in a new soils dataset to mksurf?
At the SE meeting we decided we have two challenges: one technical and one scientific.
On the technical side we want to see if the merged 'raw data' and @uturuncoglu 's mesh file are compatible with @mvertens new online regridding tools. Relevant files are listed below.
/glade/scratch/wwieder/SoilGrids/ncMerged/SoilGrids_mean_5000_merged.nc
/glade/work/turuncu/HOME/UFS/Streams/SoilGrids_mean_5000_merged.ESMFmesh.170222.nc
@olyson is going to try to modify @mvertens' tools to bring in the new sand and clay data. The new raw data no longer have the mapping units used in the old soil dataset, so initially we can use area-conservative weighting, as used for other input data.
@dlawrenncar and I will reach out to other groups for suggestions on the best way to regrid these datasets, which no longer have the mapping units we could use to identify the 'dominant' soil type in a gridcell.
I've been working on this a bit. However, the dataset has its own vertical grid (6 layers), and it doesn't look like the mksurfdata_esmf routines do any kind of vertical interpolation. They assume that the data is on the nlevsoi=10 vertical grid; e.g., both the original soil texture and organic matter datasets are on the nlevsoi vertical grid. So some work needs to be done on the dataset (/glade/scratch/wwieder/SoilGrids/ncMerged/SoilGrids_mean_5000_merged.nc) first.
@olyson - let's talk about this next week. We should be able to add vertical interpolation if that is a requirement. I think it would be good to have mksurfdata_esmf address future requirements, and if vertical interpolation is one of those, it should be able to do it.
This is going to turn into a full-fledged research project. There could be some ways to handle this in the modify_surfdat work that @negin513 did for the NEON project. That would avoid vertical interpolation (which, again, may not be realistic for soils), but it should likely be evaluated more carefully.
@olyson, for the purposes of this initial 'does it work' stage of the project, can you see if you can regrid a single layer of sand-silt-clay from the 5km product to a standard CESM resolution without worrying about the vertical component, or will that take even more code modifications?
For now, I can probably just create a new input file that has a rough mapping from the 6 layers to the nlevsoi grid and see if the regridding works.
It may be that we would want to do the mapping from 6 layers to 10 layers rather than interpolating for soils. This is what happens in the CLM code itself if there are not exactly 10 soil layers.
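A sketch of what such a layer-to-layer (nearest-neighbor, no interpolation) mapping could look like, assuming the exponential node-depth formula used for CLM's traditional 10-layer grid (from the CLM technical note, as I recall; double-check the coefficients) and clamping nodes deeper than 2 m to the deepest SoilGrids interval:

```python
import numpy as np

# CLM node depths (m) for the traditional 10-layer soil grid:
# z_j = 0.025 * (exp(0.5*(j - 0.5)) - 1), j = 1..10.
nlevsoi = 10
clm_node_depth = 0.025 * (np.exp(0.5 * (np.arange(1, nlevsoi + 1) - 0.5)) - 1)

# SoilGrids interval boundaries in meters:
# 0-5, 5-15, 15-30, 30-60, 60-100, 100-200 cm.
sg_bounds = np.array([0.0, 0.05, 0.15, 0.30, 0.60, 1.0, 2.0])

# For each CLM node, pick the SoilGrids interval containing it; nodes
# below 2 m get the deepest interval (index 5).
sg_layer_for_clm = np.clip(
    np.searchsorted(sg_bounds, clm_node_depth, side="right") - 1, 0, 5)
```

Because the shallow CLM layers are much thinner than 5 cm, several CLM layers end up sharing a single SoilGrids interval, which matches the "rough mapping" approach described above.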
@olyson @wwieder @dlawrenncar - Can I please meet with you this week to go over what the new surface dataset mapping code does? It's very easy to map a whole set of vertical layers in the horizontal using ungridded dimensions. I have not incorporated that capability in the new mksurfdata_esmf, but it would be fairly straightforward to do. That functionality is already in CDEPS and is used to map multiple-soil-layer forcing data. What was never implemented was any type of vertical interpolation. It would be very helpful to clarify what can be done easily now, so that a path forward is chosen in a well-informed manner.
I did a rough nearest neighbor mapping of the original 6 layer vertical grid to the 10 layer vertical grid. I was able to use that new file plus my modifications to mksoiltexMod.F90 to generate a year 2000 2deg surface dataset. We can fine tune the mapping.
@olyson - that's great to hear. Did you have any problems in building and running the new code? To clarify - we are mapping a whole set of vertical layers at once in mksoiltexMod.F90 - but not doing any vertical interpolation yet.
Does it make sense to start discussing this as a smaller group some time before our Thursday CLM meeting?
If so, I may suggest Thurs at 9, in advance of the CTSM-SE meeting?
Sorry to miss the call last Friday.
@wwieder - that sounds good to me. I'm happy to join at 9.
I'm available.
Maybe the 3 of us can just join the SE meeting early? I think @dlawrenncar is in another meeting. Should others join?
@wwieder - having the 3 of us just join the SE meeting early sounds good. I'm fine with that.
I'd like to join as well. I've done a lot of work on mksurfdata in the past I think it's good for me to remain in the loop about the new version.
Is 9 on Thursday OK, @ekluzek?
Yes, that's fine.
It would be nice to update the soils data we're using to generate the surface dataset to something from this century. This will introduce a number of answer changes to the code, but it seems worth having a discussion about what we need here.
@dlawrenncar suggested using SoilGrids data, which just released a version 2.0 of their dataset https://doi.org/10.5194/soil-2020-65. SoilGrids2.0 contains information on soil texture, OC content, pH, bulk density, coarse fragments, CEC, and soil N at 250 m resolution for 6 soil layers (0-200 cm). This high resolution data also includes uncertainty estimates! According to the data providers, v2.0 has changed significantly from previous releases of the dataset, but is currently only available at 250m resolution.
Laura Poggio and Niels Batjes at ISRIC are interested in and willing to provide a coarser resolution data product for our purposes and wondered what we wanted. I've basically told them we'd like the whole dataset, but to prioritize texture and soil C information. Is a 5km data product adequate for NWP applications, but not too unwieldy for climate simulations? Do we need 1km resolution mapping files?
I also wonder whether we should think about how to generate soil properties for the hillslope model. Does this happen in our own toolchain, or could it be generated in the mapping files from ISRIC? This is likely of secondary concern, but may be worth discussing.