PecanProject / pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.
www.pecanproject.org

Soil data standard #506

Open mdietze opened 9 years ago

mdietze commented 9 years ago

In order to develop a workflow that provides soil initial conditions to the models (analogous to the met workflow in operation, and the veg workflow in development), we need to first agree on a standard set of variable names and units, as well as a standard file format. The idea is that any soil database would then be converted to this standard, PEcAn would do any processing/extraction based on that standard, and then each model would convert those standard variables into what is needed for that model.
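A minimal sketch of that three-stage idea (source database to standard, then standard to model), just to make the shape concrete. Everything named here is an assumption for illustration: the standard variable names, the `sand_pct`/`clay_pct` source columns, and the `SAND`/`CLAY` model inputs are hypothetical, not an agreed PEcAn or CF vocabulary.

```python
from dataclasses import dataclass

# Hypothetical standard: variable name -> unit. Placeholder names only,
# not an agreed PEcAn or CF vocabulary.
STANDARD_UNITS = {
    "fraction_of_sand_in_soil": "1",
    "fraction_of_clay_in_soil": "1",
}


@dataclass
class SoilRecord:
    values: dict  # standard name -> value in standard units


def source_to_standard(raw: dict) -> SoilRecord:
    """Convert one row of a made-up source database (percent) to the standard."""
    return SoilRecord(values={
        "fraction_of_sand_in_soil": raw["sand_pct"] / 100.0,
        "fraction_of_clay_in_soil": raw["clay_pct"] / 100.0,
    })


def standard_to_model(record: SoilRecord) -> dict:
    """Convert a standard record to a made-up model's inputs (percent)."""
    return {
        "SAND": record.values["fraction_of_sand_in_soil"] * 100.0,
        "CLAY": record.values["fraction_of_clay_in_soil"] * 100.0,
    }


standard = source_to_standard({"sand_pct": 40.0, "clay_pct": 20.0})
model_inputs = standard_to_model(standard)
```

The point of the design is that each soil database needs only one converter into the standard and each model needs only one converter out of it, so adding a database or a model doesn't require touching the others.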

As a first pass, I think that we'll need standard names for soil texture variables (texture class, % sand, silt, clay, organic, rock), soil moisture and temperature initial conditions, and soil biogeochemical pools.

In terms of a workflow, I think we'll want to start with texture, since it's easiest and depends least on time. That said, there's an interesting interaction here between the soil texture variables and the soil physical parameters in most models, as well as a clear need to propagate uncertainties in these.

The soil moisture and temperature variables do depend on time but can be extracted from a number of reanalysis products (e.g. NARR) and field observations (e.g. Ameriflux), and thus we should discuss whether to incorporate that into the met workflow somehow (e.g. include the processing of these variables in the standard netCDF CF files).

Soil biogeochemical pools will present a particular problem because the data is crappy at a regional scale and most models do not have any direct mapping between what can be measured and their internal pools. Furthermore, a lot of data will just be bulk %C and %N, which requires that we integrate over the (usually large) uncertainty in bulk density. Rare, but important, will be data that partitions soil biogeochemical pools into different physical or chemical fractions. In addition, most field measurements are over specific depth ranges rather than the full soil profile, which means that the standard processing will need to deal with this extrapolation problem as well as the wide variety of soil depths used by different models. As a starting point, just having a standard for %C, %N, bulk density, and their uncertainties may be good enough to get us rolling.
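To illustrate the %C plus bulk density point: for a single layer, carbon stock in kg C m^-2 is (%C / 100) × bulk density (g cm^-3) × layer thickness (cm) × 10, where the factor of 10 converts g cm^-2 to kg m^-2. The sketch below propagates bulk density uncertainty by simple Monte Carlo. The normal distribution truncated at zero, the function names, and the example numbers are all assumptions for illustration, and rock fragments and depth extrapolation are ignored.

```python
import random


def carbon_stock_kg_m2(pct_c, bulk_density_g_cm3, depth_cm):
    """Carbon stock of one layer: %C/100 * BD (g/cm3) * depth (cm) * 10 -> kg C/m2."""
    return pct_c / 100.0 * bulk_density_g_cm3 * depth_cm * 10.0


def sample_carbon_stock(pct_c, bd_mean, bd_sd, depth_cm, n=1000, seed=0):
    """Monte Carlo draws of carbon stock under uncertain bulk density.

    Assumes (for illustration only) that bulk density is normally
    distributed and truncated at zero; %C is treated as known.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        bd = max(rng.gauss(bd_mean, bd_sd), 0.0)
        draws.append(carbon_stock_kg_m2(pct_c, bd, depth_cm))
    return draws


# Made-up example: 2 %C, bulk density 1.3 +/- 0.2 g/cm3, 0-30 cm layer.
draws = sample_carbon_stock(2.0, 1.3, 0.2, 30.0)
mean_stock = sum(draws) / len(draws)
```

The same pattern would extend to uncertainty in %C itself by drawing it inside the loop as well.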

Finally, soil databases will present the problem that many data layers will be in vector GIS rather than nice raster layers. For sanity's sake we may want to start with the various 'harmonized' raster databases, but they are often very inaccurate when applied at the site scale. Also, many harmonized databases simply drop the less frequent soil types (and only report the single most common soil type) and thus they don't represent the actual variability/uncertainty that we'd want to integrate over.
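As a rough sketch of the difference between reporting only the dominant soil type and integrating over the variability: if a map unit lists several soil types with area fractions, we can sample soil types in proportion to those fractions instead of keeping just the most common one. The component names, the fractions, and both function names below are made up for illustration.

```python
import random


def dominant_soil_type(components):
    """What a 'dominant type only' product would report for this map unit."""
    return max(components, key=lambda c: c[1])[0]


def sample_soil_types(components, n, seed=0):
    """Draw n soil types with probability proportional to area fraction."""
    rng = random.Random(seed)
    names = [name for name, _ in components]
    weights = [fraction for _, fraction in components]
    return rng.choices(names, weights=weights, k=n)


# Hypothetical map unit: (soil type, area fraction).
components = [("loam", 0.6), ("sandy loam", 0.3), ("clay", 0.1)]

dominant = dominant_soil_type(components)
ensemble = sample_soil_types(components, n=1000)
```

Each ensemble member could then be run with the texture of its sampled soil type, so the less frequent types still contribute to the predictive uncertainty.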

mdietze commented 9 years ago

To be a bit more concrete, what @jam2767 has done in working towards a vegetation standard is to research the variable names and units used in different data sets and metadata standards (e.g. CF), as well as checking what things are already called in BETY. This helps figure out what everyone is calling different variables and what the most common units are. A table of this information is also a really handy reference when we start to dive into different data sets and data products, so that we have a simple look-up table of what things are called. A table like this for met is here: https://github.com/PecanProject/pecan/wiki/Adding-an-Input-Converter#the-variable-names-should-be-standard_name

dlebauer commented 8 years ago

Don't most of these variables exist in the CF list of standard names, e.g.

[screenshot: screen shot 2015-10-17 at 11 31 08 pm]

mdietze commented 8 years ago

Possibly, but we hadn't agreed on using netCDF CF for soils as well. That's definitely one option, especially if all variables are present. This issue is assigned to @serbinsh, so I'll leave it to you and him to check whether CF has all the variables we need. File format remains an issue since we have multiple layers, vector vs raster data, and uncertainties.

dlebauer commented 8 years ago

Note that although netCDF recommends using CF, CF doesn't require using netCDF.

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 365 days with no activity.