
ONEFlux Processing Pipeline

ONEFlux (Open Network-Enabled Flux processing pipeline) is an eddy covariance data processing code package jointly developed by the AmeriFlux Management Project, the European Fluxes Database, and the ICOS Ecosystem Thematic Centre. ONEFlux is used for the standard processing and data product creation for these networks.

ONEFlux consolidates multiple computations to process half-hourly (or hourly) flux inputs in an automatic fashion, including friction velocity threshold estimation methods and filtering, gap-filling of micrometeorological and flux variables, partitioning of CO2 fluxes into ecosystem respiration and gross primary production, uncertainty estimates, and more.

The current version of the code is compatible with the code base used to create the FLUXNET2015 dataset, and data processed with ONEFlux can be used in conjunction with data from FLUXNET2015.

The pipeline-controlling code uses Python version 2.7 (it should work with Python 3.5 or later, but has not been fully tested with these versions; an update of the code to Python 3 is ongoing).

(THERE ARE CAVEATS AND KNOWN LIMITATIONS TO THIS CODE, PLEASE SEE CAVEATS LIST BELOW.) This iteration of the code is not fully in line with open source/free software development practices, but we intend to steadily move in that direction.

Required data and metadata variables

To run ONEFlux, certain data variables and additional information about the site and instrument configuration are needed. Required data variables must be available in the input data, otherwise ONEFlux will not run. Encouraged data variables must be present for the related derived data products to be generated; ONEFlux will run if these are missing, but not all products will be generated. Suggested data variables are supported by ONEFlux, but are not directly used for the generation of any derived data products.

Additional information about data variables is available. Also note that multiple depths of soil temperature (TS) and soil water content (SWC) are supported by ONEFlux, using the numeric _# suffix notation (e.g., TS_1).
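
For example (an illustrative file header only; the exact set of variables depends on the site), two soil temperature depths and two soil water content depths would appear as numbered columns:

TIMESTAMP_START,TIMESTAMP_END,TS_1,TS_2,SWC_1,SWC_2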

Information including the site FLUXNET ID, latitude, longitude, timezone (adopted for timestamps in the data files), the complete history of the height of the eddy covariance system (gas analyser and sonic anemometer), the temporal resolution of the data files (usually 30 or 60 minutes), and how CO2 flux storage is handled at the site is also required for ONEFlux runs.
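
As a rough sketch (illustrative only; these field names are hypothetical and this is not an actual ONEFlux configuration format), the site information to assemble before a run amounts to:

# Illustrative only: ONEFlux does not read this dict; it lists the site
# information that must be gathered before a run.
site_info = {
    "site_id": "CC-XXX",          # FLUXNET site ID, in the form CC-XXX
    "latitude": 0.0,              # decimal degrees (placeholder value)
    "longitude": 0.0,             # decimal degrees (placeholder value)
    "timezone": "UTC+1",          # timezone adopted for timestamps in the data files
    "ec_heights": [(2005, 3.5)],  # history of EC system heights in meters (example entry)
    "resolution_minutes": 30,     # temporal resolution: usually 30 or 60
    "co2_storage": "measured",    # how CO2 flux storage is handled at the site
}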

Implemented steps

The steps implemented in the ONEFlux processing pipeline are detailed in the data processing description page of the FLUXNET2015 dataset.

The outputs of each of these steps are saved to sub-directories of the directory containing the data for a site. The structure of these output folders includes:

Building and installing

An installation script is available in the form of a Makefile, which can be used on Linux (x86-64) systems; versions for Windows and Mac are planned but not available at this time.

Running the command $ make in the source code folder will install all required Python dependencies, compile all C modules, and install them in the user home directory under ~/bin/oneflux/ (gcc version 4.8 or later is required to compile the C modules). It will also copy to the same destination an executable compiled version of a MATLAB code (see below for how to install MCR and run this code). Note that the Python modules for ONEFlux are not installed, so the source code must be used directly to configure paths and call the main pipeline execution.
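
For example (assuming the repository was cloned into the current directory and gcc 4.8 or later is available):

$ cd ONEFlux
$ make
$ ls ~/bin/oneflux/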

Installing MCR to run compiled MATLAB code

A compiled version of the MATLAB code for the Change Point detection method for USTAR threshold estimation is available (under ../ONEFlux/oneflux_steps/ustar_cp/bin/) and is copied into the executables directory along with the compiled versions of the steps implemented in C. Currently only a version for the Linux x86-64 environment is available.

To run this compiled MATLAB code, it is necessary to install the MATLAB Compiler Runtime (MCR) toolset, which can be downloaded from the MCR page. Version 2018a is required (this is the version used to compile the code). Follow the instructions on the download page to install MCR.

The path to the newly installed MCR environment (e.g., ~/bin/matlab/v94/) is a required input to the pipeline execution if this step is to be executed.
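
For example (an illustrative invocation; see Running below for the full parameter list), the MCR path is supplied via the --mcr option:

python runoneflux.py all "../datadir/" US-ARc "US-ARc_sample_input" 2005 2006 --mcr ~/bin/matlab/v94/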

Running

Run the pipeline in Python using the file runoneflux.py with the following parameters:

usage: runoneflux.py [-h] [--perc [PERC [PERC ...]]]
                     [--prod [PROD [PROD ...]]] [-l LOGFILE] [--force-py]
                     [--mcr MCR_DIRECTORY] [--ts TIMESTAMP] [--recint {hh,hr}]
                     COMMAND DATA-DIR SITE-ID SITE-DIR FIRST-YEAR LAST-YEAR

positional arguments:
  COMMAND               ONEFlux command to be run [all, partition_nt, partition_dt]
  DATA-DIR              Absolute path to general data directory
  SITE-ID               Site Flux ID in the form CC-XXX
  SITE-DIR              Relative path to site data directory (within data-dir)
  FIRST-YEAR            First year of data to be processed
  LAST-YEAR             Last year of data to be processed

optional arguments:
  -h, --help            show this help message and exit
  --perc [PERC [PERC ...]]
                        List of percentiles to be processed
  --prod [PROD [PROD ...]]
                        List of products to be processed
  -l LOGFILE, --logfile LOGFILE
                        Logging file path
  --force-py            Force execution of PY partitioning (saves original
                        output, generates new)
  --mcr MCR_DIRECTORY   Path to MCR directory
  --recint {hh,hr}      Record interval for site
  --versionp VERSIONP   Version of processing
  --versiond VERSIOND   Version of data

Running examples

Sample data

Data formatted for use in the examples below are available. The sample input data (around 80 MB) can be used to run the full pipeline. To check that the processing worked as expected, the sample output data (around 400 MB) can be used.

Execution commands

Run all steps in the pipeline:
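For example (an illustrative command mirroring the nighttime example below; the --mcr path should point to your MCR installation, see Installing MCR above):

python runoneflux.py all "../datadir/" US-ARc "US-ARc_sample_input" 2005 2006 -l fluxnet_pipeline_US-ARc.log --mcr ~/bin/matlab/v94/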

Run the nighttime partitioning method:

python runoneflux.py partition_nt "../datadir/" US-ARc "US-ARc_sample_input" 2005 2006 -l fluxnet_pipeline_US-ARc.log

To run the daytime partitioning method with only a single percentile and/or a single USTAR threshold data product (recommended for first executions), use:
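
For example (illustrative flag values; here --prod y selects the year-based USTAR threshold product and --perc 50 the 50th percentile, matching the output file named in the note below):

python runoneflux.py partition_dt "../datadir/" US-ARc "US-ARc_sample_input" 2005 2006 -l fluxnet_pipeline_US-ARc.log --prod y --perc 50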

Note that for the execution of the partitioning steps, the code will only run and generate the output *.csv file (e.g., nee_y_50_US-ARc_2005.csv) if it does not already exist. If the file exists, nothing will be done (unless the flag --force-py is used).
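
In other words, the behaviour is roughly equivalent to (a minimal Python sketch with illustrative names, not the actual pipeline code):

import os

def should_run_partitioning(output_csv, force_py=False):
    # output_csv, e.g., "nee_y_50_US-ARc_2005.csv"
    if os.path.exists(output_csv) and not force_py:
        return False  # output already exists: nothing will be done
    return True       # run the step and generate the file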

Required input data

All steps

In the data directory for the site, the input data must be in the expected formats, especially for individual steps within the pipeline. If the full pipeline is being executed, the inputs must be present in the following directories:

and the outputs will be generated in the directories:

Flux partitioning steps only

For both the nighttime and daytime partitioning methods, the inputs that must be present should be in the following directories:

and the outputs will be generated, respectively, in the directories:

Caveats and known limitations

Support and Funding

This material is based upon work supported by:

Contributors

Development and Code

Evaluation

Citation

When using ONEFlux or referring to data products generated with ONEFlux, please consider citing our reference paper:

Pastorello, G., et al. The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data. Scientific Data 7, 225 (2020). doi:10.1038/s41597-020-0534-3



(THERE ARE CAVEATS AND KNOWN LIMITATIONS TO THIS CODE, PLEASE SEE CAVEATS LIST ABOVE.)