Non-steady state (static) chambers are widely used for measuring soil
greenhouse gas (GHG) fluxes, such as CO2, CH4,
N2O, NH3, CO, and water vapor (H2O).
While linear regression (LM) is commonly used to estimate GHG fluxes,
this method tends to underestimate the pre-deployment flux
(f0). Indeed, non-linearity is expected when the gas
concentration increases or decreases inside a closed chamber, due to
changes in gas gradients between the soil and the air inside the
chamber. In addition, lateral gas flow and leakage contribute to
non-linearity. Many alternative to LM have been developed to try to
provide a more accurate estimation of f0, for instance the
method of Hutchinson and Mosier (HM)
(1981), which
was implemented in the HMR
package by Pedersen et al.,
2010. However,
non-linear models have a tendency to largely overestimate some flux
measurements, due to an exaggerated curvature. Therefore, the user is
expected to decide which method is more appropriate for each flux
estimate. With the advent of portable greenhouse gas analyzers
(e.g. Los Gatos Research (ABB) laser gas
analyzers),
soil GHG fluxes have become much easier to measure, and more efficient
ways to calculate these flux estimates are needed in order to process
large amounts of data. A recent approach was developed by Hüppi et al.,
2018 to restrict the
curvature in the HM model for a more reliable flux estimate. In the HM
model, the curvature is controlled by the kappa parameter, $\kappa$.
Hüppi et al. suggest the use of the KAPPA.MAX
to limit the maximal
curvature allowed in the model (see their R package
gasfluxes
, available
on the CRAN). This procedure introduces less arbitrary decisions in the
flux estimation process.
Previous software developed to calculate GHG fluxes are limited in many
aspects that the goFlux package is meant to overcome. Most are limited
to the linear regression approach (e.g. Flux
Puppy,
and the R packages flux
and
FluxCalR
), others do not
include data pre-processing (e.g. the R packages
HMR
,
flux
and
gasfluxes
), or if they
do, they are compatible with only a few designated systems (e.g. Flux
Puppy
and FluxCalR
), and none
include an automatic selection of the best flux estimate (LM or HM)
based on objective criteria, except the R packages
gasfluxes
and
HMR
.
This new R package goFlux
is meant to be “student proof”, meaning that
no extensive knowledge or experience is needed to import raw data into
R, choose the best model to calculate fluxes (LM or HM), quality check
the results objectively and obtain high quality flux estimates. The
package contains a wide range of functions that allows the user to
import raw data from a variety of instruments (LI-COR, LGR, GAIA2TECH,
Gasmet, Picarro, Aeris and PP-Systems); calculate fluxes from a variety
of GHG (CO2, CH4, N2O, NH3,
CO and H2O) with both linear (LM) and non-linear (HM) flux
calculation methods; align instruments clocks after data import;
interactively identify measurements (start and end time) if there are no
automatic chamber recordings (e.g. LI-COR smart chamber); plot
measurements for easy visual inspection; and quality check the
measurements based on objective criteria.
Three R packages for the Elven-kings under the CRAN,
Seven for the Dwarf-lords in their halls of open software,
Nine for Mortal Men doomed to write scripts themselves,
One for the Dark Lady on her dark throne
In the Land of GitHub where the Shadows lie.
One Package to rule them all, One Package to find them,
One Package to bring them all and in the darkness bind them
In the Land of GitHub where the Shadows lie.
The R package goFlux
is meant to be “student proof”, meaning that no
extensive knowledge or experience is needed to import raw data into R
(except for knowing how to use R, of course), choose the best model to
calculate fluxes (LM or HM, that is the question. -Shakespeare, 1603),
quality check the results objectively (hence the no experience needed)
and obtain high quality flux estimates from static chamber measurements
(wonderful!).
Look up this webpage for a demonstration of the package usage.
The package contains a wide range of functions that let the user import raw data from a variety of instruments:
After import, the user can choose from two methods to define the start and end points of each measurement and assign a UniqueID:
start.time
,
provided separately in an auxiliary file. The function obs.win
splits the imported data into a list of data frame (divided by
UniqueID
) and creates an observation window around the start.time
to allow for a manual selection of the start and end points of each
measurements, using the function click.peak2
.Follow these links for more details and demonstrations about importing raw data from the listed instruments, or the identification of measurements.
The function goFlux
calculates fluxes from a variety of greenhouse
gases (CO2, CH4, N2O, NH3,
CO, and H2O) using both linear (LM) and non-linear (HM;
Hutchinson and Mosier,
1981) flux
calculation methods. The HM model for the chamber concentration $C_t$ at
time $t > 0$ after deployment is given by:
$$\mathbf{Eqn~1}~~C_t = \varphi~+~(C_0 - \varphi)e^{-~\kappa~t}$$
Where $\varphi$ is the assumed concentration of constant gas source below the surface (also known as $C_i$), $C_0$ is the concentration in the chamber at the moment of chamber closure and $\kappa$ (kappa) determines the curvature of the model. A large kappa returns a strong curvature.
A maximum threshold for this parameter, kappa-max ($k.max$), can be calculated from the linear flux estimate ($LM.flux$), the minimal detectable flux ($MDF$) and the time of chamber closure ($t$) (Hüppi et al., 2018).
$$\mathbf{Eqn~2}~~k.max = \frac{LM.flux}{MDF~\times~t}$$
Where $LM.flux$ and $MDF$ have the same units (nmol or
µmol·m-2·s-1) and $t$ is in seconds. Therefore,
the units of kappa-max is s-1. This limit of kappa-max is
included in the goFlux
function, so that the non-linear flux estimate
cannot exceed this maximum curvature. See below for more details about
the minimal detectable flux (MDF).
All flux estimates, including the MDF, are multiplied by a $flux.term$ which is used to correct for water vapor inside the chamber, as well as convert the units to obtain a term in nmol or µmol·m-2·s-1:
$$\mathbf{Eqn~3}~~flux.term = \frac{(1 - H_2O)~V~P}{A~R~T}$$
Where $H_2O$ is the water vapor in mol·mol-1, $V$ is the volume inside the chamber in liters, $P$ is the pressure in kPa, $A$ is the surface area inside the chamber in m2, $R$ is the universal gas constant in L·kPa·K-1·mol-1. Each parameters are measured inside the chamber at $t = 0$.
More details and demonstrations about the function goFlux
can be found
here.
Following flux calculation, the user can select the best flux estimate
(LM or HM) based on objective criteria, using the best.flux
function:
In addition, the best.flux
function can flag measurements that are
below detection limit (MDF and p-value), out of bounds (intercept), or
too short (number of observations).
By default, all criteria are included:
criteria = c("MAE", "RMSE", "AICc", "SE", "g-factor", "kappa", "MDF", "nb.obs", "p-value", "intercept")
A demonstration of the usage of the function best.flux
can be found
here.
The g-factor is the ratio between a non-linear flux estimate (e.g. Hutchinson and Mosier; HM) and a linear flux estimate (Hüppi et al., 2018).
$$\mathbf{Eqn~4}~~g-factor = \frac{HM.flux}{LM.flux}$$
With the best.flux
function, one can choose a limit at which the HM
model is deemed to overestimate (f0). Recommended
thresholds for the g-factor are \<4 for a flexible threshold, \<2 for a
medium threshold, or \<1.25 for a more conservative threshold. The
default threshold is g.limit = 2
. If the g-factor is above the
specified threshold, the best flux estimate will switch to LM instead of
HM. If HM.flux/LM.flux
is larger than g.limit
, a warning is given in
the columns HM.diagnose
and quality.check
.
The minimal detectable flux ($MDF$) is based on instrument precision ($prec$) and measurement time ($t$) (Christiansen et al., 2015).
$$\mathbf{Eqn~5}~~MDF = \frac{prec}{t}~\times~flux.term$$
Where the instrument precision is in the same units as the measured gas (ppm or ppb) and the measurement time is in seconds.
Below the MDF, the flux estimate is considered under the detection
limit, but not null. Therefore, the function will not return a 0. There
will simply be a warning in the columns quality.check
, LM.diagnose
or HM.diagnose
to warn of a flux estimate under the detectable limit.
No best flux estimate is chosen based on MDF.
The parameter kappa determines the curvature of the non-linear
regression in the Hutchinson and Mosier model, as shown in equation 1.
The limit of kappa-max, as calculated in equation 2, is included in the
goFlux
function, so that the non-linear flux estimate cannot exceed
this maximum curvature.
In the function best.flux
, one can choose the linear flux estimate
over the non-linear flux estimate based on the ratio between kappa and
kappa-max (k.ratio
). A value of 1 indicates HM.k = k.max
, and a
value of e.g. 0.5 indicates HM.k = 0.5*k.max
. The default setting is
k.ratio = 1
. If HM.k/k.max
is larger than k.ratio
, a warning is
issued in the columns HM.diagnose
and quality.check
. The ratio is
expressed as a percentage of kappa-max in the plots.
In the best.flux
function, we included multiple choices of indices of
model fit, described below. One can choose to include however many of
them in the function. If multiple of them are included, the selection of
the best model will be made based on a scoring system. Both models start
with a score of 0. For each criteria, whichever model performs worst is
given +1. After all selected criteria have been evaluated, the model
with the lowest score wins. In case of a tie, the non-linear flux
estimate is always chosen by default, as non-linearity is assumed. The
score is printed in the output data frame in the columns HM.score
and
LM.score
.
The mean absolute error (MAE) is the arithmetic mean of the absolute residuals of a model, calculated as follows:
$$
\mathbf{Eqn~6}~~MAE = \frac{1}{n} \sum_{i = 1}^{n}{\lvert{y_i-\hat{y_i}}\rvert}
$$
Where $y_i$ is the measured value, $\hat{y_i}$ is the predicted value and $n$ is the number of observations.
The root mean square error (RMSE) is very similar to the MAE. Instead of using absolute errors, it uses squared errors, and the mean of the squared errors is then rooted as follows:
$$
\mathbf{Eqn~7}~~RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n}{({y_i-\hat{y_i}})^2}}
$$
Because of the squared errors, RMSE is sensitive to outliers. Indeed, a few large errors will have a significant impact on the RMSE. Therefore, RMSE will always be larger than or equal to MAE (Pontius et al., 2008).
Mathematically, RMSE is equivalent to the standard deviation of the residuals. Indeed, for a constant gas concentration, the standard deviation is expressed as follows:
$$
\mathbf{Eqn~8}~~\sigma = \sqrt{\frac{1}{N} \sum_{i = 1}^{N}{({x_i-\mu})^2}}
$$
Where $x_i$ is the measured value, $N$ is the size of the population and $\mu$ is the population mean. The standard deviation is used to calculate the precision of an instrument. In that case, $\mu$ is a known constant gas concentration and $N$ is the number of observations.
Considering all of the above, MAE, RMSE and the standard deviation are all measures of how much the data points are scattered around the true mean or the regression. Therefore, MAE and RMSE can be compared to the instrument precision to determine if a measurement is noisy. For a MAE or RMSE larger than the instrument precision, the measurement is considered to have more noise than normally expected.
If MAE is chosen as a criteria in best.flux
, the model with the
smallest MAE is chosen. If both models have a MAE smaller than the
instrument precision, the non-linear flux estimate is always chosen by
default, as non-linearity is assumed. When MAE is larger than the
instrument precision, a warning is given in the columns quality.check
,
LM.diagnose
or HM.diagnose
to warn of a noisy measurement. RMSE
functions exactly the same was as MAE in the best.flux
function.
While the standard deviation describes how the data points are scattered around the true mean, the standard error of a measurement tells how accurate that measurement is compared to the true population mean (Altman and Bland, 2005). If considering the standard deviation as used to calculate instrument precision (equation 8), the instrument standard error (instrument accuracy) can be defined as the standard deviation divided by the square root of the number of observations:
$$\mathbf{Eqn~9}~~\sigma_\bar{x} = \frac{\sigma}{\sqrt{n}}$$
Practically, this means that the accuracy increases with the sample size of a measurement. In other words, if an instrument is imprecise and the measurement has a lot of noise, it is still possible to get a more accurate estimate of the true mean by increasing the number of observations. With high-frequency GHG analyzers, that means increasing the chamber closure time.
To calculate the standard error of a regression, one can use the delta
method (deltamethod
from the msm
package), which propagates the
total error of the flux calculation for each parameter included in the
formula. The delta method approximates the standard error of a
regression $g(X)$ of a random variable $X = (x_1, x_2, ...)$, given
estimates of the mean and covariance matrix of $X$ (Oehlert,
1992; Mandel,
2013).
In the function best.flux
, the standard error (SE) of the measurements
can be compared to the standard error of the instrument
($\sigma\bar{x}$). If SE is chosen as a criteria in best.flux
, the
model with the smallest SE is chosen. If both models have a SE smaller
than the instrument precision, the non-linear flux estimate is always
chosen by default, as non-linearity is assumed. When SE is larger than
the instrument accuracy ($\sigma\bar{x}$), a warning is given in the
columns quality.check
, LM.diagnose
or HM.diagnose
to warn of a
noisy measurement.
The AIC estimates the relative quality of a statistical model and is used to compare the fitting of different models to a set of data (Akaike, 1974). Consider the formula for AIC:
$$\mathbf{Eqn~10}~~AIC = 2k - 2ln(\hat{L})$$
Where $k$ is the number of parameters in the model and $\hat{L}$ is the maximized value of the likelihood function for the model. AIC deals with the trade-off between the goodness of fit of a model and the simplicity of the model. In other words, the AIC is a score that deals with both the risk of underfitting and the risk of overfitting, and the model with the lowest score has the best model fit.
In flux calculation, the linear model contains two parameters: the slope and the intercept ($C_0$), whereas the Hutchinson and Mosier model (equation 1) contains three parameters: the assumed concentration of constant gas source below the surface ($\varphi$), the concentration in the chamber at the moment of chamber closure ($C_0$) and the curvature, kappa ($\kappa$). If both models have a very similar fit (maximum likelihood), then the linear model would win because it has less parameters. However, when the sample size is small (\<40 observations per parameter; i.e. \<120 observations when calculating HM), there is an increased risk that AIC selects a model with too many parameters. To address this risk of overfitting, AICc was developed (Sugiura, 1978):
$$\mathbf{Eqn~11}~~AICc = AIC + \frac{2k^2 + 2k}{n - k - 1}$$
Where $n$ denotes the number of observations and $k$ the number of parameters in the model.
If AICc is selected as a criteria in the best.flux
function, the model
with the lowest AICc wins.
If the initial gas concentration (C0) calculated for the
flux estimates (HM.C0
and LM.C0
) deviates from the true
C0 (measured concentration at $t = 0$) by more or less than
10% of the difference between C0 and the final gas
concentration at the end of the measurement (Ct), a warning
is issued in the columns quality.check
, LM.diagnose
or HM.diagnose
to warn of an intercept out of bounds. Alternatively, one can provide
boundaries for the intercept, for example: intercept.lim = c(380, 420)
for a true C0 of 400 ppm.
This criteria is only applicable to the linear flux. Under a defined
p-value, the linear flux estimate is deemed non-significant, i. e.,
flux under the detectable limit. The default threshold is
p.val = 0.05
. No best flux estimate is chosen based on p-value. If
LM.p.val < p.val
, a warning is given in the columns quality.check
and LM.diagnose
to warn of an estimated flux under the detection
limit.
Limit under which a measurement is flagged for being too short
(nb.obs < warn.length
). With nowadays’ portable greenhouse gas
analyzers, the frequency of measurement is usually one observation per
second. Therefore, for the default setting of warn.length = 60
, the
chamber closure time should be approximately one minute (60 seconds). If
the number of observations is smaller than the threshold, a warning is
issued in the column quality.check
.
Finally, after finding the best flux estimates, one can plot the results
and visually inspect the measurements using the function flux.plot
and
save the plots as pdf using flux2pdf
.
Find out more about these functions here.
To install a package from GitHub, one can use the package devtools
(or
remotes
) from the CRAN:
if (!require("devtools", quietly = TRUE)) install.packages("devtools")
Then, install the goFlux
package from GitHub. If it is not the first
time you install the package, it must first be detached before being
updated.
The package is constantly being updated with new functions or de-bugging. To make sure that you are using the latest version, re-install the package every time you use it.
try(detach("package:goFlux", unload = TRUE), silent = TRUE)
devtools::install_github("Qepanna/goFlux")
If prompted, it is recommended to update any pre-installed packages.
The functioning of the package depends on many other packages
(AICcmodavg
, data.table
, dplyr
, ggnewscale
, ggplot2
, ggstar
,
graphics
, grDevices
, grid
, gridExtra
, lubridate
, minpack.lm
,
msm
, pbapply
, plyr
, purrr
, rjson
, rlist
, SimDesign
,
stats
, stringr
, tibble
, tidyr
, utils
), which will be installed
when installing goFlux
.
install_github
If you get this warning while trying to install an R package from GitHub:
## Warning: package ‘goFlux’ is in use and will not be installed
This error means that the package is loaded. Before re-installing the package, you must first detach it:
detach("package:goFlux", unload = TRUE)
If this does not solve the problem, restart your session and try again.
If you get this error while trying to install an R package from GitHub:
## Error: Failed to install 'goFlux' from GitHub:
## HTTP error 403.
## API rate limit exceeded for xxx.xxx.xxx.x (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)
##
## Rate limit remaining: 0/60
## Rate limit reset at: 2023-10-16 09:08:07 UTC
##
## To increase your GitHub API rate limit
## - Use `usethis::create_github_token()` to create a Personal Access Token.
## - Use `usethis::edit_r_environ()` and add the token as `GITHUB_PAT`.
This error can occur while using the package remotes
to install an R
package from GitHub. Try using the package devtools
instead. If the
error still occurs, follow the instructions below.
Go to https://github.com/join .
Type a user name, your email address, and a password.
Choose Sign up for GitHub, and then follow the instructions.
Run in R:
usethis::create_github_token()
On pop-up website, generate and copy your token.
Run in R with your own token generated in step 2:
credentials::set_github_pat("YourTokeninStep2")
Authors: Karelle Rheault and Klaus Steenberg Larsen
The package is ready to use and fully functional, but errors may still occur. To report any issues, suggest improvements or ask for new features, open an issue on GitHub. Alternatively, contact directly the maintainer, Karelle Rheault (karh@ign.ku.dk), including script and raw data if necessary. Thank you for helping me in the development of this tool! 🙏
Please cite this R package using this publication in JOSS:
Rheault et al., (2024). goFlux: A user-friendly way to calculate GHG fluxes yourself, regardless of user experience. Journal of Open Source Software, 9(96), 6393, https://doi.org/10.21105/joss.06393
This software development was supported by the SilvaNova project funded by the NOVO Nordisk Foundation (grant no. NNF20OC0059948).