AU-BCE-EE / ALFAM2-data

The ALFAM2 dataset on ammonia loss from field-applied manure
GNU General Public License v3.0
5 stars 2 forks source link

ALFAM2-data

The ALFAM2 dataset (database) on ammonia loss through volatilization from field-applied manure. This repository contains the "ALFAM2 dataset", the code used to generate it from data files submitted by researchers who made the emission measurements, and those original data files.

Quick tips

More details

The dataset consists of two files: one with plot- and one with measurement-level observations. See the headers directory here for variable descriptions. Files can be merged on the two plot keys or identification codes: pid and pmid.

Files are saved in a compressed format with the extension .gz (gzip) in order to reduce file size. The easiest way to get the data into R is to read the files directly with data.table::fread(). Alternatively, various utilies can be used to extract (unzip) the files, which are regular comma-separated ASCII text files.

The ALFAM2 data are organized into uptake periods: 1 is for the original ALFAM work, 2 for the work described in this paper, and 3 for the current effort to expand the database. The latest version will always be in the highest number (3 currently). Earlier versions are saved to facilitate addition of data without rebuilding older dataset versions while maintaining the option for revising data submitted in an earlier uptake period.

See the Data handling tips below for more information on working with the data.

Citations

The best way to refer to this work if you use the data in a publication is to cite the version available through Zenodo. In addition to a digital object identifier (doi), this Zenodo version includes an author list. Alternatively, individual releases can be cited. In either case, be sure to note the version number used in any citation. If you have used a version not included in a release or a Zenodo dataset, please contact me (here)and I will create the necessary release or dataset. Authors may also want to cite (and read!) this paper. But again, include the dataset version number in any citation. You can find the dataset version number in the release tags or in a text file here: data-output/03/data_version.txt.

References

For a description of the dataset, see this paper: https://doi.org/10.1016/j.agrformet.2017.11.027. For the ALFAM2 model for ammonia emission, with parameter estimation based on the ALFAM2 dataset, see this paper: https://doi.org/10.1016/j.agrformet.2017.11.027.

Data handling tips

The simplest way to load the data in R is with the fread() function from the data.table package.

idat <- data.table::fread('ALFAM2_interval.csv.gz')
pdat <- data.table::fread('ALFAM2_plot.csv.gz')

Once these two data frames (tables) are created, they can be combined (if needed) with base::merge() or data.table::merge() function.

cdat <- merge(idat, pdat, by = c('pid', 'pmid'))

(The by argument is optional here, but it is good practice to be aware of the columns used for merging.)

In Python, the read_csv() function from the pandas package can be used to read the compressed files directly.

import pandas as pd

idat = pd.read_csv('ALFAM2_interval.csv.gz')

For some more information, see the analysis directory.

Submitting data

To inquire about submitting data to the ALFAM2 dataset, send a message to sasha.hafner AT bce.au.dk through email or else used this form.

Checking data

To check submitted data, find your completed template file in this list and follow the instructions you will see at the top of the linked file.

Building the database

After cloning this repo, the process of creating the ALFAM2 database from submitted data can be completely repeated by running two scripts in R (https://www.r-project.org/) sequentially:

  1. scripts/02/main.R
  2. scripts/03/main.R

The only obvious reason for doing so is to update the database after making a correction to submitted data or data processing code. (Submitted data, as spreadsheet templates (usually) completed by data submitters, can be found in the data-submitted directory.) For changes to only the most recent submitted file, only the highest numbered file, for the latest uptake period, needs to be run. Because of changes in base R and add-on packages over time, seeing R return an error when updating earlier uptake periods, or even the latest period some time after the latest update by database developers, is possible (e.g., this problem was caused by a change in base R). The record of R and add-on package versions in logs/*/R_package_version.txt should help sort out any of these problems.

Directory structure