
A Python-based processing tool to turn Slocum glider binary files into self-describing US IOOS GDAC v3.0 netCDF files.

Slocum Python Glider Processing Toolbox (PGPT)

This is a minimal glider processing toolbox, written in Python, for going from Slocum glider raw data files to self-describing *.nc files that pass compliance checking with the US Integrated Ocean Observing System (US IOOS) Glider Data Assembly Center (GDAC). The *.nc files produced by this toolbox should meet the requirements for ingestion into the Global Telecommunications System (GTS) for further use in models. We follow the US IOOS guidelines for the file format and structure of glider data.

1. Requirements

  1. bash, GNU parallel
  2. Python 3
  3. pip packages (see the install command after this list):
    • PyYAML
    • Cerberus
    • pandas
    • numpy
    • xarray
    • gsw
    • dbdreader
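
All of the listed Python packages are published on PyPI, so they can typically be installed with a single pip command (assuming pip points at your Python 3 installation):

pip install PyYAML Cerberus pandas numpy xarray gsw dbdreader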

For Mac users: this toolbox relies on the GNU date utility. Install it from the coreutils package and alias date to gdate:

brew install coreutils
echo "alias date=gdate" >> ~/.bash_profile

2. Description

The intent of this toolbox is to produce a clean dataset from raw glider data, preserving the original data resolution and associated metadata, for sharing with data centres and for further careful scientific post-processing (expert processing). This toolbox does not perform enhanced Quality Control (QC) checks, but it does flag data following the guidelines for the Quality Assurance / Quality Control of Real-Time Oceanographic Data (QARTOD).

This toolbox separately supports both realtime mode (while the glider is deployed) and delayed mode (after the glider is recovered); the user tells the toolbox which mode to use. The processing levels in both modes are the same, but delayed mode will contain the complete dataset, while realtime mode may not.


3. How to Run

Clone this repository to your desired location on your machine. Follow the steps below and run the provided shell script to process your glider data. Modify the toolbox as needed. A working example is provided.
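
For example, assuming the repository lives at github.com/OceanGNS/PGPT (the paths in the example below follow this layout), cloning looks like:

git clone https://github.com/OceanGNS/PGPT.git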

Directory Structure

Create a new directory called realtime or delayed, with another directory inside it called raw, and put all your binary .TBD|.SBD (realtime data) or .DBD|.EBD (delayed data) files in there. You can also put compressed Slocum binary files (.?CD) in the raw directory; the script will automatically decompress them.

Create a new directory called cache in the same path as the realtime or delayed directory and put the necessary cache (.CAC) files there.

Create a YAML file containing all the metadata needed for the netCDF files, in the same path as the realtime or delayed and cache directories. You can use the metadata.yml file included in the example directory as a template and modify it as needed.

At the end, your directory structure should look like this:

.
├── cache
│   ├── 00CDA96E.CAC
│   ├── 02A6E8E6.CAC
│   ├── 1A2BF75A.CAC
│   ├── 1BD4CF69.CAC
│   └── ...
├── delayed
│   └── raw
│      ├── 02150054.DBD
│      ├── 02150054.EBD
│      ├── 02150055.DBD
│      ├── 02150055.EBD
│      ├── 02150056.DBD
│      ├── 02150056.EBD
│      └── ...
└── metadata.yml
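
As a convenience, this layout can be created for a delayed-mode mission with a few shell commands (the source paths below are placeholders for wherever your glider and cache files actually live):

mkdir -p cache delayed/raw
cp /path/to/your/cache/*.CAC cache/
cp /path/to/your/binaries/*.DBD /path/to/your/binaries/*.EBD delayed/raw/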

Processing Files

Once all files and directories are in place, execute the following command (specify the absolute path to the mission directory):

run.sh -g glider_name -d absolute_path_to_mission_directory -m metadata_yaml_filename -p realtime_or_delayed

For the included example:

run.sh -g unit_334 -d /home/User/Github/PGPT/example -m metadata.yml -p delayed

Once the run.sh script is done, there will be one new directory in the mission directory, nc, which contains a netCDF file for each profile (named gliderName-fileNameXXXX_processingMode.nc) and a trajectory file that combines all the profiles (named gliderName_processingMode_trajectory.nc).

Once the toolbox has run, the user can set up a shell script to upload the data to an FTP server or a GDAC. Similarly, the user can set up a script to sync the raw directory with the glider remote server (e.g. SFMC) and re-run the toolbox whenever there is new data to be processed, as sketched below.
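
A minimal sketch of such an automation script, assuming rsync access to the glider server; the host name and remote path are placeholders, and the mission settings reuse the example above:

#!/bin/bash
# Pull any new realtime files from the glider server (placeholder host/path),
# then re-run the toolbox on the mission directory.
rsync -av user@glider-server:/path/to/from-glider/ /home/User/Github/PGPT/example/realtime/raw/
run.sh -g unit_334 -d /home/User/Github/PGPT/example -m metadata.yml -p realtime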

4. Docker Image

A Docker image of the Glider Processing Toolbox is available on Docker Hub for convenience:

docker pull taimaz/pgpt:1.0.2

and then run:

docker run -it taimaz/pgpt:1.0.2
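
To let the container see your mission data, you will typically also need a bind mount; the container-side path /data below is an assumption, not a documented mount point:

docker run -it -v /home/User/Github/PGPT/example:/data taimaz/pgpt:1.0.2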