NOAA-GSL / VxIngest

Other
2 stars 0 forks source link

VxIngest

VxIngest ingests meteorological data from various sources and makes it available in a document database for verification purposes in conjunction with the "Model Application Toolsuite" (MATS) application.

Getting Started

Our ingest process has two components:

The ingest program works to consume raw model output and observation data in GRIB & NetCDF format and turn that data into VxIngest's common data format. The ingest program writes the data out to disk as Couchbase JSON documents, along with some log output and Prometheus metrics. The ingest program also works to calculate aggregate statistics on the contents of the Couchbase database, like CTCs & Partial Sums

The import script wraps the Couchbase cbimport CLI tool and imports the Couchbase JSON documents created by the "ingest" process into the database.

If you're interested in some diagrams showing the data flow, see the Diagrams section

Usage

VxIngest is containerized for deployment. If you are developing the application, see the Development Guide for information on setting up a development environment; as well as for information on linting, formatting, and testing.

Using the container

You will first need to build the docker container with the following:

docker build \
    --build-arg BUILDVER=dev \
    --build-arg COMMITBRANCH=$(git branch --show-current) \
    --build-arg COMMITSHA=$(git rev-parse HEAD) \
    -f ./docker/ingest/Dockerfile \
    -t vxingest:dev \
    .

To run the ingest, you will first need to create a file like the below with the database credentials in ${HOME}/credentials:

${HOME}/credentials:

cb_host: "url.for.couchbase"
cb_user: "user"
cb_password: "password"
cb_bucket: "vxdata"
cb_scope: "_default"
cb_collection: "METAR"

The cb_host file requires a protocol. For example ... "couchbase://adb-cb1.gsd.esrl.noaa.gov" - because adb-cb1... is a single node cluster. For adb-cb2 (which is one node of a multinode cluster) it would be "couchbases://adb-cb2.gsd.esrl.noaa.gov". Any of the nodes would suffice.

Once that's in place, you can run the ingest with Docker Compose like the example below. Note the public and data env variables respectively point to where the input data resides and where you'd like the container to write out to. They are the only part of the command you would need to modify.

data=/data-ingest/data \
    public=/public \
    docker compose run ingest

You can run the "import" via Docker Compose like this example. You will need to use the same value for data as you used for the "ingest".

data=/data-ingest/data \
    docker compose run import

Diagrams

Data flow for Model & Observation ingest (GRIB & NetCDF)

---
title: Model & Obs Ingest
---
flowchart LR
    data --> |1. Reads new data| ingest
    ingest --> |2. Writes data out <br>as JSON files| disk
    disk --> |3. Imports JSON files| import
    import --> |4. Inserts files| cb

    subgraph Application Layer
        ingest(Ingest)
        import(Import)
    end
    subgraph Data Layer
        data[[Model & Obs Data]]
        disk[[Files on Disk]]
        cb[(Couchbase)]
    end

Data flow for Aggregate Statistics ingest. (CTC & Partial Sum)

---
title: CTC & Partial Sums Ingest
---
flowchart LR
    ingest --> |1. Gets data from Couchbase| cb
    ingest --> |2. Writes data out as JSON files| disk
    disk --> |3. Imports JSON files| import
    import --> |4. Inserts files| cb

    subgraph Application Layer
        ingest(Ingest)
        import(Import)
    end
    subgraph Data Layer
        disk[[Files on Disk]]
        cb[(Couchbase)]
    end

Disclaimer

This repository is a scientific product and is not official communication of the National Oceanic and Atmospheric Administration, or the United States Department of Commerce. All NOAA GitHub project code is provided on an “as is” basis and the user assumes responsibility for its use. Any claims against the Department of Commerce or Department of Commerce bureaus stemming from the use of this GitHub project will be governed by all applicable Federal law. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by the Department of Commerce. The Department of Commerce seal and logo, or the seal and logo of a DOC bureau, shall not be used in any manner to imply endorsement of any commercial product or activity by DOC or the United States Government.