datopian / ckanext-aircan

A custom extension for notifying (triggering) an Airflow DAG about data to be uploaded to the DataStore.
GNU Affero General Public License v3.0

ckanext-aircan

A CKAN extension for integrating the Airflow-based AirCan Data Factory into CKAN. Specifically, this extension provides:

Installation

Basic Setup

There are two potential cases:

Configuration

Local Airflow instance

Airflow instance on Google Composer

Assuming you already have a Google Cloud Composer environment properly set up, you can trigger a DAG on Google Cloud Platform by following these steps:

Getting Started

Triggering a Workflow (DAG)

Make a request to http://YOUR-CKAN:5000/api/3/action/aircan_submit?dag_name=DAG_NAME, passing your CKAN_API_KEY in the request header and sending the following information in the request body (replace the values accordingly):

{
  "package_id": "YOUR_PACKAGE_ID",
  "url": "http://url.for.your.resource.com",
  "description": "This is the best resource ever!",
  "schema": {
    "fields": [
      {
        "name": "FID",
        "type": "int",
        "format": "default"
      },
      {
        "name": "another-field",
        "type": "float",
        "format": "default"
      }
    ]
  }
}

Set the dag_name query parameter to the DAG you want to invoke. For example, http://YOUR-CKAN:5000/api/3/action/aircan_submit?dag_name=ckan_api_load_gcp triggers the DAG ckan_api_load_gcp.

NB: the ckan_api_load_gcp DAG is designed for an Airflow instance running on Google Cloud Composer and loads a resource into the DataStore.
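The request described above can be sketched with the Python standard library. This is a minimal, hypothetical helper (`build_aircan_submit_request` is not part of the extension) that only builds the request object; it assumes a JSON POST body and that the API key goes in CKAN's usual `Authorization` header. Check both against your CKAN version before relying on them.

```python
import json
import urllib.parse
import urllib.request


def build_aircan_submit_request(ckan_url, api_key, dag_name, payload):
    """Build (but do not send) a POST request to aircan_submit.

    Hypothetical helper; the Authorization header is an assumption
    based on how CKAN usually accepts API keys.
    """
    query = urllib.parse.urlencode({"dag_name": dag_name})
    url = f"{ckan_url.rstrip('/')}/api/3/action/aircan_submit?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same body as the JSON example above.
payload = {
    "package_id": "YOUR_PACKAGE_ID",
    "url": "http://url.for.your.resource.com",
    "description": "This is the best resource ever!",
    "schema": {
        "fields": [
            {"name": "FID", "type": "int", "format": "default"},
            {"name": "another-field", "type": "float", "format": "default"},
        ]
    },
}

req = build_aircan_submit_request(
    "http://YOUR-CKAN:5000", "CKAN_API_KEY", "ckan_api_load_gcp", payload
)
# To actually send it against a running CKAN: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the payload easy to inspect or log before anything reaches CKAN.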

The endpoint http://YOUR-CKAN:5000/api/3/action/resource_create has the same effect as http://YOUR-CKAN:5000/api/3/action/aircan_submit?dag_name=DAG_NAME. Make sure you set an extra variable in your .env file specifying the DAG you want to trigger:

# .env
# all other variables
CKAN__AIRFLOW__CLOUD__DAG_NAME=DAG_YOU_WANT_TO_TRIGGER
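A client-side sketch of the resource_create route, under the same assumptions as before (JSON POST body, API key in the `Authorization` header). The helper name `build_resource_create_request` is hypothetical. Note that no DAG name is sent: the server-side CKAN__AIRFLOW__CLOUD__DAG_NAME setting is what selects the DAG.

```python
import json
import urllib.request


def build_resource_create_request(ckan_url, api_key, resource):
    """Build (but do not send) a POST request to CKAN's resource_create.

    The DAG is not named here; which DAG runs is decided server-side by
    CKAN__AIRFLOW__CLOUD__DAG_NAME. The header name is an assumption.
    """
    return urllib.request.Request(
        f"{ckan_url.rstrip('/')}/api/3/action/resource_create",
        data=json.dumps(resource).encode("utf-8"),
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


resource = {
    "package_id": "YOUR_PACKAGE_ID",
    "url": "http://url.for.your.resource.com",
    "description": "This is the best resource ever!",
}
req = build_resource_create_request("http://YOUR-CKAN:5000", "CKAN_API_KEY", resource)
```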

For the CKAN DataStore data loader DAG

Update aircan run status

The aircan_status_update API can be used to store or update the run status of a given resource. It accepts POST requests from an authorized user, with a body such as:

{
    "resource_id": "a4a520aa-c790-4b53-93aa-de61e1a2813c",
    "state": "progress",
    "message": "Pushing dataset records.",
    "dag_run_id": "394a1f0f-d8b3-47f2-9a51-08732349b785",
    "error": {
        "message": "Failed to push data records."
    }
}
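A DAG task could report its progress with a request like the one sketched below. The helper `build_status_update_request` is hypothetical; it uses the field names from the example body and assumes the `error` object is optional, so it is only attached when an error message is given. The `Authorization` header is again an assumption about how the API key is passed.

```python
import json
import urllib.request


def build_status_update_request(ckan_url, api_key, resource_id, state,
                                message, dag_run_id, error_message=None):
    """Build (but do not send) a POST request to aircan_status_update."""
    payload = {
        "resource_id": resource_id,
        "state": state,
        "message": message,
        "dag_run_id": dag_run_id,
    }
    # Assumption: "error" may be omitted when there is nothing to report.
    if error_message is not None:
        payload["error"] = {"message": error_message}
    return urllib.request.Request(
        f"{ckan_url.rstrip('/')}/api/3/action/aircan_status_update",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


progress = build_status_update_request(
    "http://YOUR-CKAN:5000", "CKAN_API_KEY",
    resource_id="a4a520aa-c790-4b53-93aa-de61e1a2813c",
    state="progress",
    message="Pushing dataset records.",
    dag_run_id="394a1f0f-d8b3-47f2-9a51-08732349b785",
)
```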

Retrieving aircan run status

Use the aircan_status API to retrieve the AirCan run status of a given resource by providing its resource ID, e.g. by sending the following body to http://YOUR-CKAN:5000/api/3/action/aircan_status:

{
  "resource_id": "a4a520aa-c790-4b53-93aa-de61e1a2813c"
}
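CKAN action API responses are wrapped in an envelope with a `success` flag and either a `result` or an `error` object. The sketch below builds an aircan_status request and unwraps that envelope. Both helper names are hypothetical, the POST/`Authorization` choices are assumptions, and the shape of `result` for aircan_status is not documented here, so the test uses a made-up result.

```python
import json
import urllib.request


def build_status_request(ckan_url, api_key, resource_id):
    """Build (but do not send) a POST request to aircan_status."""
    return urllib.request.Request(
        f"{ckan_url.rstrip('/')}/api/3/action/aircan_status",
        data=json.dumps({"resource_id": resource_id}).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def parse_ckan_response(body):
    """Return the "result" of a CKAN action response or raise on failure."""
    envelope = json.loads(body)
    if not envelope.get("success"):
        error = envelope.get("error") or {}
        raise RuntimeError(error.get("message", "CKAN action failed"))
    return envelope["result"]


# Usage against a running CKAN instance:
# req = build_status_request("http://YOUR-CKAN:5000", "CKAN_API_KEY",
#                            "a4a520aa-c790-4b53-93aa-de61e1a2813c")
# with urllib.request.urlopen(req) as resp:
#     status = parse_ckan_response(resp.read())
```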

Tests with Cypress

Test the aircan-connector with Cypress.

Installation

npm install

Running

The following command opens the Cypress app, where you can choose which specs to run.

npm test