dvdstrzl / oem_dpkg

GNU General Public License v3.0

oem_dpkg

OEM Data Package Creation & OEP Upload Handler

This project provides tools for packaging and uploading datasets along with their metadata, specifically tailored for compatibility with the Frictionless Data specifications, Open Energy Metadata (OEM) standards and the Open Energy Platform (OEP).

Features

Installation

Ensure you have Python 3.8 or newer installed. Clone or download the project repository, navigate to the project directory, and install the required dependencies:

pip install -r requirements.txt

OEM Data Package

The OemDataPackage class streamlines the creation, validation, and packaging of datasets together with their Open Energy Metadata (OEM). It follows the Frictionless Data Package standard and incorporates the OEM and OEP standards. The resulting package organizes datasets for sharing, publication, and further processing, and is intended to integrate more easily with the OEP.

Class Features

Usage

  1. Initialize: Specify the input directory containing datasets and metadata, the output directory for the data package, package name, description, version, and whether to enable OEM integration.
from oem_dpkg import OemDataPackage

package = OemDataPackage(
    input_path="/path/to/datasets",
    output_path="/path/to/output",
    name="Example Data Package",
    description="A comprehensive data package for energy research.",
    version="1.0",
    oem=True
)
  2. Create the Data Package: Call the create() method to automatically package the datasets, perform metadata validation, and prepare the data package.
package.create()

This process copies the datasets and metadata to the specified output directory, validates the metadata against OEM standards, and generates a datapackage.json file that describes the entire data package.
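To check the result of create(), the generated descriptor can be inspected with the standard library. The sketch below assumes the descriptor follows the Frictionless Data Package layout (a top-level "resources" list whose entries carry "name" and "path"). The helper name list_resources is hypothetical and not part of oem_dpkg.

```python
import json
from pathlib import Path


def list_resources(descriptor_path):
    """Return (name, path) pairs for every resource in a datapackage.json.

    Hypothetical helper, not part of oem_dpkg. Assumes the Frictionless
    Data Package layout: a top-level "resources" list of objects.
    """
    descriptor = json.loads(Path(descriptor_path).read_text(encoding="utf-8"))
    return [
        (resource.get("name"), resource.get("path"))
        for resource in descriptor.get("resources", [])
    ]
```

Passing the path of the generated datapackage.json returns one pair per packaged dataset, which makes it easy to confirm that every expected dataset was included.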

Considerations

OepUploadHandler

The OepUploadHandler class simplifies the workflow of uploading data to the Open Energy Platform (OEP). It prepares and uploads datasets together with their metadata, ensuring compliance with the Open Energy Metadata (OEM) standards. This section explains how to use the class and highlights considerations for successful uploads.

Class Features

Prerequisites

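Before using OepUploadHandler, ensure you have the inputs its constructor expects: a prepared data package (datapackage.json), your OEP username, and your OEP API token.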

Usage

  1. Initialize: To begin, instantiate the OepUploadHandler class with the path to your data package, your OEP API token, and other relevant information:
from oem_dpkg import OepUploadHandler

upload_handler = OepUploadHandler(
    datapackage_path="path/to/your/datapackage.json",
    api_token="your_oep_api_token",
    oep_username="your_oep_username",
    oep_schema="model_draft",  # Optional, defaults to "model_draft"
    dataset_selection=["dataset1", "dataset2"]  # Optional, specify datasets to upload
)
  2. Extract Dataset Resources: The extract_dataset_resources method filters the datasets you wish to upload based on your dataset_selection. If no selection is provided, all datasets within the data package are processed:
upload_handler.extract_dataset_resources()
  3. Set up OEP Database Connection: Establish a connection to the OEP Database API using setup_db_connection:
upload_handler.setup_db_connection()

This step is crucial for enabling dataset uploads and table creation on the OEP.

  4. Upload Datasets: Use the upload_datasets method to upload the datasets to the OEP. This method handles data preparation, batch uploading, and metadata updating:
upload_handler.upload_datasets()

During the upload process, a progress bar will display the upload status for each dataset.

  5. Update Metadata: If you need to update the metadata for a dataset already on the OEP, use the update_oep_metadata method. Provide the path to the OEM file and the table name:

upload_handler.update_oep_metadata(
    oem_path="path/to/your/oem_file.json",
    table_name="your_table_name"
)
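The selection behaviour described for extract_dataset_resources and the batch uploading mentioned for upload_datasets can be illustrated with plain Python. This is a sketch of the general technique only, not the code oem_dpkg actually runs. The function names select_resources and batched, and the assumption that each resource is a dictionary with a "name" key, are hypothetical.

```python
from itertools import islice


def select_resources(resources, dataset_selection=None):
    """Keep resources whose name is selected; keep all if no selection is given.

    Hypothetical helper illustrating the filtering rule, not oem_dpkg code.
    """
    if not dataset_selection:
        return list(resources)
    wanted = set(dataset_selection)
    return [resource for resource in resources if resource.get("name") in wanted]


def batched(rows, batch_size):
    """Yield successive lists of at most batch_size rows from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Uploading in batches keeps each request small, so a failure affects one batch rather than the whole table, and progress can be reported after every batch.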

Or... run all!

Once the handler is initialized, run_all() executes the complete upload process in a single call: extracting resources, setting up the database connection, creating the necessary tables, and uploading the metadata and the data.

oep_uploadhandler = OepUploadHandler(
    datapackage_path="output/LATEST/datapackage",
    api_token="your_api_token",
    oep_username="your_oep_username"
)
oep_uploadhandler.run_all()
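The examples pass the API token as a string literal. One way to keep it out of source files is to read it from the environment. The following is a minimal sketch, assuming the hypothetical environment variable names OEP_API_TOKEN and OEP_USERNAME (oem_dpkg does not define or read them). The returned dictionary uses the keyword names shown in the constructor call above.

```python
import os


def handler_kwargs_from_env(datapackage_path, environ=None):
    """Build keyword arguments for OepUploadHandler from environment variables.

    OEP_API_TOKEN and OEP_USERNAME are hypothetical names chosen for this
    sketch; oem_dpkg itself does not read them.
    """
    environ = os.environ if environ is None else environ
    required = ("OEP_API_TOKEN", "OEP_USERNAME")
    missing = [key for key in required if not environ.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "datapackage_path": datapackage_path,
        "api_token": environ["OEP_API_TOKEN"],
        "oep_username": environ["OEP_USERNAME"],
    }
```

The result can be unpacked into the constructor, for example OepUploadHandler(**handler_kwargs_from_env("output/LATEST/datapackage")), before calling run_all().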

Considerations


CLI

The provided CLI tool (cli.py) offers an accessible way to use the functionalities of this project from the command line, streamlining the process of data package creation and uploading to OEP.

Creating a Data Package

To create a data package from your datasets and metadata, run:

oem_dpkg create-package <input_path> <output_path> <name> <description> <version> [--oem]

Uploading to OEP

To upload your prepared data package to the Open Energy Platform, use:

oem_dpkg oep-upload <datapackage_path> --dataset_selection <dataset_name> --schema <schema_name>

The --dataset_selection option is optional and can be repeated to select several datasets; if it is not provided, all datasets within the data package will be processed.

Example calls

oem_dpkg create-package "input/path" "output/path" "name" "description" "version" --oem

oem_dpkg oep-upload "output/path/datapackage/datapackage.json" --dataset_selection "dataset1" --dataset_selection "dataset2" --schema "model_draft"
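When the upload is triggered from a script, the command line can be assembled programmatically. The sketch below builds the same oep-upload call as the example above, repeating --dataset_selection once per dataset. It assumes the oem_dpkg executable is available on the PATH. The function names build_upload_command and run_upload are hypothetical.

```python
import subprocess


def build_upload_command(datapackage_path, dataset_selection=(), schema="model_draft"):
    """Assemble the oep-upload argument list, one --dataset_selection per dataset.

    Hypothetical helper; only the flags shown in the examples above are used.
    """
    command = ["oem_dpkg", "oep-upload", datapackage_path]
    for name in dataset_selection:
        command += ["--dataset_selection", name]
    command += ["--schema", schema]
    return command


def run_upload(datapackage_path, dataset_selection=(), schema="model_draft"):
    """Run the CLI and raise CalledProcessError if it exits with a non-zero code."""
    command = build_upload_command(datapackage_path, dataset_selection, schema)
    return subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths or dataset names that contain spaces.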

Open Energy Family

For several of its functions, this package builds on existing tools from the 'Open Energy Family':

oem2orm

omi

oemetadata

oep_client

oedialect