Welcome to MDIO - a descriptive format for energy data that is intended to reduce storage costs, improve the efficiency of I/O and make energy data and workflows understandable and reproducible.
MDIO schema definitions here.
First clone the MDIO v1.0 library:
This project uses CMake for the build and requires CMake 3.24 or newer. The build is configured to fetch and install its third-party dependencies. To build MDIO, clone the repository and create a build directory:
$ mkdir build
$ cd build
# NOTE: "CMake Deprecation Warning at build/_deps/nlohmann_json_schema_validator-src/CMakeLists.txt:1" can safely be ignored
$ cmake ..
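Because the build needs CMake 3.24 or newer, it can help to check the installed version before configuring. The following is a minimal sketch, not part of MDIO: the helper name version_ge is hypothetical, and the comparison assumes a `sort` that supports `-V` (GNU coreutils and recent BSD/macOS versions do).

```shell
# Sketch: confirm the installed CMake meets the 3.24 minimum before configuring.

# version_ge A B: succeeds when version A >= version B (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="3.24"

if command -v cmake >/dev/null 2>&1; then
  # The first line of `cmake --version` has the form "cmake version X.Y.Z".
  found="$(cmake --version | head -n 1 | awk '{print $3}')"
  if version_ge "$found" "$required"; then
    echo "CMake $found satisfies the $required minimum"
  else
    echo "CMake $found is older than $required; upgrade before building" >&2
  fi
else
  echo "cmake not found on PATH" >&2
fi
```

If the check reports an older version, upgrade CMake before running the configure step.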
Each MDIO target has the prefix "mdio" in its name. To build the tests, run the following commands from the build directory:
$ make -j32 mdio_acceptance_test
The acceptance test validates that the MDIO/C++ data can be read by Python's Xarray. For the test to pass, make sure your Python environment has Xarray installed, then run the acceptance test:
$ cd build/mdio/
$ ./mdio_acceptance_test
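Since the acceptance test depends on Xarray being importable, a quick pre-flight check can save a failed run. This is a sketch under assumptions: it assumes the interpreter the test uses is the `python3` on your PATH, and the helper name have_py_module is hypothetical.

```shell
# Sketch: check that Xarray is importable before running the acceptance test.

# have_py_module NAME: succeeds when python3 can locate the module NAME.
have_py_module() {
  python3 -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)" "$1"
}

if have_py_module xarray; then
  echo "xarray is importable"
else
  echo "xarray not found; install it, for example: python3 -m pip install xarray" >&2
fi
```

If your environment uses a virtualenv or conda, activate it first so that `python3` resolves to the interpreter the test will see.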
The dataset and variable components have their own test suites too:
$ make -j32 mdio_variable_test
$ make -j32 mdio_dataset_test
Each MDIO library provides an associated CMake alias, e.g. mdio::mdio, which can be used to link against MDIO in your project.
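As an illustration of linking through the alias, the sketch below writes a minimal consumer CMakeLists.txt into a scratch directory. Only the mdio::mdio alias and the 3.24 minimum come from this README. The project name my_app, the source file main.cc, and the vendored path third_party/mdio are hypothetical, and pulling MDIO in with add_subdirectory is an assumption about how your project obtains the sources.

```shell
# Sketch: generate a minimal consumer CMakeLists.txt that links against mdio::mdio.
# my_app, main.cc, and third_party/mdio are hypothetical names for illustration.

demo_dir="$(mktemp -d)"

cat > "$demo_dir/CMakeLists.txt" <<'EOF'
cmake_minimum_required(VERSION 3.24)
project(my_app LANGUAGES CXX)

# Assumption: the MDIO sources are vendored at third_party/mdio.
add_subdirectory(third_party/mdio)

add_executable(my_app main.cc)

# Link through the alias target named in this README.
target_link_libraries(my_app PRIVATE mdio::mdio)
EOF

echo "Wrote $demo_dir/CMakeLists.txt"
```

Linking through the alias rather than a raw library path lets CMake propagate the include directories and transitive dependencies attached to the target.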
MDIO API documentation is currently provided with the MDIO library.
$ open mdio/docs/html/index.html
Standardized Schema Compliance: MDIO enforces a strict adherence to a standardized schema for all data inputs, ensuring consistency, reliability, and ease of data interoperability.
Cloud and On-Premise Storage: MDIO is intended to efficiently support energy datasets on local filesystems, HPC systems, and cloud object stores. Currently, MDIO supports cloud storage on GCS and S3.
Xarray and Python MDIO Compatibility: We prioritize compatibility with popular data analysis tools like Xarray and Python MDIO, allowing for straightforward integration with your existing workflows.
High Scalability and Performance: Scalable asynchronous and concurrent I/O and tensor operations to handle complex and large energy datasets with ease, ensuring that your data processing remains fast and efficient, even as your data grows.
Our vision is to provide a tool that not only simplifies the management of energy data but also enhances the quality and depth of energy analysis. By keeping units, dimensions, and other critical metadata with the data, MDIO ensures that every dataset is not just a collection of numbers but a rich, self-explaining narrative of energy insights.
MDIO is built for a wide range of users, including:
We use the tensorstore library to provide a native C/C++ interface to Zarr. If you're familiar with the Python Dask library, tensorstore has very similar semantics for manipulating data and creating asynchronous execution.
Tensorstore is used under an Apache 2.0 license.
Relevant features of the Tensorstore library are:
Nice to have features of Tensorstore:
We use the json-schema-validator library to validate MDIO schemas against the schema definitions.
This library is used under the MIT license.