environmental-forecasting / preprocess-toolbox

A toolbox for processing downloaded datasets according to common approaches for environmental data
MIT License

Preprocessing Toolbox

This is the preprocessing library for download-toolbox datasets. It combines and composes multi-source data loaders that can be used to cache data for, or supply data to, downstream applications.

This project is only just getting started; more information will appear soon.

Contact jambyr <at> bas <dot> ac <dot> uk if you want further information.

Installation

Not currently released on PyPI.

Please refer to the contribution guidelines for more information.

Implementation

When installed, the library provides a series of CLI commands. For an initial overview, run a command with the --help switch or consult the documentation.

Basic principles

The library preprocesses download-toolbox datasets and creates a single configuration for reading out the data in a multi-channel format for dataset construction:

  1. Preprocess datasets from download-toolbox so that each dataset is continuous and normalised for the downstream application.
  2. Generate a loader configuration, applying additional metadata (arbitrary channels and masks), that provides initial access to the collected data.
  3. Use this data loader to produce usable datasets for downstream applications (currently being tested with IceNet and another internal application).
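
The three steps above can be pictured with a small, standard-library-only sketch. It is not the preprocess-toolbox API: the function names, the configuration keys, the linear gap filling and the min-max scaling are all assumptions chosen for illustration, and the library's own commands may work quite differently.

```python
import json
from datetime import date, timedelta


def make_continuous(series):
    """Fill missing days by linear interpolation between known neighbours."""
    days = sorted(series)
    filled = {}
    for start, end in zip(days, days[1:]):
        gap = (end - start).days
        for offset in range(gap):
            weight = offset / gap
            filled[start + timedelta(days=offset)] = (
                series[start] * (1 - weight) + series[end] * weight
            )
    filled[days[-1]] = series[days[-1]]
    return filled


def normalise(series):
    """Min-max scale values into the range [0, 1]."""
    low, high = min(series.values()), max(series.values())
    span = (high - low) or 1.0  # avoid dividing by zero for constant data
    return {day: (value - low) / span for day, value in series.items()}


# Step 1: preprocess a (hypothetical) downloaded variable with a missing day.
raw_temperature = {
    date(2020, 1, 1): 270.0,
    date(2020, 1, 2): 272.0,
    date(2020, 1, 4): 276.0,  # 2020-01-03 is missing
}
temperature = normalise(make_continuous(raw_temperature))

# Step 2: describe channels and masks in a single loader configuration.
# The key names are hypothetical; the mask is listed but not read below.
loader_config = {
    "channels": ["temperature", "day_of_year"],
    "masks": ["land"],
    "dates": [day.isoformat() for day in sorted(temperature)],
}
config_text = json.dumps(loader_config, indent=2)


# Step 3: read samples back out in multi-channel form, driven by the config.
def read_sample(config, day):
    sources = {
        "temperature": lambda d: temperature[d],
        "day_of_year": lambda d: d.timetuple().tm_yday / 366,
    }
    return [sources[name](day) for name in config["channels"]]


config = json.loads(config_text)
samples = [read_sample(config, date.fromisoformat(d)) for d in config["dates"]]
```

Each entry in `samples` is one date's values in channel order, so the configuration alone determines what a downstream application receives. The "day_of_year" entry stands in for an arbitrary metadata channel.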

This is a base library on which application-specific processing is built. It lowers the implementation overhead of creating multi-source datasets for environmental applications that need to integrate data from the sources that download-toolbox provides.

This library has no knowledge of those datasets. Instead, it forms the basis for application-specific processing by importing application-specific logic dynamically. See this issue for a quick idea of how this works with the IceNet workflow.
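
As a rough illustration of dynamic importing in general, the sketch below resolves a string to a Python object at runtime with `importlib`. The `module:attribute` string format and the `load_object` helper are assumptions made for this example; the mechanism preprocess-toolbox actually uses may differ.

```python
import importlib


def load_object(path):
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_name, _, attribute = path.partition(":")
    if not module_name or not attribute:
        raise ValueError(f"expected 'module:attribute', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# A standard-library function stands in for application-specific logic,
# such as a processing class that an application like IceNet would supply.
processor = load_object("statistics:mean")
result = processor([1.0, 2.0, 3.0])
```

Because the target is named by a string, the base library never needs to import the application's code directly; the application only has to be installed in the same environment.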

Limitations

There are some major limitations to this as a general-purpose tool; these will hopefully be dealt with in time! I'm raising issues as I go.

This functionality is currently under very heavy development, but the following commands already work:

Other stubs probably don't work, unless I forgot to update these docs!

Contributing

Please refer to the contribution guidelines for more information.

Credits

License

This is licensed under the MIT License.