NigelCleland / lode

Scripts to automate getting data from electricity market sources
MIT License

Lode

A single repository for setting up data analysis toolkits for the New Zealand Electricity Market using Python.

Lode handles the scraping, sets up some sharded databases, and provides a simplified interface for querying the relevant data.

Implemented Features

Scrapers:

Databases with query interface:

Features in Planning:

The features currently being worked on are listed in the GitHub Enhancement List.

Problem Statement

The problem at hand:

We want an up-to-date data source which plays nicely with other sources of data. We also want to maintain the integrity of our primary data sources, while still being able to undertake meaningful analysis which answers important questions.

Data Flow Pipeline

This leaves us with a pipeline of required steps:

  1. Automated Collection of data
    • APIs
    • Scraping
    • Automated Downloads
  2. Manipulation of this data (munging)
    • Cohesive Dates
    • Cohesive Location Identifiers
    • Rich Metadata (for example extended company/technology type information)
    • Simplified Merging
      • E.g. the ability to merge disparate data sources, such as offers and demand, in a simplified format.
      • A consistent set of indices, perhaps DateTime and MarketNode, which applies across all of the datasets.
  3. Analysis of Data and presentation of the results
    • Simplified formats to accomplish common functions
      • Hydro Risk Curves
      • Price Distributions
      • Offer Curves
      • Fan Curves
      • Demand Distributions
      • Others
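
As a rough illustration of the munging step above (cohesive dates, cohesive location identifiers and simplified merging), the sketch below joins two record sets on a shared (DateTime, MarketNode) index using only the standard library. It is not Lode's API: the function names, the field names (`date`, `period`, `node`, `offer_mw`, `demand_mw`), the node label `ABC1101` and the sample values are all hypothetical. It also assumes half-hourly trading periods numbered from 1 and ignores daylight saving transitions.

```python
from datetime import datetime, timedelta


def period_to_datetime(trading_date, trading_period):
    """Return the start time of a 1-based half-hour trading period.

    Assumption: periods are 30 minutes long and numbered from 1.
    Daylight saving days are not handled in this sketch.
    """
    start_of_day = datetime.strptime(trading_date, "%Y-%m-%d")
    return start_of_day + timedelta(minutes=30 * (trading_period - 1))


def index_rows(rows, value_field):
    """Key each row by the shared (DateTime, MarketNode) index."""
    indexed = {}
    for row in rows:
        # Normalise the node identifier so both sources agree on its form.
        node = row["node"].strip().upper()
        key = (period_to_datetime(row["date"], row["period"]), node)
        indexed[key] = row[value_field]
    return indexed


def merge_on_index(offers, demand):
    """Inner join of offers and demand on (DateTime, MarketNode)."""
    offer_idx = index_rows(offers, "offer_mw")
    demand_idx = index_rows(demand, "demand_mw")
    shared_keys = sorted(offer_idx.keys() & demand_idx.keys())
    return [
        {
            "datetime": dt,
            "node": node,
            "offer_mw": offer_idx[(dt, node)],
            "demand_mw": demand_idx[(dt, node)],
        }
        for dt, node in shared_keys
    ]


# Made-up sample records, purely for illustration.
offers = [
    {"date": "2014-01-01", "period": 1, "node": "abc1101 ", "offer_mw": 120.0},
    {"date": "2014-01-01", "period": 2, "node": "ABC1101", "offer_mw": 110.0},
]
demand = [
    {"date": "2014-01-01", "period": 2, "node": "ABC1101", "demand_mw": 95.5},
    {"date": "2014-01-01", "period": 3, "node": "ABC1101", "demand_mw": 90.0},
]

merged = merge_on_index(offers, demand)
for row in merged:
    print(row)
```

Only the period present in both sources survives the join. Whether an inner or outer join is more appropriate would depend on the analysis being done.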

Data Sources:

The currently proposed data sources:

Contributing

The above is a broad outline, a wish list so to speak. The ultimate goal is to reduce redundancy, especially in the data manipulation and aggregation roles. It is simply crazy that each person is maintaining their own sources of data in an inconsistent manner.

Useful Resources

A significant amount of code has already been produced; the modules listed below are examples. A key element of each of these pieces of code is that it does not rely upon databases. Perhaps the next step is to set up persistent databases for each of the sources in order to maintain consistency.

Tessen is a module which simplifies the creation of Fan Curves from generation and reserve offer data.

OfferPandas is a module which handles a significant amount of the pain of working with offer data. It has support for richer metadata as well as working with different formats in a simple manner.

vSPUD is a module to make working with multiple sets of vSPD final output data easier.

Potential Database flow structure

This structure would remove the need to rerun each of the munging steps on every iteration. The caveat is that we are likely to be working on desktop machines, which can have limited RAM.
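
One possible way to get both properties, sketched here under assumptions rather than taken from Lode itself, is to write the munged output to a persistent store once and then read it back in fixed-size chunks. The example uses the standard library `sqlite3` module. The table name `prices`, its columns, the helper names, the node labels and the sample values are all hypothetical, and an in-memory database stands in for a file on disk.

```python
import sqlite3


def store_munged(conn, rows):
    """Persist already-munged rows so the munging need not be rerun."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices ("
        "trading_datetime TEXT, node TEXT, price REAL, "
        "PRIMARY KEY (trading_datetime, node))"
    )
    # INSERT OR REPLACE keeps reloads idempotent on the primary key.
    conn.executemany("INSERT OR REPLACE INTO prices VALUES (?, ?, ?)", rows)
    conn.commit()


def iter_chunks(conn, node, chunk_size=1000):
    """Yield lists of (trading_datetime, price) of at most chunk_size rows."""
    cursor = conn.execute(
        "SELECT trading_datetime, price FROM prices "
        "WHERE node = ? ORDER BY trading_datetime",
        (node,),
    )
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            break
        yield chunk


# Made-up sample rows, purely for illustration.
rows = [
    ("2014-01-01T00:00", "ABC1101", 50.0),
    ("2014-01-01T00:30", "ABC1101", 70.0),
    ("2014-01-01T01:00", "ABC1101", 60.0),
    ("2014-01-01T00:00", "XYZ2201", 40.0),
]

conn = sqlite3.connect(":memory:")  # a file path would make this persistent
store_munged(conn, rows)
store_munged(conn, rows)  # loading twice does not duplicate rows

# Compute a mean price without holding every row in memory at once.
total, count, chunk_sizes = 0.0, 0, []
for chunk in iter_chunks(conn, "ABC1101", chunk_size=2):
    chunk_sizes.append(len(chunk))
    total += sum(price for _, price in chunk)
    count += len(chunk)
mean_price = total / count
print(mean_price)
```

The chunk size trades memory use against the number of round trips to the database, so it would need tuning for the machine at hand.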

Work To Do:

License

MIT License. Acknowledge where the work has come from and send contributions upstream so that we all benefit. Let me know if it's useful.

Style Guide:

Where appropriate, the code will try to conform to PEP8: