GermanZero-de / localzero-data-public

Intern data storage for gz package
0 stars 2 forks source link

Suggestion for pulling data on renewable energy from the Marktstammdatenregister #29

Open curiousleo opened 2 years ago

curiousleo commented 2 years ago

The commit messages on this repo are in English, so I'm writing this issue in English as well.

Problem: renewable energy statistics

This repo contains a summary of renewable energy data organised by Gemeindeschlüssel in CSV format:

https://github.com/GermanZero-de/localzero-data-public/blob/main/renewable_energy/2018.csv

The README explains how this data was gathered:

This data originates in the "Marktstammdatenregister" [...] This summation and bucketing to all ags keys was done in excel and needs to be repeated in python soon as the we choose an ags ist for 2021 instead of 2018 (We will provide a script as soon as this is done.).

It seems like both the initial data gathering and the summation potentially involved a lot of work and may have been time consuming. It also sounds like there is an effort underway to automate the procedure.

Context: Marktstammdatenregister.dev

I created Marktstammdatenregister.dev to make it easy to query the Marktstammdatenregister. Summary statistics of the kind used in this repo are exactly the project's raison d'être.

Here's a preview of the relevant query that presents the summary statistics in the format used in the existing CSV file.

You'll see that it is missing statistics for solar power. I removed that part because the full query times out on the low-powered machine that runs the website.

However, almost any recent computer can give the full result quickly using the underlying database file, see the next section.

Idea: use SQLite export directly

In short (using localzero.sql):

# Download and decompress the SQLite export.
# Takes less than 30 seconds with a fast internet connection.
# Marktstammdatenregister.db.gz: ~800 MB
# Marktstammdatenregister.db:    ~2.5 GB
curl https://s3.eu-central-1.wasabisys.com/mastr-backup/Marktstammdatenregister.db.gz \
  | gunzip - >Marktstammdatenregister.db

# Generate the CSV file.
# Takes less than five seconds with an SSD and recent CPU.
curl -sSLo localzero.sql https://gist.githubusercontent.com/curiousleo/7940a6b03a3e0ccf7ad89d6334d678e8/raw/e77d9f8b4622c4e645ea00ce1a8887a90ff26f92/localzero.sql
sqlite3 --readonly --csv --header \
  Marktstammdatenregister.db \
  <localzero.sql \
  >2018.csv

The database file that can be queried on https://ds.markstammdatenregister.dev can also be downloaded for local use. That's what the above snippet does.

We can then run the full query, including solar power, locally. With the right command-line arguments, SQLite outputs a CSV file with the format used in the existing file 2018.csv.

To calculate the statistics for any other date, just edit the first line of localzero.sql, for example:

-with Stichtag as (select "2018-12-31" as Stichtag),
+with Stichtag as (select "2021-12-31" as Stichtag),

Note: results change over time

Many renewable energy units are registered after they've started operating, sometimes years later -- see this query.

So even a historical query like "how much solar power was generated in Buxtehude on 2018-12-31" may give different results depending on which version of the Markstammdatenregister is used.

Conclusion

This is what I built Markstammdatenregister.dev for. I think there's a good chance that it can save you time and effort.

Feel free to try the suggestions and to have a look around the data and the examples.

If you have any questions, I'm happy to help. It is also definitely possible that I made a mistake in writing the SQL query or misunderstood something about the summary statistics needed for LocalZero -- if you spot anything that looks off, let me know!

bgrundmann commented 2 years ago

Very cool. Exciting to see that others are working on this. Thanks. I myself are taking this week of to recharge after the sprint to make the release. @Ultramarin95 do you want to have a look?

Ultramarin95 commented 2 years ago

Witzigerweise habe ich genau den Export aus dem Marktstammdaten Register grade diese Woche geschrieben. ( Siehe https://github.com/GermanZero-de/localzero-generator-core/pull/200 ) Ich werde mal die beiden "2018.csv" Versionen vergleichen und mir das Tool etwas genauer anschauen @curiousleo.

curiousleo commented 2 years ago

@bgrundmann said:

Very cool. Exciting to see that others are working on this.

Same here! Thanks for your work on LocalZero!

curiousleo commented 2 years ago

@Ultramarin95 said:

Witzigerweise habe ich genau den Export aus dem Marktstammdaten Register grade diese Woche geschrieben.

Nice. Ich habe entsprechend dem verlinkten Skript die SQL-Abfrage aktualisiert (Brutto -> Nettoleistung, Summe als "DG000000").

In dem Skript geht es viel um AGS, die auch aus einer externen Quelle geholt werden -- den Teil verstehe ich nicht ganz, deshalb habe ich, was das angeht, erstmal nichts an der vorgeschlagenen SQL-Abfrage geändert.

curiousleo commented 2 years ago

An der Stelle sollte https://github.com/OpenEnergyPlatform/open-MaStR nicht unerwähnt bleiben. Ich habe es selbst aber noch nicht verwendet.