OpenEnergyPlatform / open-MaStR

A collaborative software to download the energy database Marktstammdatenregister (MaStR)
https://open-mastr.readthedocs.io/en/latest/
GNU Affero General Public License v3.0

BNetzA-MaStR #4

Closed Ludee closed 2 years ago

Ludee commented 5 years ago

Processed data set: Bundesnetzagentur - Marktstammdatenregister

  • Link to metadata: supply.bnetza_mastr.sql
  • Link to API-Infos: https://github.com/OpenEnergyPlatform/open-MaStR/issues/10

  • Link to Marktstammdatenregister: www.marktstammdatenregister.de
  • Link to Documentation: www.marktstammdatenregister.de
  • Link to API Documentation: www.marktstammdatenregister.de
  • WSDL: .../MaStRAPI/wsdl/mastr.wsdl
  • Data license: https://www.govdata.de/dl-de/by-2-0
  • Data Download: reiner-lemoine-institut.de

status (number of units / number of units with geolocation)

| version | date | unit | wind | hydro | biomass | solar |
|---|---|---|---|---|---|---|
| v1.1 | 2019-03-01 | 1862005 | 30566 / 12099 | 7453 / 1725 | 18337 / 11366 | 804365 / 52147 (x1) |
| v1.2 | 2019-03-03 | 1884005 | 31312 / 13166 | 7671 / 1845 | 22873 / 15656 | 1799323 / 103423 (x1) |
| v1.3 | 2019-04-12 | 1965200 | 32920 / 14337 | 8160 / 2166 | 25369 / 17888 | 1861631 / 98152 (x1) |
| v1.4 | 2019-06-11 | 2110197 | 35540 / 16413 | 8602 / 2472 | 26991 / 19214 | tba / tba |
| v1.5 | tba | tba | tba / tba | tba / tba | tba / tba | tba / tba |

(x1) not included in the release

Ludee commented 5 years ago

The API goes offline at 00:00 every night. :waxing_gibbous_moon:

nesnoj commented 5 years ago

Argh, I was wondering. Thanks for the info. Is it due to the DoS? When is it going to be back online?

Ludee commented 5 years ago

I think it's a normal cronjob. It takes about an hour. But I found nothing about it in the documentation. I will collect such infos in this issue.
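A download script could simply wait out this nightly window. Here is a minimal sketch, assuming the outage starts at 00:00 and lasts about one hour as observed above (this is not documented behaviour); `seconds_until_api_available` is a hypothetical helper name, not part of the API:

```python
from datetime import datetime, time, timedelta

# Assumed maintenance window, based on the observation above (not documented).
WINDOW_START = time(0, 0)
WINDOW_LENGTH = timedelta(hours=1)


def seconds_until_api_available(now: datetime) -> float:
    """Return the seconds left in the window if `now` is inside it, else 0."""
    window_start = datetime.combine(now.date(), WINDOW_START, tzinfo=now.tzinfo)
    window_end = window_start + WINDOW_LENGTH
    if window_start <= now < window_end:
        return (window_end - now).total_seconds()
    return 0.0
```

A long-running loop could call `time.sleep(seconds_until_api_available(datetime.now()))` before each request.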

I finished the first important part: all "Stromerzeuger" (power generators) in one list. That is 1 799 161 units, 280 MB as CSV.

At the moment I am iterating over all wind units. The first version of a "wind power plant list" should be ready later today.

Ludee commented 5 years ago

I extracted 31 243 wind units. The iteration stops at around 3 000 with the message "Allgemeiner Fehler" (general error).

~~It's like I DDoS the API, so I included a pause in the loop: `time.sleep(1)`. With that, the entire list will take around 8 hours, and the same again for the wind EEG data. It's not possible to pass a list of IDs to the function.~~

A kind of stupid solution: I wrapped each download in try/except so that the entire script does not fail if one entry is not working. It still needs about one hour for wind.
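The per-download try/except could look roughly like this. It is a minimal sketch, not the script from this repo: `download_all` and the `fetch_unit` callable are hypothetical names standing in for whatever function requests a single unit from the API.

```python
import logging
from typing import Callable, Dict, Iterable, List, Tuple

logger = logging.getLogger(__name__)


def download_all(
    unit_ids: Iterable[str],
    fetch_unit: Callable[[str], Dict],  # hypothetical: requests one unit by ID
) -> Tuple[List[Dict], List[str]]:
    """Fetch every unit; collect failures instead of aborting the whole run."""
    results: List[Dict] = []
    failed: List[str] = []
    for unit_id in unit_ids:
        try:
            results.append(fetch_unit(unit_id))
        except Exception as exc:  # broad on purpose: one bad entry must not stop the loop
            logger.warning("Download failed for %s: %s", unit_id, exc)
            failed.append(unit_id)
    return results, failed
```

Returning the failed IDs separately makes it possible to retry only those entries later.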

I will clean up the code a little bit and then add it to this repo. The challenge is bigger than expected.

UPDATE: fixed "Fehler"

Ludee commented 5 years ago

:turtle: :racehorse: :rocket: I got a complete wind data set! :tada:

See the first data validation below. Some data cleaning will be needed, though. The first release will follow in the next few days.

[Images: bnetza_mastr_0 7_wind_01, bnetza_mastr_0 7_wind_02]

nesnoj commented 5 years ago

That's awesome! I'm curious whether the locations are somewhat distorted or the data is incomplete, since your map shows no WECs in places where there are supposed to be some. I'll have a closer look at your data later on.

Ludee commented 5 years ago

You are right. The join didn't work correctly in this attempt. There are only 8912 points of 29886 on the map. The web interface filters 29746 entries, by the way. I downloaded 20974 wind turbines with complete data. Here are some quick plots.

[Image: bnetza_mastr_0 8_wind_stats]

Ludee commented 5 years ago

The current version (v0.10) downloaded 29369 power_unit and found 29032 unit_wind (with location and full parameters). About 300 units are missing, and about 100 units are located outside of Germany, which must be data errors.

I'm preparing the metadata to have a proper datapackage (OPSD style). Then I will have a look at the other technologies.

FYI: The scripts took about 5 hours to download and process.

nesnoj commented 5 years ago

@Ludee I had a quick look at the current v1.0 of the wind data. At first glance, the locations are plausible. I picked out a specific region (Anhalt):

[Image: abw_mastr-osm_wec_data1]

The MaStR contains about 17% (red) of the (OSM) WECs in this region (blue). I checked some WECs in my region and the addresses seem to be correct. Also, the MaStR points in the region match those from OSM. Thus, the projection (EPSG:31467) I used seems to be correct. As the overall count of WECs for Germany seems to match too, are the MaStR coords (at least) partly erroneous or not mapped?

Similar results for a different region (Uckermark), (same colors, OSM: blue, MaStR: red):

[Image: um_mastr-osm_wec_data1]

Ideas?

By the way:

The SRID information should be included in the README.md/datapackage.json

Ludee commented 5 years ago

I already expected that a lot of WECs don't have correct locations yet. I used the columns "Laengen- und Breitengrad" (longitude and latitude) in EPSG:4326 (WGS84) in QGIS. I will add that to the metadata. As you can see, the resource descriptions are still missing.

UPDATE: units with geom: 11657, without geom: 18381

christian-rli commented 5 years ago

I'm not sure exactly where your problem lies @nesnoj, but EPSG:4647 (UTM Zone 32N) should be correct for your region. If your data show up correctly in one projection but not in the other, something has gone wrong during conversion. It's either that, or you're displaying data from two different reference systems in one. If all data are contained in a box after reprojection and they weren't before, something else entirely has gone wrong.

christian-rli commented 5 years ago

I think at least part of the misplaced data might be explained by the confusion between EPSG:5652 and EPSG:4647 which are basically the same, only with N and E values swapped. I tried that for a random point in Berlin and ended up somewhere close to the southern coast of the Arabian Peninsula - which could explain the cluster of locations southeast of Germany in the screenshot posted by @Ludee .
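The swap hypothesis can be tested cheaply. The sketch below works on plain WGS84 latitude/longitude pairs rather than on the projected systems named above, which is a simplification; the bounding box values are rough assumptions and the function names are made up for illustration.

```python
from typing import Optional, Tuple

# Approximate WGS84 bounding box of Germany in degrees (assumed values).
LAT_MIN, LAT_MAX = 47.2, 55.1
LON_MIN, LON_MAX = 5.8, 15.1


def in_germany_bbox(lat: float, lon: float) -> bool:
    """True if the point lies inside the rough bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def repair_swapped(lat: float, lon: float) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) inside the box, swapping the two values if that helps.

    Returns None when neither order falls inside the box.
    """
    if in_germany_bbox(lat, lon):
        return lat, lon
    if in_germany_bbox(lon, lat):
        return lon, lat
    return None
```

For a point in Berlin entered as `(13.405, 52.52)`, `repair_swapped` returns `(52.52, 13.405)`. Points that fit neither way round need a different fix.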

Ludee commented 5 years ago

[Image: Breaking news]

christian-rli commented 5 years ago

> [Image: Breaking news]

\o/ I like the hand-edited photo :)

nesnoj commented 5 years ago

Good work, thank you!

Ludee commented 5 years ago

Version 1.3 is finished, but I had to restart solar due to connection errors. I will prepare the data package for release in the next few days.

Ludee commented 5 years ago

Data version 1.4 with some new features is on the way. It will be released within the next few days.

aelbouha commented 5 years ago

Hello everyone,

I am looking to use the MaStR API from within my company. For security reasons, I have to route all traffic through the company's proxy. Is anyone familiar with this kind of issue?

Ludee commented 5 years ago

Hello @aelbouha, I'm not familiar with this. But please share your solution if you are able to solve it. Can you tell us which company you are working at? Just curious.

aelbouha commented 5 years ago

Hello @Ludee,

Below is the fix for my problem. I had to set the environment variables $HTTP_PROXY and $HTTPS_PROXY and authenticate to the proxy before executing my code.

Btw I work for the French TSO, RTE. I'm interested in getting the same information as in the EEG register, in the same '.csv' format

```python
import os

import requests
from requests.auth import HTTPBasicAuth
from zeep import Client, Settings
from zeep.cache import SqliteCache
from zeep.transports import Transport

# Proxy credentials (left blank in the original post; placeholders only)
username = "<username>"
password = "<password>"

# Proxy URLs (garbled in the original post; placeholders only)
proxies = {
    'http': '<http-proxy-url>',
    'https': '<https-proxy-url>',
}
os.environ["HTTP_PROXY"] = proxies['http']
os.environ["HTTPS_PROXY"] = proxies['https']

session = requests.Session()
session.verify = False  # disables TLS certificate verification
auth = HTTPBasicAuth(username, password)  # created here but not attached to the session

wsdl = 'https://www.marktstammdatenregister.de/MaStRAPI/wsdl/mastr.wsdl'
setting = Settings(strict=False, xml_huge_tree=True)

client = Client(
    wsdl=wsdl,
    transport=Transport(session=session, cache=SqliteCache()),
    settings=setting,
)
```

Ludee commented 5 years ago

> I'm interested in getting the same information as in the EEG register, in the same '.csv' format

Which EEG register do you refer to? What do you mean by same .csv format?

I will publish a monthly version on our NextCloud and announce it here and on Twitter. Feel free to use this code. If you have any questions, don't hesitate to contact me.

aelbouha commented 5 years ago

@Ludee before January 2019, the same kind of data that is now available via the MaStR web service was published in Excel files (not CSV files), excuse my imprecision. I attached an example below. These files were easily downloadable from the BNetzA web page, and this is what I call the EEG register.

2019_01_Veroeff_RegDaten.xlsx

Ludee commented 5 years ago

@aelbouha Thanks for the clarification. The old "EEG Anlagenregister" is discontinued. Its entries were transferred and can be identified by the first letters of the MaStR-ID: "SME.." = old, "SEE.." = new entries. The new "MaStR" has a different data structure with a lot of new columns and different column names. A "download all" button was planned by BNetzA but is not available yet.
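The prefix rule can be expressed as a small check. This is only a sketch of the rule as stated above; `classify_mastr_id` is a made-up helper and the IDs in the usage note are invented examples, not real MaStR numbers.

```python
def classify_mastr_id(mastr_id: str) -> str:
    """Classify a unit by the first letters of its MaStR-ID, per the rule above."""
    if mastr_id.startswith("SME"):
        return "old"  # transferred from the old register
    if mastr_id.startswith("SEE"):
        return "new"  # registered in the new MaStR
    return "unknown"
```

For example, `classify_mastr_id("SME000000000001")` gives `"old"` and `classify_mastr_id("SEE000000000001")` gives `"new"`.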

The software I wrote is downloading different parts and joins them in one large table (CSV) for each technology. You can have a look at the structure in the data package linked in the first post.
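The join step might be sketched like this. It is an illustration in plain Python, not the code from this repo: the `unit_id` key and the row layout are assumptions. Reporting the keys without a match makes a join that silently drops rows easy to spot.

```python
from typing import Dict, Iterable, List, Tuple


def join_on_unit_id(
    base_rows: Iterable[Dict[str, str]],
    detail_rows: Iterable[Dict[str, str]],
    key: str = "unit_id",  # hypothetical column name
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Left-join detail rows onto base rows; also return the keys without a match."""
    details = {row[key]: row for row in detail_rows}
    joined: List[Dict[str, str]] = []
    unmatched: List[str] = []
    for row in base_rows:
        extra = details.get(row[key])
        if extra is None:
            unmatched.append(row[key])
            joined.append(dict(row))  # keep the base row even without details
        else:
            joined.append({**row, **extra})
    return joined, unmatched
```

The joined rows can then be written to one CSV per technology with `csv.DictWriter`.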

I'm currently working on the data cleansing. It's still a long way until we have a decent version to work with. But it is worth the effort. Further comments are always welcome.

Ludee commented 5 years ago

An official release is now available! https://www.bundesnetzagentur.de/DE/Sachgebiete/ElektrizitaetundGas/Unternehmen_Institutionen/MaStR/MaStR_node.html#doc514816bodyText7

At first glance: The data comes in 3 files with separate tables for the different parts (ENH, EEG, KWK, SGE, MAK). The technologies are mixed together. Solar data is missing.

aelbouha commented 5 years ago

Hello @Ludee!

I have a quick question for you. How do you make sure that you have downloaded, via the API, all the units currently available?

ghost commented 5 years ago

Hi @Ludee, as there is no GetEinheitStromerErzeuger method in the API, how can you scrape all the Stromerzeuger data?

Also, what is the daily request quota? I read somewhere that it's 10000 calls per day.

Ludee commented 5 years ago

Hi @haantran96, I use the function GetGefilterteListeStromErzeuger, with a loop around it because it can only return 2000 entries per call.

The API has a technical limit (and a counter), but there is no need to worry about it. Run GetAktuellerStandTageskontingentRequest: it reports 2 147 483 647.
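Such a loop could be sketched as follows. The `fetch_page(start, limit)` callable is hypothetical and stands in for the actual API call; its parameters and the zero-based start index are assumptions, and only the 2000-entry page size comes from this thread.

```python
from typing import Callable, Dict, List

PAGE_SIZE = 2000  # maximum number of entries per call, as stated above


def fetch_all(fetch_page: Callable[[int, int], List[Dict]]) -> List[Dict]:
    """Call fetch_page(start, limit) repeatedly until a short or empty page arrives."""
    units: List[Dict] = []
    start = 0
    while True:
        page = fetch_page(start, PAGE_SIZE)
        units.extend(page)
        if len(page) < PAGE_SIZE:
            break  # last page reached
        start += PAGE_SIZE
    return units
```

Stopping on the first short page costs one extra call when the total is an exact multiple of the page size, but it needs no prior knowledge of the total count.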

ghost commented 5 years ago

Have you encountered the "Fault: Zugriff verweigert" (access denied) error? My code was running fine for the past week, but now my access is denied.

Ludee commented 5 years ago

No, not yet.

BenPortner commented 5 years ago

I wrote a notebook that does some post-processing:

The notebook can be found here: https://github.com/BenPortner/data-preprocessing/blob/master/data-import/bnetza_mastr/jupyter/OEP_MaStR_cleansing.ipynb

The interactive map can be found here (may take a while to load!): https://nbviewer.jupyter.org/github/BenPortner/data-preprocessing/blob/master/data-import/bnetza_mastr/jupyter/all_located.html

There are still a few wrongly located entries among the filtered data. They all belong to the Nordseecluster. It should be easy to filter those by hand.

Hope it's helpful!

Ludee commented 5 years ago

Hi @BenPortner, thank you very much for this contribution. Your cleansing and plotting look really good!

I did some postprocessing (in SQL) over the last weeks. Unfortunately, I haven't published it yet. I relocated wrong or missing coordinates to the centre of the PLZ (postal code area). Shall we have a chat next week about how to combine our efforts?
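The relocation idea, sketched in Python rather than the SQL I actually used. The field names (`lat`, `lon`, `plz`), the centroid lookup table, and the optional plausibility check are assumptions for illustration only.

```python
from typing import Callable, Dict, Optional, Tuple

Coord = Tuple[float, float]  # (lat, lon)


def locate(
    unit: Dict,
    plz_centroids: Dict[str, Coord],  # hypothetical lookup: PLZ -> centroid
    is_plausible: Callable[[float, float], bool] = lambda lat, lon: True,
) -> Optional[Coord]:
    """Return the unit's own coordinates, or its PLZ centroid as a fallback.

    The fallback is used when coordinates are missing or fail `is_plausible`.
    Returns None if the PLZ is unknown as well.
    """
    lat, lon = unit.get("lat"), unit.get("lon")
    if lat is not None and lon is not None and is_plausible(lat, lon):
        return (lat, lon)
    return plz_centroids.get(unit.get("plz"))
```

Units that end up with `None` have neither usable coordinates nor a known PLZ and need manual treatment.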

We are currently preparing the next data release including PV!

Ludee commented 5 years ago

We are also restructuring the repo. The code and issues will be transferred.

LosWochos76 commented 5 years ago

Processed data set: Bundesnetzagentur - Marktstammdatenregister

  • 2019-01-31 MaStR is online
  • 2019-02-07 Start review
  • 2019-03-01 Unofficial release data v1.1

  • Link to metadata: supply.bnetza_mastr.sql
  • Link to API tutorial: BNetzA_MaStR_API.ipynb

  • Link to Marktstammdatenregister: www.marktstammdatenregister.de
  • Link to Documentation: www.marktstammdatenregister.de
  • Link to API Documentation: www.marktstammdatenregister.de
  • WSDL: .../MaStRAPI/wsdl/mastr.wsdl
  • Data license: https://www.govdata.de/dl-de/by-2-0
  • Data Download: reiner-lemoine-institut.de

status (number of units / number of units with geolocation)

| version | date | unit | wind | hydro | biomass | solar |
|---|---|---|---|---|---|---|
| v1.1 | 2019-03-01 | 1862005 | 30566 / 12099 | 7453 / 1725 | 18337 / 11366 | 804365 / 52147 (x1) |
| v1.2 | 2019-03-03 | 1884005 | 31312 / 13166 | 7671 / 1845 | 22873 / 15656 | 1799323 / 103423 (x1) |
| v1.3 | 2019-04-12 | 1965200 | 32920 / 14337 | 8160 / 2166 | 25369 / 17888 | 1861631 / 98152 (x1) |
| v1.4 | 2019-06-11 | 2110197 | 35540 / 16413 | 8602 / 2472 | 26991 / 19214 | tba / tba |
| v1.5 | tba | tba | tba / tba | tba / tba | tba / tba | tba / tba |

(x1) not included in the release

The link to the API tutorial seems to be dead. I would really like to see how to access the MaStR with Python. Unfortunately, I did not manage to do this on my own, so help is appreciated.