PyPSA / powerplantmatching

Set of tools to combine multiple power plant databases
https://powerplantmatching.readthedocs.io/en/latest/
GNU General Public License v3.0

Add Marktstammdatenregister (MaStR) #165

Open · lkstrp opened this pull request 3 months ago

lkstrp commented 3 months ago

Closes #16

Change proposed in this Pull Request

Adds Marktstammdatenregister via open-MaStR.

There are a few issues:

Dataset file      Entries   Entries below 1 MW capacity
_biomass.csv        22284   21240 (95.32%)
_combustion.csv     85424   81776 (95.73%)
_nuclear.csv            6       0 (0.00%)
_hydro.csv           8657    7859 (90.78%)
_wind.csv           34798    6729 (19.34%)
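
The last column is the sub-1 MW count divided by the total count for each file. A small Python check that recomputes the percentages from the counts in the table (the dictionary only restates the table; nothing else is assumed):

```python
# Counts restated from the table: file -> (total entries, entries below 1 MW)
COUNTS = {
    "_biomass.csv": (22284, 21240),
    "_combustion.csv": (85424, 81776),
    "_nuclear.csv": (6, 0),
    "_hydro.csv": (8657, 7859),
    "_wind.csv": (34798, 6729),
}


def small_share(total: int, small: int) -> float:
    """Percentage of entries below 1 MW, rounded to two decimals."""
    return round(100 * small / total, 2) if total else 0.0


for name, (total, small) in COUNTS.items():
    print(f"{name:<16} {small_share(total, small):6.2f} %")

# The same ratio over all five files combined
all_total = sum(total for total, _ in COUNTS.values())
all_small = sum(small for _, small in COUNTS.values())
print(f"{'all files':<16} {small_share(all_total, all_small):6.2f} %")
```

Over all five files combined, 117604 of 151169 entries (about 77.8 %) fall below 1 MW.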

FlorianK13 commented 1 month ago

Hi @lkstrp and other devs from powerplantmatching, I'm one of the developers of open-mastr. I like your work on harmonizing different sources into one European dataset. If anything on your side concerns open-mastr development, I'm happy to discuss it.

One remark on your comment above: "We could use the API instead, but then the user has to pass a token." This is not really a good idea. With the API you are limited to a small number of requests per day, so retrieving large amounts of data takes a long time. You could, however, run the bulk download to get an SQLite or PostgreSQL database and extract the relevant information from there.

from open_mastr import Mastr

db = Mastr()
db.download()
# if you want csv files then also run
db.to_csv()
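
The "extract relevant information from there" step can be sketched with Python's built-in sqlite3 module. The database below is an in-memory stand-in: the table name `combustion_units` and its columns are hypothetical placeholders, not the actual open-mastr schema, which should be checked in the generated database before adapting the query.

```python
import sqlite3

# In-memory stand-in for the SQLite database written by the bulk download.
# The table and column names are hypothetical placeholders, not the open-mastr schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE combustion_units (unit_id TEXT, name TEXT, fuel TEXT, capacity_mw REAL)"
)
conn.executemany(
    "INSERT INTO combustion_units VALUES (?, ?, ?, ?)",
    [
        ("U1", "Plant A", "natural_gas", 450.0),
        ("U2", "CHP B", "natural_gas", 0.05),
        ("U3", "Plant C", "hard_coal", 800.0),
    ],
)


def fetch_units(conn: sqlite3.Connection, fuel: str) -> list[tuple[str, str, float]]:
    """Return (unit_id, name, capacity_mw) for one fuel, largest units first."""
    query = (
        "SELECT unit_id, name, capacity_mw FROM combustion_units "
        "WHERE fuel = ? ORDER BY capacity_mw DESC"
    )
    # Parameter binding (?) avoids building SQL strings by hand
    return conn.execute(query, (fuel,)).fetchall()


for row in fetch_units(conn, "natural_gas"):
    print(row)
```

Pushing the selection into SQL means only the needed rows are loaded into Python, which matters for tables of this size.
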

lkstrp commented 1 month ago

Hey @FlorianK13, thanks for reaching out!

So far the idea was to simply use the Zenodo download you provide, which takes quite a while.

Does the Python download you suggested have any advantages over the Zenodo download, e.g. does it run faster or allow downloading only selected data? The API reference reads as if it downloads the same ZIP in bulk but allows data selection. Which means it downloads everything and just strips away unselected data?

FlorianK13 commented 1 month ago

When using the Python download method, you get the most recent data (from the day before). On Zenodo you get the data from our last update, which is a few months old. However, with Zenodo your code is reproducible, whereas the Python download changes every day because the BNetzA dataset changes every day. To achieve reproducibility with Python, specify date="existing" (see the API reference) after you have downloaded the dataset once, so that you use your existing local dataset from then on.
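
A minimal sketch of that workflow, assuming `Mastr.download()` accepts the `date` keyword as described in the comment above: the first run uses the default (most recent data), and later runs pass `date="existing"` to reuse the local copy. The helpers `download_options` and `fetch_mastr` are hypothetical, introduced only for illustration; `open_mastr` is imported inside the function so the helper logic can be checked without the package installed.

```python
def download_options(first_run: bool) -> dict:
    """Keyword arguments for Mastr.download() (hypothetical helper).

    The first run relies on the default, which fetches the most recent export;
    every later run passes date="existing" to reuse the local copy.
    """
    return {} if first_run else {"date": "existing"}


def fetch_mastr(first_run: bool):
    """Run the open-mastr bulk download with fresh or pinned data (sketch)."""
    # Imported lazily; requires open-mastr to be installed
    from open_mastr import Mastr

    db = Mastr()
    db.download(**download_options(first_run))
    return db
```

Calling `fetch_mastr(first_run=True)` once and `fetch_mastr(first_run=False)` afterwards keeps subsequent runs tied to the same local snapshot.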

Both approaches take rather long, as you need to download the whole dataset. Afterwards you can specify which data you want to parse. So you are right with your last sentence: 'Which means it downloads everything and just strips away unselected data.'
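
Because the data arrive as one archive, selection can only happen after the full download. A minimal sketch with Python's zipfile module: the in-memory archive below stands in for the real download, its member names follow the table in the PR description, and the CSV rows are placeholders. It is an illustration of the idea, not how open-mastr parses its export internally.

```python
import csv
import io
import zipfile

# Stand-in for the downloaded archive; member names follow the table in the
# PR description, the CSV contents are placeholder rows.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name in ["_biomass.csv", "_combustion.csv", "_nuclear.csv", "_hydro.csv", "_wind.csv"]:
        zf.writestr(name, "unit_id,capacity_mw\nU1,2.5\n")


def read_selected(archive: zipfile.ZipFile, wanted: set[str]) -> dict[str, list[dict[str, str]]]:
    """Parse only the requested members; the rest of the archive is skipped."""
    tables = {}
    for member in archive.namelist():
        if member in wanted:
            with archive.open(member) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                tables[member] = list(csv.DictReader(text))
    return tables


with zipfile.ZipFile(buffer) as zf:
    selected = read_selected(zf, {"_hydro.csv", "_nuclear.csv"})

print(sorted(selected))
```

The whole archive still has to be on disk (here, in memory) before any member can be read; only the parsing step is selective.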

fneum commented 3 weeks ago

> open-mastr provides a bulk download of all the cleaned datasets on Zenodo, but only as a .zip, so we have to download everything. We could use the API instead, but then the user has to pass a token.

Based on the discussion above, let's use the Zenodo releases. If they are updated at least annually, that's fine. I'm also not too worried about the large download size: updating is an infrequent action, and the data are cached locally as well. @FlorianK13, for upcoming releases it could be an option to upload the individual CSV files, unzipped, to the Zenodo record in addition to the ZIP. That would allow selective downloads (at the cost of losing the ZIP compression).
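
The "cached locally" point amounts to a download-once pattern: fetch the archive only if no local copy exists or an update is explicitly requested. A minimal sketch; `ensure_cached` and the injected `fetch` callable are hypothetical stand-ins, not powerplantmatching's actual caching code, and a real `fetch` would write the Zenodo archive to the given path.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_cached(target: Path, fetch: Callable[[Path], None], update: bool = False) -> Path:
    """Return a local copy of the archive, downloading only when needed."""
    if update or not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        tmp = target.with_name(target.name + ".part")
        # Write to a temporary file first, then move it into place, so an
        # interrupted download never leaves a truncated cache file behind.
        fetch(tmp)
        tmp.replace(target)
    return target


# Demonstration with a fake fetcher that records how often it is called
calls: list[Path] = []


def fake_fetch(path: Path) -> None:
    calls.append(path)
    path.write_bytes(b"placeholder archive")


with tempfile.TemporaryDirectory() as tmpdir:
    archive = Path(tmpdir) / "mastr" / "archive.zip"
    ensure_cached(archive, fake_fetch)  # first call downloads
    ensure_cached(archive, fake_fetch)  # second call reuses the cached file
    ensure_cached(archive, fake_fetch, update=True)  # explicit refresh

print(len(calls))  # prints 2
```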

> These datasets are huge and contain many small power plants. I have now filtered out all plants with a capacity below 1 MW; otherwise powerplant.aggregate_units() takes too long. Solar and wind are also not included yet.

Yes, that's also what Global Energy Monitor does. Perhaps they will integrate open-MaStR as well; then we wouldn't have to.
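
The 1 MW cut can be sketched as a plain filter applied before aggregation. This minimal example assumes a `capacity_mw` column already in MW; the column names and rows are hypothetical placeholders, so the actual names and units in the open-mastr export should be checked before adapting it. It reports both the number of dropped units and their share of total capacity, since a cut that removes most entries may still remove little capacity.

```python
import csv
import io

MIN_CAPACITY_MW = 1.0  # threshold discussed above

# Placeholder data; column names are hypothetical
RAW = """unit_id,fuel,capacity_mw
U1,biomass,0.25
U2,biomass,0.5
U3,biomass,12.0
U4,hydro,0.03
U5,hydro,160.0
"""


def filter_small_units(rows, min_mw=MIN_CAPACITY_MW):
    """Split rows into kept (>= min_mw) and dropped (< min_mw) units."""
    kept, dropped = [], []
    for row in rows:
        (kept if float(row["capacity_mw"]) >= min_mw else dropped).append(row)
    return kept, dropped


rows = list(csv.DictReader(io.StringIO(RAW)))
kept, dropped = filter_small_units(rows)

total_mw = sum(float(r["capacity_mw"]) for r in rows)
dropped_mw = sum(float(r["capacity_mw"]) for r in dropped)
print(
    f"dropped {len(dropped)} of {len(rows)} units, "
    f"{100 * dropped_mw / total_mw:.1f} % of capacity"
)
```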

> Validation is not done yet. I am waiting for an ENTSO-E token to run compare-with-entsoe-stats.py, but below is a first plot.

I requested one today and got it the same day.

FlorianK13 commented 2 weeks ago

@fneum I created https://github.com/OpenEnergyPlatform/open-MaStR/issues/558 to discuss whether we can upload single files to Zenodo.