Open lkstrp opened 3 months ago
Hi @lkstrp and other devs from powerplantmatching, I'm one of the developers of open-mastr. I like your work in harmonizing different sources for one European dataset. If there are issues from your side that are of concern for the open-mastr development, I'm happy to discuss them.
One remark on your comment above: "We could use the API instead, but then the user has to pass a token." This is not really a good idea. With the API you are limited to a small number of requests per day, so fetching large amounts of data through it takes a long time. You could, however, run the bulk download to get an SQLite or PostgreSQL database and extract the relevant information from there.
from open_mastr import Mastr
db = Mastr()
db.download()
# if you want csv files then also run
db.to_csv()
Hey @FlorianK13, Thanks for reaching out!
So far the idea was to basically just use the zenodo download you provide, which is quite time consuming to download.
from open_mastr import Mastr
db = Mastr()
db.download()
# if you want csv files then also run
db.to_csv()
Does this approach have any advantages over the zenodo download? E.g. runs faster, allows downloading only selected data? The API reference reads like it downloads the same zip in bulk, but allows data selection. Which means it downloads everything and just strips away unselected data?
When using the Python download method, you will get the most recent data (from the day before). On Zenodo you will get the data from our last update, which is a few months old. However, with Zenodo your code is reproducible, whereas the Python download changes every day because the BNetzA dataset changes every day. To achieve reproducibility with Python, you would need to specify date="existing" (Reference) after you have downloaded the dataset once, so that you use your existing local dataset from then on.
Both approaches take rather long, as you need to download the whole dataset. Afterwards you can specify which data you want to parse. So you are right with your last sentence: 'Which means it downloads everything and just strips away unselected data.'
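A minimal sketch of the two points above (pinning to the local copy and selecting what gets parsed). It assumes the `date` and `data` keyword arguments of `Mastr.download()` behave as described in the open-mastr documentation; the helper name `download_mastr` and the chosen technology names are only illustrative and not part of either project.

```python
def download_mastr(reuse_existing=True, technologies=("combustion", "hydro")):
    """Run the open-mastr bulk download, optionally pinned to the local copy.

    Hypothetical helper for illustration, not part of open-mastr.
    """
    # Imported lazily so the sketch can be loaded without the package installed.
    from open_mastr import Mastr

    db = Mastr()
    db.download(
        # "existing" reuses the zip that was already downloaded once;
        # None is assumed to fall back to the latest available export.
        date="existing" if reuse_existing else None,
        # Assumed: restricts which tables are parsed into the database.
        # The full zip is still downloaded either way.
        data=list(technologies),
    )
    return db
```

Calling `download_mastr(reuse_existing=False)` once and `download_mastr()` afterwards would keep later runs on the same local snapshot.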
open-mastr provides a bulk download of all the cleaned datasets on zenodo. But as a .zip, so we have to download everything. We could use the API instead, but then the user has to pass a token.
Based on the discussion above, let's take the zenodo releases. If that's updated at least on an annual basis, that's fine. I am also not too worried about the large download size, as it is usually not a frequent action to update it and it's cached locally as well. @FlorianK13, it could be an option for upcoming releases to upload the individual CSV files unzipped into the zenodo repository, which would allow selective downloads (even though you lose the ZIP compression). This could be additional to the ZIP.
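The local caching mentioned above could look roughly like the sketch below. The function name `fetch_cached` is hypothetical, and the URL is deliberately left as a parameter: it would have to be taken from the actual Zenodo record.

```python
import urllib.request
from pathlib import Path


def fetch_cached(url, cache_dir, filename):
    """Download url into cache_dir once and reuse the local file afterwards."""
    target = Path(cache_dir) / filename
    if not target.exists():
        # First call: create the cache directory and fetch the file.
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, target)
    # Later calls return the cached path without touching the network.
    return target
```

If individual CSV files were published alongside the ZIP, the same helper could be called once per file, so only the tables that are actually needed would be downloaded.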
These datasets are huge with many small power plants. I have now filtered out all plants with a capacity of less than 1 MW. Otherwise powerplant.aggregate_units() takes too long. Solar and wind are also currently not included.
Yes, that's also what Global Energy Monitor does. Perhaps they will also integrate open-MaStR, then we wouldn't have to.
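A rough sketch of the filtering step described above (drop units below 1 MW, leave out solar and wind). The column names `fueltype` and `capacity_mw` and the sample rows are made up for illustration; the real open-MaStR export uses different headers and would need to be mapped first.

```python
import csv
import io

# Assumed labels and threshold, taken from the description in this thread.
EXCLUDED_FUELS = {"Solar", "Wind"}
MIN_CAPACITY_MW = 1.0


def filter_plants(rows, min_capacity_mw=MIN_CAPACITY_MW, excluded=EXCLUDED_FUELS):
    """Keep plants at or above min_capacity_mw whose fuel type is not excluded."""
    return [
        row for row in rows
        if float(row["capacity_mw"]) >= min_capacity_mw
        and row["fueltype"] not in excluded
    ]


# Toy input, not real register data.
sample = io.StringIO(
    "name,fueltype,capacity_mw\n"
    "Plant A,Hard Coal,750\n"
    "Rooftop B,Solar,0.01\n"
    "Turbine C,Wind,3.2\n"
    "Small CHP D,Natural Gas,0.4\n"
    "Run-of-river E,Hydro,12.5\n"
)
kept = filter_plants(csv.DictReader(sample))
print([row["name"] for row in kept])
```

Filtering before any unit aggregation keeps the number of rows that have to be compared small, which is the motivation given above for the 1 MW cut-off.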
Validation is not done yet, I wait for the ENTSOE token to run compare-with-entsoe-stats.py, but below is a first plot
I requested one today and got it the same day.
@fneum I created https://github.com/OpenEnergyPlatform/open-MaStR/issues/558 to discuss if we can upload single files at zenodo.
Closes #16
Change proposed in this Pull Request
Adds Marktstammdatenregister via open-MaStR.
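There are a few issues:
- open-mastr provides a bulk download of all the cleaned datasets on zenodo. But as a .zip, so we have to download everything. We could use the API instead, but then the user has to pass a token.
- These datasets are huge with many small power plants. I have now filtered out all plants with a capacity of less than 1 MW. Otherwise powerplant.aggregate_units() takes too long. Solar and wind are also currently not included.
- Validation is not done yet, I wait for the ENTSOE token to run compare-with-entsoe-stats.py, but below is a first plot

Type of change
Checklist
- doc/release_notes.rst
- pre-commit run --all to lint/format/check my contribution
- doc/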