Adds all latest GEM datasets.

Tomkourou commented 1 year ago

Closes #80 and closes #81.

Change proposed in this Pull Request

Adds all the latest datasets from GEM. Including Geothermal, Wind, Solar, Gas & Coal.

Type of change

[ ] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
[ ] Other

Checklist

[x] I have added a note to release notes doc/release_notes.rst.
[x] I have used pre-commit run --all to lint/format/check my contribution
[x] I have documented the effects of my code changes in the documentation doc/.
[x] I have adjusted the docstrings in the code appropriately.

Tomkourou commented 1 year ago

Whilst updating GCPT I noticed that the contributing.rst has wrong information. It suggests running the following function as the last step:

pm.powerplants(update_all=True)

However this now does nothing as update_all has been deprecated. As seen below.

WARNING:powerplantmatching.collection:The following arguments were deprecated and are being ignored: {'update_all'}

@FabianHofmann what do you think about this? Would be good to update the contributing.rst doc.

FabianHofmann commented 1 year ago

Hey @Tomkourou, indeed we should fix this. It should be update = True. Would you be so kind and make a PR?

FabianHofmann commented 1 year ago

And thanks btw. Let me know when this is ready to review :)

Tomkourou commented 1 year ago

And thanks btw. Let me know when this is ready to review :)

Ready to go :)

Tomkourou commented 1 year ago

Really cool! :) Before delving into code optimization some comments in this review.

Additional to the comments, do you think it might make sense to report how the powerplant capacity per carrier changed? Like comparing pre-contribution capacity per carrier vs after contribution.

Here is the analysis with the standard European target_country list: new dataset: ENTSOE, GEO, GPD, JRC, OPSD, BEYONDCOAL, WIKIPEDIA, GGPT, GCPT, GCPT, GGtPT, GSPT, GWPT

original dataset: ENTSOE, GEO, GPD, JRC, OPSD, BEYONDCOAL, WIKIPEDIA, GGPT

Technology	new dataset	original dataset	Percentage difference
Bioenergy	10841.6834	6067.6834	78.7%
Geothermal	806	766	5.2 %
Hard Coal	183210.371796	116945.593026	56.7%
Hydro	214002.872826	214054.872826	0.0%
Lignite	77853.297197	56778.921	37.1%
Natural Gas	219722.632261	230497.597932	-4.7%
Nuclear	138763.1	138391.798212	0.3%
Oil	18454.470346	20862.942634	-11.5%
Other	3728.809314	3695.380957	0.9%
Solar	25663.803	3778.903	579.1%
Waste	2950.606334	2950.606334	0.0%
Wind	208877.81	24401.58	756.0%

(updated with Nuclear and Biomass)

pz-max commented 1 year ago

Hi @Tomkourou, The code looks good. I just added nuclear and bioenergy here: https://github.com/Tomkourou/powerplantmatching/pull/1. (Please merge the PR if you think it is ok, so it will appear here too). As minor detail, I changed GGtPT to GGTPT for naming consistency. Most of our data names are by default in capital letter. Any dev will think here its a bug even though its named like GGtPT officially by GEM.

@Tomkourou , could you update your above plot with the new nuclear and bioenergy data?

@FabianHofmann, in case you start reviewing, one minor remaining question. Given a config.yaml entry like:

GSPT:
  net_capacity: false
  reliability_score: 4
  fn: Global-Solar-Power-Tracker-January-2023.csv
  url: https://raw.githubusercontent.com/pz-max/gem-powerplant-data/main/Global-Solar-Power-Tracker-January-2023.csv

Should we add something like aggregated_units: true? (some powerplants like nuclear have the same name for different powerplant units. Does that lead to conflicts? Will aggregated_units: true help?). Probably we could test that @Tomkourou :+1:

Tomkourou commented 1 year ago

Hi @Tomkourou, The code looks good. I just added nuclear and bioenergy here: https://github.com/Tomkourou/powerplantmatching/pull/1. (Please merge the PR if you think it is ok, so it will appear here too). As minor detail, I changed GGtPT to GGTPT for naming consistency. Most of our data names are by default in capital letter. Any dev will think here its a bug even though its named like GGtPT officially by GEM.

@Tomkourou , could you update your above plot with the new nuclear and bioenergy data?

@FabianHofmann, in case you start reviewing, one minor remaining question. Given a config.yaml entry like:

GSPT:
  net_capacity: false
  reliability_score: 4
  fn: Global-Solar-Power-Tracker-January-2023.csv
  url: https://raw.githubusercontent.com/pz-max/gem-powerplant-data/main/Global-Solar-Power-Tracker-January-2023.csv

Should we add something like aggregated_units: true? (some powerplants like nuclear have the same name for different powerplant units. Does that lead to conflicts? Will aggregated_units: true help?). Probably we could test that @Tomkourou :+1:

That's a good question. Until now I didn't have any problems with this but that was because I was using the GEM unit ids or phase ids for renewables instead of project ids. If you use projectids it will probably throw an error since there will be duplicates.

pz-max commented 1 year ago

@Tomkourou Will be reviewed on Monday :+1:

FabianHofmann commented 1 year ago

thanks @Tomkourou! looks all very nice!

FabianHofmann commented 1 year ago

@Tomkourou one question: You added that the GEM datasets are not reporting net capacity. Do you remember where you got this information from? I cannot find it on the GEM page. Since PPM should report net capacity, this might lead to some issues if we do not rescale the capacities. Thanks!

Tomkourou commented 1 year ago

@FabianHofmann it's a good question. My understanding was that since most GEM datasets include multiple units for powerplants then they needed to be summed up in the process and that's why I categorised them as not reporting net capacity.

Perhaps I misunderstood the meaning of net capacity though and it needs to be recategorised?

FabianHofmann commented 1 year ago

@Tomkourou thanks for your answer! Actually, the correct tag for that would be aggregated_units. But this is no problem at all, since the data is effectively treated as net capacity in the code, which then correct and makes things easier. I will just remove the net_capacity tags from the config.

PyPSA / powerplantmatching