PyPSA / powerplantmatching

Set of tools to combine multiple power plant databases
https://powerplantmatching.readthedocs.io/en/latest/
GNU General Public License v3.0

Updating the default dataset may have double counting in Hard Coal #186

Closed: irm-codebase closed this issue 3 weeks ago.

irm-codebase commented 1 month ago

During some testing, I re-ran the default configuration:

import powerplantmatching as ppm

plants = ppm.powerplants(update=True)

The resulting dataset is similar to the one shown in the documentation, except for a big increase in Hard Coal facilities:

[Image: capacity by fuel type compared with the documentation, not preserved]

It seems like the statistics in the documentation are from 2018 (?), but I doubt that Europe saw such a massive increase in coal facilities over the last half decade. Could some double counting be happening due to changes in one of the source datasets?
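One quick way to test the double-counting hypothesis is to look for entries that share a name, country and fuel type. The sketch below is a hypothetical helper (`flag_possible_duplicates` is not part of powerplantmatching); it assumes the `Name`, `Country`, `Fueltype` and `Capacity` columns of the output and only catches exact matches after lower-casing and trimming names, so duplicates spelled differently will slip through.

```python
import pandas as pd


def flag_possible_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose normalized name, country and fuel type coincide.

    Hypothetical helper: exact matching only, after lower-casing and
    stripping surrounding whitespace from the name.
    """
    keyed = df.assign(name_key=df["Name"].str.lower().str.strip())
    # keep=False marks every member of a duplicate group, not just the repeats
    mask = keyed.duplicated(subset=["name_key", "Country", "Fueltype"], keep=False)
    return keyed.loc[mask].sort_values(["Country", "name_key"])


# Made-up example data, not taken from the real dataset
plants = pd.DataFrame(
    {
        "Name": ["Plant A", "plant a ", "Plant B"],
        "Country": ["DE", "DE", "PL"],
        "Fueltype": ["Hard Coal", "Hard Coal", "Hard Coal"],
        "Capacity": [500.0, 500.0, 300.0],
    }
)
dups = flag_possible_duplicates(plants)
print(dups[["Name", "Country", "Capacity"]])
```

On real data, an empty result does not rule out double counting; it only rules out the most obvious kind.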

fneum commented 3 weeks ago

Since Global Energy Monitor data was added, powerplants.csv contains a growing number of retired power plants. This is useful for reproducing historical market outcomes in 2020 or other years, but it inflates the totals if you simply sum over all entries. I think this is where the discrepancy comes from.
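To illustrate how retired entries inflate a naive total, the sketch below splits capacity into retired, operating and not-yet-commissioned plants for a given year. It uses made-up data; `capacity_by_status` is a hypothetical helper, and it assumes `DateIn`/`DateOut` hold years as floats with NaN for unknown dates, which it treats as operating.

```python
import numpy as np
import pandas as pd


def capacity_by_status(df: pd.DataFrame, year: int) -> pd.Series:
    """Total capacity (MW) of retired, operating and future plants in `year`.

    Hypothetical helper. Comparisons with NaN are False, so plants with
    unknown dates fall through to "operating".
    """
    status = pd.Series("operating", index=df.index)
    status[df["DateIn"] >= year + 1] = "future"  # not yet commissioned
    status[df["DateOut"] <= year] = "retired"  # already decommissioned
    return df["Capacity"].groupby(status).sum()


# Made-up plants; years as floats, NaN = unknown
plants = pd.DataFrame(
    {
        "Capacity": [500.0, 300.0, 200.0, 100.0],
        "DateIn": [1980.0, 1990.0, 2025.0, np.nan],
        "DateOut": [2015.0, np.nan, np.nan, np.nan],
    }
)
caps = capacity_by_status(plants, 2022)
print(caps)  # operating: 400, retired: 500, future: 200 (MW)
print("Naive total:", plants["Capacity"].sum())  # 1100.0
```

Here the naive sum is almost three times the capacity actually operating in 2022.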

To get the capacities operating in 2022, filter on the commissioning and decommissioning dates:

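import powerplantmatching as pm

powerplants = pm.powerplants()
# Keep plants still running after 2022 (or with unknown DateOut) and commissioned
# before 2023 (or with unknown DateIn); `x != x` is only True for NaN.
query = "(DateOut > 2022 or DateOut != DateOut) and (DateIn < 2023 or DateIn != DateIn)"
powerplants = powerplants.query(query)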
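If you need this for several years, the same filter can be written with boolean masks and aggregated per country and fuel type. This is a sketch on made-up data; `operating_capacity` is a hypothetical name, and the columns (`Country`, `Fueltype`, `Capacity`, `DateIn`, `DateOut`) are assumed to match the powerplantmatching output.

```python
import numpy as np
import pandas as pd


def operating_capacity(df: pd.DataFrame, year: int) -> pd.DataFrame:
    """Capacity (MW) per Country x Fueltype for plants operating in `year`.

    Same logic as the query string, with isna() instead of `x != x`:
    plants with unknown DateIn/DateOut are kept.
    """
    active = ((df["DateOut"] > year) | df["DateOut"].isna()) & (
        (df["DateIn"] < year + 1) | df["DateIn"].isna()
    )
    return (
        df.loc[active]
        .groupby(["Country", "Fueltype"])["Capacity"]
        .sum()
        .unstack(fill_value=0.0)  # countries as rows, fuel types as columns
    )


# Made-up example data
plants = pd.DataFrame(
    {
        "Country": ["DE", "DE", "PL", "PL"],
        "Fueltype": ["Hard Coal", "Hard Coal", "Hard Coal", "Lignite"],
        "Capacity": [800.0, 600.0, 400.0, 1000.0],
        "DateIn": [1975.0, 2010.0, np.nan, 1985.0],
        "DateOut": [2020.0, np.nan, np.nan, 2030.0],
    }
)
print(operating_capacity(plants, 2022))
print(operating_capacity(plants, 2019))
```

The 800 MW German unit retired in 2020 counts towards 2019 but not 2022, which is exactly the effect that inflates an unfiltered total.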

#163 will also bring some refinements, giving very good alignment with country-level statistics (rooftop solar and utility-scale plants below 1 MW are not included):

[Image: capacities compared with country-level statistics, not preserved]
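One simple way to quantify such alignment is to put the dataset's capacities next to the statistics and look at the ratio. The sketch assumes both inputs are Series in MW indexed by (Country, Fueltype), for instance `powerplants.groupby(["Country", "Fueltype"]).Capacity.sum()` on one side and the ENTSO-E based statistics on the other; `compare_to_stats`, the 10% tolerance and the numbers are illustrative only.

```python
import pandas as pd


def compare_to_stats(ppm_cap: pd.Series, stats: pd.Series, tol: float = 0.1) -> pd.DataFrame:
    """Align two capacity Series on their (Country, Fueltype) index.

    Keys present in only one Series get NaN in the other column, so their
    ratio is NaN and they are reported as not within tolerance.
    """
    df = pd.concat({"ppm": ppm_cap, "stats": stats}, axis=1)
    df["ratio"] = df["ppm"] / df["stats"]
    df["within_tol"] = (df["ratio"] - 1).abs() <= tol
    return df


idx = pd.MultiIndex.from_tuples(
    [("DE", "Hard Coal"), ("DE", "Lignite"), ("PL", "Hard Coal")],
    names=["Country", "Fueltype"],
)
# Made-up capacities in MW
ppm_cap = pd.Series([800.0, 1000.0, 500.0], index=idx)
stats = pd.Series([1000.0, 1000.0, 520.0], index=idx)
result = compare_to_stats(ppm_cap, stats)
print(result)
```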

The statistics were created with the ENTSO-E Python API (entsoe-py):

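import time

import numpy as np
import pandas as pd
from entsoe import EntsoePandasClient
from powerplantmatching.cleaning import gather_fueltype_info

# ENTSOE_TOKEN is your personal ENTSO-E Transparency Platform API key
client = EntsoePandasClient(api_key=ENTSOE_TOKEN)

# Installed capacities for the year 2022
start = pd.Timestamp("20220101", tz="Europe/Berlin")
end = pd.Timestamp("20230101", tz="Europe/Berlin")

kwargs = dict(start=start, end=end, psr_type=None)

def parse(c):
    # Query ENTSO-E with "UK" for country code "GB"
    rename = {"GB": "UK"}
    # Try each country up to twice, since requests occasionally fail
    for n in range(2):
        try:
            print(c, n)
            # First row: installed capacity per production type
            return client.query_installed_generation_capacity(rename.get(c, c), **kwargs).iloc[0]
        except Exception as e:
            print(f"Country {c} failed with {e}")
            time.sleep(3)
    return np.nan

# `powerplants` is the DataFrame from the previous snippet
stats = pd.DataFrame({c: parse(c) for c in powerplants.Country.unique()})
# Map ENTSO-E production types to powerplantmatching fuel types, sum per fuel type,
# and reshape into a Series indexed by (country, fuel type)
fueltypes = gather_fueltype_info(pd.DataFrame({"Fueltype": stats.index}), ["Fueltype"])
stats = stats.groupby(fueltypes.Fueltype.values).sum().unstack()

# Override Swiss hydro capacity (MW) with the figure from:
# https://de.wikipedia.org/wiki/Liste_von_Wasserkraftwerken_in_der_Schweiz?oldformat=true
stats.loc["CH", "Hydro"] = 17038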
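The last steps turn the ENTSO-E table (production types as rows, countries as columns) into a single Series indexed by (country, fuel type). The toy example below shows that reshaping in isolation; the values are made up, and the small `mapping` dict is a hypothetical stand-in for what `gather_fueltype_info` does in the snippet above.

```python
import pandas as pd

# Toy stand-in for the ENTSO-E table: rows are production types,
# columns are countries (values in MW, made up for illustration)
raw = pd.DataFrame(
    {"DE": [100.0, 50.0, 30.0], "CH": [0.0, 10.0, 20.0]},
    index=["Fossil Hard coal", "Hydro Run-of-river and poundage", "Hydro Water Reservoir"],
)
# Hypothetical mapping; in the thread this is done by gather_fueltype_info
mapping = {
    "Fossil Hard coal": "Hard Coal",
    "Hydro Run-of-river and poundage": "Hydro",
    "Hydro Water Reservoir": "Hydro",
}
fueltype = raw.index.map(mapping)

# Sum production types sharing a fuel type, then stack the countries into
# the outer index level: a Series indexed by (country, fuel type)
stats = raw.groupby(fueltype).sum().unstack()
print(stats)
```

Because the result has a two-level index, a single entry such as Swiss hydro can then be overwritten with `stats.loc["CH", "Hydro"] = ...`, as in the snippet above.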