Updating the default dataset may have double counting in Hard Coal

With Global Energy Monitor as a source, powerplants.csv contains an increasing number of retired power plants. This is useful for reproducing historical market outcomes in 2020 or other years, and I think it is where the discrepancy comes from.

To get capacities for 2022, filter on DateIn and DateOut (rows with missing dates are kept):

import powerplantmatching as pm
powerplants = pm.powerplants()
query = "(DateOut > 2022 or DateOut != DateOut) and (DateIn < 2023 or DateIn != DateIn)"
powerplants.query(query)
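
The `DateOut != DateOut` comparison relies on NaN never being equal to itself, so plants without a recorded date pass the filter. A minimal sketch on a toy table (the plant names and years are made up for illustration and are not taken from powerplants.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names mirror the snippet above.
toy = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "DateIn": [1980.0, 2023.0, np.nan, 2010.0],
        "DateOut": [2015.0, np.nan, np.nan, 2030.0],
    }
)

# Same filter as above: NaN != NaN is True, so missing dates are kept.
query = "(DateOut > 2022 or DateOut != DateOut) and (DateIn < 2023 or DateIn != DateIn)"
active_2022 = toy.query(query)
print(active_2022.Name.tolist())
```

Here plant A is dropped because it retired in 2015, and plant B because it was only commissioned in 2023. C (no dates at all) and D remain.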

#163 will also trim the dataset further, which gives very good alignment with country-level statistics (rooftop solar and utility-scale plants below 1 MW are not included).

The statistics were created with the ENTSO-E Python API:

import time

import numpy as np
import pandas as pd
from entsoe import EntsoePandasClient
from powerplantmatching.cleaning import gather_fueltype_info

# ENTSOE_TOKEN must hold your personal ENTSO-E API key
client = EntsoePandasClient(api_key=ENTSOE_TOKEN)

start = pd.Timestamp("20220101", tz="Europe/Berlin")
end = pd.Timestamp("20230101", tz="Europe/Berlin")

kwargs = dict(start=start, end=end, psr_type=None)

def parse(c):
    rename = {"GB": "UK"}
    for n in range(2):
        try:
            print(c, n)
            return client.query_installed_generation_capacity(rename.get(c, c), **kwargs).iloc[0]
        except Exception as e:
            print(f"Country {c} failed with {e}")
            time.sleep(3)
    return np.nan

stats = pd.DataFrame({c: parse(c) for c in powerplants.Country.unique()})
fueltypes = gather_fueltype_info(pd.DataFrame({"Fueltype": stats.index}), ["Fueltype"])
stats = stats.groupby(fueltypes.Fueltype.values).sum().unstack()

# https://de.wikipedia.org/wiki/Liste_von_Wasserkraftwerken_in_der_Schweiz?oldformat=true
stats.loc["CH", "Hydro"] = 17038
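
Since `stats` ends up indexed by (country, fuel type), one way to look for double counting is to compare it against the summed capacities of the plant list. A rough sketch with made-up numbers: the `plants` and `reference` tables and the 1.2 threshold are illustrative assumptions, standing in for the filtered `powerplants` and the `stats` series above.

```python
import pandas as pd

# Hypothetical capacities in MW; not real data.
plants = pd.DataFrame(
    {
        "Country": ["DE", "DE", "DE", "PL"],
        "Fueltype": ["Hard Coal", "Hard Coal", "Lignite", "Hard Coal"],
        "Capacity": [800.0, 700.0, 1000.0, 500.0],
    }
)

# Hypothetical reference statistics, indexed by (country, fuel type).
reference = pd.Series(
    {
        ("DE", "Hard Coal"): 1000.0,
        ("DE", "Lignite"): 1000.0,
        ("PL", "Hard Coal"): 500.0,
    }
)

# Sum plant capacities per country and fuel type, then compare to the reference.
matched = plants.groupby(["Country", "Fueltype"]).Capacity.sum()
ratio = (matched / reference).round(2)

# Arbitrary threshold: flag groups more than 20% above the reference.
suspicious = ratio[ratio > 1.2]
print(suspicious)
```

A ratio well above 1 for a single fuel type, as for the toy "Hard Coal" entry here, is the kind of pattern that would point to duplicated or retired plants still being counted.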
