CSP has capacity factors > 1 when passing shapes

irm-codebase commented 1 month ago

Version Checks (indicate both or one)

[X] I have confirmed this bug exists on the latest release of Atlite.
[ ] I have confirmed this bug exists on the current master branch of Atlite.

Issue Description

Depending on how you call cutout.csp, the capacity factors may be wrong when shapes are passed:

per_unit=True works as intended.

capacity_factor=True returns values greater than 1.

Reproducible Example

import geopandas as gpd
import atlite
import cartopy.io.shapereader as shpreader
import pandas as pd

shp = shpreader.Reader(
    shpreader.natural_earth(
        resolution="10m", category="cultural", name="admin_1_states_provinces"
    )
)
prt_records = list(
    filter(lambda r: r.attributes["iso_3166_2"].startswith("PT"), shp.records())
)
portugal_1 = (
    gpd.GeoDataFrame([{**r.attributes, "geometry": r.geometry} for r in prt_records])
    .rename(columns={"iso_3166_2": "state"})
    .set_index("state")
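    .set_crs(4326)  # EPSG:4326 (WGS 84 lon/lat)
)
# keep only mainland states; -10 is an assumed cut-off west of the mainland
# (the cutout spans roughly x = -9.5 to -6.2), which drops the Azores and Madeira
portugal_1 = portugal_1.cx[-10:, :]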
portugal_1.plot()

cutout = atlite.Cutout(
    path="csp-cf-series/test/cutout_csp.nc",  # previous file I had, from the Portugal example
)

csp = cutout.csp(  # I work!
    installation="SAM_solar_tower",
    per_unit=True,
    shapes=portugal_1
)
mean = csp.mean("time").to_series()
portugal_1.plot(column=mean, legend=True)

csp = cutout.csp(  # I don't :(
    installation="SAM_solar_tower",
    capacity_factor=True,
    shapes=portugal_1
)
mean = csp.mean("time").to_series()
portugal_1.plot(column=mean, legend=True)

Expected Behavior

Capacity factors should be correct (between 0 and 1) regardless of which keyword is used to request them (although... why are there two keywords for the same thing? That's how this type of issue happens...).
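A quick sanity check for this kind of bug is to assert that any result meant to be per unit stays within [0, 1]. This is a minimal sketch on synthetic xarray data; assert_per_unit is a hypothetical helper (not part of atlite), and the region labels and values are made up.

```python
import pandas as pd
import xarray as xr


def assert_per_unit(cf: xr.DataArray, tol: float = 1e-9) -> None:
    """Hypothetical helper: raise if any capacity factor lies outside [0, 1]."""
    lo = float(cf.min())
    hi = float(cf.max())
    if lo < -tol or hi > 1 + tol:
        raise ValueError(f"capacity factors outside [0, 1]: min={lo:.3f}, max={hi:.3f}")


# Synthetic per-unit data: 3 time steps x 2 regions
times = pd.date_range("2019-05-01", periods=3, freq="D")
good = xr.DataArray(
    [[0.1, 0.5], [0.3, 0.7], [0.0, 0.9]],
    coords={"time": times, "state": ["PT-01", "PT-02"]},
    dims=("time", "state"),
)

assert_per_unit(good)  # passes silently

try:
    assert_per_unit(good * 40)  # a summed, MW-like result is rejected
except ValueError as err:
    print(err)
```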

Installed Versions

atlite = 0.2.13

irm-codebase commented 1 month ago

Here is the cutout I am using for this. cutout_csp.zip

euronion commented 1 month ago

Hi there!

Thanks for reporting.

Could you share details on how you created the cutout (e.g. the atlite.Cutout(...) command)?
I don't understand why you apply .mean("time") to the results afterwards. Is this step necessary? Maybe it messes up the values?

irm-codebase commented 1 month ago

Hello @euronion! Sure, here is the cutout command (I had to reconstruct it because of snakemake, but it's roughly like this):

cutout = atlite.Cutout(
    path="output/test.nc",
    module=["era5"],
    x=slice(-9.497466600999928, -6.205947224999932),
    y=slice(36.96588776200008, 42.15362966000002),
    time="2019-05",
    **cutout_kwargs,  # further options from the Snakemake workflow (not shown)
)
cutout.prepare(features=["influx", "temperature"])

As for .mean("time"), it just averages over the "time" dimension, giving the mean CF of the time series. I pass it to the plot only as a quick check.
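On whether .mean("time") could distort the values: an average over time can never exceed the largest value being averaged, so capacity factors above 1 after the mean imply values above 1 before it. A minimal sketch with synthetic data (region names and values are illustrative only):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic per-unit capacity factors: 24 time steps x 2 regions, all in [0, 1)
rng = np.random.default_rng(0)
cf = xr.DataArray(
    rng.uniform(0.0, 1.0, size=(24, 2)),
    coords={"time": pd.date_range("2019-05-01", periods=24, freq="D"), "state": ["A", "B"]},
    dims=("time", "state"),
)

# One average per region, as in csp.mean("time").to_series()
mean = cf.mean("time").to_series()
print(mean)

# The mean is bounded by the per-region maximum, so it cannot push values above 1
assert (mean <= cf.max("time").to_series()).all()
```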

sultadar commented 1 month ago

I'm facing the same issue for wind, so I would be glad if you could check that as well!

import atlite
import cartopy.io.shapereader as shpreader
import geopandas as gpd

# countries_iso2 and time_range are defined elsewhere (not shown)
shp = shpreader.Reader(
    shpreader.natural_earth(
        resolution="110m", category="cultural", name="admin_0_countries"
    )
)

eu_records = list(filter(lambda c: c.attributes["ISO_A2_EH"] in countries_iso2, shp.records()))

country_shapes = (
    gpd.GeoDataFrame([{**r.attributes, "geometry": r.geometry} for r in eu_records])
    .rename(columns={"ISO_A2_EH": "country"})
    .set_index("country")
    .set_crs(4326)
)

# Determine the bounds of the shapes
minx, miny, maxx, maxy = country_shapes.total_bounds

# Select features
features_sel = ["height", "wind", "influx", "temperature"]

# Create the atlite cutout
cutout = atlite.Cutout(
    path="EU_land_and_maritime_2020-01.nc",
    module="era5",
    x=slice(minx, maxx),
    y=slice(miny, maxy),
    time=time_range,
)

cutout.prepare(features_sel)

capFactors_windon = cutout.wind(
    turbine="Vestas_V112_3MW",
    capacity_factor=True,
    shapes=country_shapes,
)

This returns:

<xarray.DataArray 'specific generation' (time: 744, country: 33)> Size: 196kB
array([[432.75802457, 357.47933491,  10.45152178, ..., 324.12242196,
         72.75254568,   3.6701183 ],
       [472.37188334, 317.84874673,   8.4504687 , ..., 336.59759239,
         70.21263398,   4.30421531],
       [517.94336594, 304.68508964,   7.16737895, ..., 348.5062182 ,
         70.8649481 ,   5.6894943 ],
       ...,
       [503.83907213, 347.11715974,  32.26562571, ..., 391.55345446,
        394.82612232,  17.45236282],
       [482.40021769, 321.25376784,  31.6153184 , ..., 363.73621784,
        361.91767644,  18.63654856],
       [483.7726228 , 289.53571345,  29.99167237, ..., 375.60636599,
        400.22536309,  18.62705798]])
Coordinates:
  * time     (time) datetime64[ns] 6kB 2020-01-01 ... 2020-01-31T23:00:00
  * country  (country) object 264B 'SE' 'PL' 'AT' 'HU' ... 'ME' 'NO' 'FR' 'GR'
Attributes:
    units:    MW

fneum commented 1 month ago

@sultadar I haven't had the chance to look at this in detail yet, but I think you have to pass per_unit=True (instead of capacity_factor=True) to get capacity factor time series.

I agree with @irm-codebase here that we may need to clean up the keyword arguments of convert_and_aggregate() (https://atlite.readthedocs.io/en/latest/ref_api.html#atlite.convert.convert_and_aggregate). Currently, the capacity_factor argument is documented to return the static (time-averaged) capacity factor, but when shapes are passed it returns values in MW. It's also not clear how capacity_factor_timeseries differs from per_unit (I think the difference is the aggregation to shapes).
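Why aggregating to shapes can produce values above 1 can be illustrated with a small NumPy sketch. It assumes (this is an assumption, not atlite's code) that aggregation to shapes is a weighted sum over grid cells using a shapes x cells indicator matrix, and that per-unit output divides that sum by each shape's total weight. All numbers are illustrative.

```python
import numpy as np

# Illustrative per-cell capacity factors: 4 time steps x 3 grid cells (per unit)
cf_cells = np.array([
    [0.2, 0.4, 0.6],
    [0.5, 0.5, 0.5],
    [0.9, 0.8, 0.7],
    [0.0, 0.1, 0.2],
])

# Illustrative indicator matrix (shapes x cells): share of each cell inside each shape
indicator = np.array([
    [1.0, 1.0, 0.5],  # shape 0: cells 0 and 1 fully, half of cell 2
    [0.0, 0.0, 0.5],  # shape 1: the other half of cell 2
])

# Weighted sum over cells (time x shapes): grows with the number of cells covered,
# so it can exceed 1 even though every cell value is per unit
summed = cf_cells @ indicator.T

# Dividing by each shape's total weight gives a weighted average, which stays in [0, 1]
weight = indicator.sum(axis=1)
per_unit = np.divide(summed, weight, out=np.zeros_like(summed), where=weight != 0)

print(summed.max())    # above 1
print(per_unit.max())  # at most 1
```

Under this assumption, the summed result behaves like the MW-labelled output above, while the normalised one behaves like per_unit=True.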

Maybe:

cutout.wind(time_series=True, ...) # -> production time series in MW
cutout.wind(time_series=True, capacity_factor=True) # -> capacity factor time series per-unit
cutout.wind(capacity_factor=True, time_series=False) # -> average capacity factor per unit

and replace capacity_factor_timeseries with something like per_grid_cell.
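The proposed keyword semantics can be sketched as a small dispatcher over plain NumPy arrays. Everything here is hypothetical: aggregate, cf_cells and layout are illustrative names rather than atlite's API, and the sketch assumes production is a capacity-weighted sum over grid cells. The proposal does not specify time_series=False without capacity_factor; the sketch simply returns the time-averaged production in that case.

```python
import numpy as np


def aggregate(cf_cells, layout, time_series=True, capacity_factor=False):
    """Hypothetical dispatcher following the keyword proposal (not atlite's API).

    cf_cells: (time, cells) array of per-unit capacity factors.
    layout:   (shapes, cells) array of installed capacity in MW.
    """
    production = cf_cells @ layout.T  # (time, shapes), MW
    if capacity_factor:
        capacity = layout.sum(axis=1)  # installed MW per shape
        result = np.divide(
            production, capacity,
            out=np.zeros_like(production), where=capacity != 0,
        )  # per unit
    else:
        result = production
    # time_series=False collapses the time axis to its average
    return result if time_series else result.mean(axis=0)


cf_cells = np.array([[0.2, 0.6], [0.4, 1.0]])  # 2 time steps x 2 cells
layout = np.array([[10.0, 30.0]])             # 1 shape, 40 MW in total

print(aggregate(cf_cells, layout))                         # production time series, MW
print(aggregate(cf_cells, layout, capacity_factor=True))   # CF time series, p.u.
print(aggregate(cf_cells, layout, capacity_factor=True, time_series=False))  # average CF
```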

irm-codebase commented 1 month ago

@fneum OK, so basically: only use per_unit to avoid trouble :+1: I've been doing this, and my workflows work well (as far as I've checked).

sultadar commented 1 month ago

Same here, seems to work. Thanks for the swift reply @fneum!
