PyPSA / powerplantmatching

Set of tools to combine multiple power plant databases
https://powerplantmatching.readthedocs.io/en/latest/
GNU General Public License v3.0

Add options for automated sub-national aggregation #46

Open FLomb opened 3 years ago

FLomb commented 3 years ago

It would be cool to have an automated option for aggregating power plants into sub-national clusters within each country, based on standard sub-national units.

For instance, it would be really nice if the user could choose, as an option, the level of spatial aggregation, e.g.:

So, instead of getting only the aggregate capacity of each European country as output, one could get the aggregate capacity of each sub-national region of interest. This would make it much easier to couple the project to any power system model.

kais-siala commented 3 years ago

Hi @FLomb, I'm not sure whether this feature is in the pipeline of powerplantmatching, but you could check this tool: https://github.com/tum-ens/pyPRIMA/ which I created specifically to aggregate data (in particular power plant capacities from here) as I wish.

fneum commented 3 years ago

Good idea, shouldn't be too difficult to implement:

Just a sketch off the top of my head: assuming gdf is the powerplantmatching database as a geopandas GeoDataFrame and regions is a GeoDataFrame holding the NUTS2/3 or GADM shapes, it should just be:

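import geopandas as gpd

# Spatial join: attach to each plant the region polygon it lies within.
# (geopandas < 0.10 used op='within' instead of predicate='within'.)
merged = gpd.sjoin(gdf, regions, how="inner", predicate="within")
merged.groupby('index_right').sum(numeric_only=True)  # or more specific aggregation strategies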

One could have this as a frontend function where the user passes the region shapes. I wouldn't necessarily have the shapefiles themselves built into powerplantmatching.

fneum commented 3 years ago

Ok, here's how it could look in more detail.

@FLomb could you check whether this would fit your use case or what would be missing? It should allow any GeoDataFrame.

import numpy as np
import geopandas as gpd
import powerplantmatching as pm

def assign_to_shape(df, shapes, index_col):
    """
    Group power plants by the shapes they lie within,
    e.g. NUTS2, NUTS1, GADM.

    Parameters
    ----------
    df : pd.DataFrame
        power plant list with coordinates 'lat', 'lon'
    shapes : gpd.GeoDataFrame
        GeoDataFrame with polygons as geometry,
        e.g. NUTS2, NUTS1, GADM
    index_col : str
        column of shapes to group by
    """

    CRS = 'EPSG:4326'

    gdf = gpd.GeoDataFrame(df,
        geometry=gpd.points_from_xy(df.lon, df.lat),
        crs=CRS
    )

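    # Reproject the shapes to the plants' CRS before joining
    # (geopandas < 0.10 used op='within' instead of predicate='within').
    merged = gpd.sjoin(gdf, shapes.to_crs(CRS), how="inner", predicate="within")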

    strategies = {
        'Capacity': np.sum,
        'Efficiency': np.mean,
        'Duration': np.mean,
        'Volume_Mm3': np.sum,
        'DamHeight_m': np.mean,
        'lat': np.mean,
        'lon': np.mean,
        'DateIn': np.mean,
        'DateRetrofit': np.mean,
    }
    groupers = [index_col, "Fueltype", "Technology", "Set", 'Country']
    return merged.groupby(groupers, as_index=False).agg(strategies)

This can be run as:

df = pm.powerplants(from_url=True)

nuts0 = gpd.read_file("nuts/NUTS_RG_01M_2016_4326_LEVL_0.geojson")

df = assign_to_shape(df, nuts0, 'NUTS_ID')

which would output:

| | NUTS_ID | Fueltype | Technology | Set | Country | Capacity | Efficiency | Duration | Volume_Mm3 | DamHeight_m | lat | lon | DateIn | DateRetrofit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AT | Hard Coal | CCGT | PP | Austria | 704 | nan | 0 | 0 | 0 | 48.3269 | 15.9198 | 1987 | 1987 |
| 1 | AT | Hard Coal | Steam Turbine | CHP | Austria | 246 | nan | 0 | 0 | 0 | 46.9082 | 15.4922 | 1986 | 1986 |
| 2 | AT | Hard Coal | Steam Turbine | PP | Austria | 287.539 | nan | 0 | 0 | 0 | 48.0034 | 13.2309 | 1970 | 1987 |
| 3 | AT | Hydro | Pumped Storage | PP | Austria | 389 | nan | 0 | 0 | 0 | 46.9684 | 10.0599 | 1943 | 2018 |
| 4 | AT | Hydro | Pumped Storage | Store | Austria | 3852.3 | nan | 78.6592 | 361.35 | 55.3 | 47.0987 | 11.8791 | 1984.17 | 1997.5 |

If merged, it should probably go into https://github.com/FRESNA/powerplantmatching/blob/master/powerplantmatching/export.py

FLomb commented 3 years ago

Hi @fneum, thanks for the quick reply!

At first sight, yes, this seems pretty much what I was looking for! Your example still outputs at NUTS0, so I should test it myself and see what happens when it is applied to a NUTS2/GADM shapefile. I'll try to do so asap.

fneum commented 3 years ago

Yes, it would be good if you could test it. I just used NUTS0 to compare against the Country column. You could use NUTS_RG_01M_2016_4326_LEVL_2.geojson from https://gisco-services.ec.europa.eu/distribution/v2/nuts/download/#nuts21

FLomb commented 3 years ago

Ok, I tested it but it looks like I'm having some issues. I've tried both NUTS0 data for the whole EU and GADM data for a single country. In both cases, it kinda works, but it skips several plants in the aggregation, e.g. all renewables and bioenergy. Not sure if this could be because these techs tend to have a "nan" Technology type, unlike others such as Hydro.

FLomb commented 3 years ago

Ok, quick follow-up: I had some time to debug the problem, and the nan values were indeed the cause. I managed to make it work by getting rid of the nan values for plants whose Technology type was 'nan'.
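For context, this is consistent with the default behaviour of pandas' groupby, which excludes rows whose group key is NaN unless `dropna=False` is passed (a parameter available since pandas 1.1). Below is a minimal sketch with made-up numbers, not taken from the actual database, that reproduces the effect on a reduced set of groupers:

```python
import numpy as np
import pandas as pd

# Toy plant list (hypothetical values); wind and solar have no Technology.
plants = pd.DataFrame({
    "Fueltype": ["Hydro", "Wind", "Solar", "Hydro"],
    "Technology": ["Reservoir", np.nan, np.nan, "Reservoir"],
    "Capacity": [100.0, 50.0, 30.0, 20.0],
})

groupers = ["Fueltype", "Technology"]

# Default: rows with a NaN group key are excluded, so wind and solar vanish.
default = plants.groupby(groupers)["Capacity"].sum()

# dropna=False keeps NaN as a group key of its own.
kept = plants.groupby(groupers, dropna=False)["Capacity"].sum()

print("total capacity, default groupby:", default.sum())
print("total capacity, dropna=False:   ", kept.sum())
```

Passing `dropna=False` to the groupby in assign_to_shape would be one alternative to filling the missing values, assuming a nan Technology group is acceptable in the output.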

That said, some issues remain when adopting different shapefiles. For instance, there is a Hydro-Reservoir power plant in Portugal (id: 5468, name: "Foz tua") whose coordinates (erroneously) lie outside the inland area of Portugal. With the EU-NUTS0 shapefile, the join still assigns it to "PT", so it counts towards the PT total capacity; with a GADM shapefile of Portugal, however, the plant is dropped by the join and is therefore missing from the total capacity of the corresponding GADM region and hence of Portugal itself.

While this is just one example, there could be tons of similar ones elsewhere; any ideas for fixing this kind of issue?

Thanks
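To find out how many plants are affected, one option is to list every plant that falls outside all polygons of a given shapefile. The sketch below assumes geopandas 0.10 or newer (for the `predicate` keyword); the helper name `unmatched_plants` and the toy squares and points are purely illustrative and are not part of powerplantmatching.

```python
import geopandas as gpd
from shapely.geometry import Point, box


def unmatched_plants(gdf, shapes):
    """Return the rows of gdf whose point lies within none of the shapes."""
    # Keep only the geometry and use a plain 0..n-1 index so that the join
    # reports the matching polygon in a column called 'index_right'.
    polygons = shapes[[shapes.geometry.name]].to_crs(gdf.crs).reset_index(drop=True)
    joined = gpd.sjoin(gdf, polygons, how="left", predicate="within")
    missing = joined["index_right"].isna()
    # A plant matched by several polygons appears several times in 'joined',
    # but an unmatched plant appears exactly once.
    return gdf.loc[joined.index[missing]]


# Toy data: two adjacent unit squares and three plants, one just outside.
regions = gpd.GeoDataFrame(
    {"region": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)
plants = gpd.GeoDataFrame(
    {"Name": ["Inland A", "Inland B", "Offshore"]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(2.05, 0.5)],
    crs="EPSG:4326",
)

dropped = unmatched_plants(plants, regions)
print(dropped["Name"].tolist())
```

Running this against both the NUTS0 and the GADM shapes would show which plants each shapefile loses.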

kais-siala commented 3 years ago

You could create a small buffer around each polygon - but then you will have to deal with the issue of power plants lying in more than one region.
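One possible way to handle plants that end up in more than one buffered region is to keep, for each plant, the region whose original (unbuffered) polygon is closest. The following is only a sketch under assumptions: it requires geopandas 0.10 or newer, and the function name `assign_with_buffer`, the `buffer_m` parameter, and the choice of EPSG:3035 as metric CRS are illustrative, not part of powerplantmatching. The toy coordinates are given directly in EPSG:3035 metres.

```python
import geopandas as gpd
from shapely.geometry import Point, box


def assign_with_buffer(gdf, shapes, index_col, buffer_m, metric_crs="EPSG:3035"):
    """Map each plant to a region, tolerating points up to buffer_m outside it.

    Returns a Series indexed like gdf holding the chosen index_col value;
    plants farther than buffer_m from every region are left out.
    """
    geom = shapes.geometry.name
    # Buffer distances only make sense in a projected CRS with metre units.
    regions = shapes[[index_col, geom]].to_crs(metric_crs).reset_index(drop=True)
    points = gdf[[gdf.geometry.name]].to_crs(metric_crs)

    buffered = regions.copy()
    buffered[geom] = regions.geometry.buffer(buffer_m)

    joined = gpd.sjoin(points, buffered, how="inner", predicate="within")
    joined = joined.rename_axis("plant").reset_index()

    # Distance to the original polygon; 0 if the plant lies inside it.
    original = gpd.GeoSeries(
        regions.geometry.loc[joined["index_right"]].to_numpy(),
        index=joined.index,
        crs=regions.crs,
    )
    joined["dist"] = joined.geometry.distance(original)

    # A plant inside several buffers keeps only its closest region.
    best = joined.sort_values("dist").drop_duplicates("plant")
    return best.set_index("plant")[index_col].sort_index()


x0, y0 = 4_000_000, 3_000_000
regions = gpd.GeoDataFrame(
    {"region": ["A", "B"]},
    geometry=[box(x0, y0, x0 + 1000, y0 + 1000),
              box(x0 + 1000, y0, x0 + 2000, y0 + 1000)],
    crs="EPSG:3035",
)
plants = gpd.GeoDataFrame(
    {"Name": ["inside A", "north of A", "inside B near A", "near both", "far away"]},
    geometry=[Point(x0 + 500, y0 + 500), Point(x0 + 500, y0 + 1050),
              Point(x0 + 1020, y0 + 500), Point(x0 + 990, y0 + 1030),
              Point(x0 + 5000, y0)],
    crs="EPSG:3035",
)

assigned = assign_with_buffer(plants, regions, "region", buffer_m=100)
print(assigned)
```

Plants that are still unassigned after buffering (here the last one) would need a manual check of their coordinates.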

fneum commented 3 years ago

Hmm, yes the buffer is an option but could be very fiddly.

Was it a coarse shapefile (e.g. 60M)? Does it also occur with a highly resolved shapefile, 10M or 1M? I would treat the coordinates as the ground truth rather than the country label.

Could you share how you addressed the Technology nans?

FLomb commented 3 years ago

Yeah, I had also thought of creating a buffer around the polygons, which would work in this case because the plant's coordinates erroneously place it in the sea, whereas it is actually an inland hydro plant. Yet, for a broader application, this trick might easily lead to problems with neighbouring regions/countries. As for the shapefile, I have tried a couple of different ones, the latest being the official file from the GADM website (I'm not sure what the resolution is; they don't seem to state it explicitly as NUTS does, do they?). None of them changes the outcome.

As for the Technology nans, what I did, as a quick workaround, was to fill all Technology nans with the corresponding Fueltype values. This is because most of the Technology nans (at least in the subset of countries I was considering) were related to Wind/Solar, with a few Hydro plants. This way, you get Wind / Solar as the Technology for Fueltype Wind / Solar rather than nan. For Hydro, you get a generic "Hydro" placeholder, leaving you to figure out later which type of plants those are or how to allocate them among the other Hydro plant types.
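A minimal sketch of that workaround, assuming the plant list is a pandas DataFrame with 'Technology' and 'Fueltype' columns as in the output above; the helper name `fill_missing_technology` and the toy rows are illustrative only:

```python
import numpy as np
import pandas as pd


def fill_missing_technology(df):
    """Return a copy where missing Technology entries take the Fueltype value."""
    out = df.copy()
    out["Technology"] = out["Technology"].fillna(out["Fueltype"])
    return out


# Toy rows (hypothetical values), including a Hydro plant without Technology.
plants = pd.DataFrame({
    "Fueltype": ["Hydro", "Wind", "Solar", "Hydro"],
    "Technology": ["Reservoir", np.nan, np.nan, np.nan],
    "Capacity": [100.0, 50.0, 30.0, 20.0],
})

filled = fill_missing_technology(plants)
print(filled)
```

The filled DataFrame can then be passed to assign_to_shape in place of the original one, so that no plant is lost to a nan group key in the Technology column.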