Add options for automated sub-national aggregation

FLomb commented 3 years ago

It would be cool to have an automated option for aggregating power plants into sub-national clusters within each country, based on standard sub-national units.

For instance, it would be really nice if the user could choose, as an option, the level of spatial aggregation, e.g.:

NUTS2
NUTS3
GADM

So, instead of getting only the aggregate capacity of each European country as output, one could get the aggregate capacity of each sub-national region of interest. This would make it much easier to couple the project to any power system model.

kais-siala commented 3 years ago

Hi @FLomb, I'm not sure whether this feature is in the pipeline of powerplantmatching, but you could check this tool: https://github.com/tum-ens/pyPRIMA/ which I created specifically to be able to aggregate data (in particular the power plant capacities from here) as I wish.

fneum commented 3 years ago

Good idea, shouldn't be too difficult to implement:

Just a sketch off the top of my head: assuming gdf is the powerplantmatching database as a geopandas GeoDataFrame and regions is a GeoDataFrame with the NUTS2/3 or GADM shapes, it should just be:

import geopandas as gpd
# newer geopandas versions call this argument `predicate`; older ones used `op`
merged = gpd.sjoin(gdf, regions, how="inner", predicate="within")
merged.groupby('index_right').sum(numeric_only=True) # or more specific aggregation strategies

One could have this as a frontend function where the user passes the region shapes. I wouldn't necessarily build the shapefiles themselves into powerplantmatching.

fneum commented 3 years ago

Ok, here's how it could look in more detail.

@FLomb could you check whether this would fit your use case or what would be missing? It should allow any GeoDataFrame.

import numpy as np
import geopandas as gpd
import powerplantmatching as pm

def assign_to_shape(df, shapes, index_col):
    """
    Group power plants by the shapes they fall within,
    e.g. NUTS2, NUTS1, GADM.

    Parameters
    ----------
    df : pd.DataFrame
        power plant list with coordinates 'lat', 'lon'
    shapes : gpd.GeoDataFrame
        GeoDataFrame with polygons as geometry,
        e.g. NUTS2, NUTS1, GADM
    index_col : str
        column of shapes to group by
    """

    CRS = 'EPSG:4326'

    gdf = gpd.GeoDataFrame(df,
        geometry=gpd.points_from_xy(df.lon, df.lat),
        crs=CRS
    )

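    # Reproject the shapes to the CRS of the points before joining;
    # newer geopandas versions use `predicate` instead of `op`.
    merged = gpd.sjoin(gdf, shapes.to_crs(CRS), how="inner", predicate='within')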

    strategies = {
        'Capacity': np.sum,
        'Efficiency': np.mean,
        'Duration': np.mean,
        'Volume_Mm3': np.sum,
        'DamHeight_m': np.mean,
        'lat': np.mean,
        'lon': np.mean,
        'DateIn': np.mean,
        'DateRetrofit': np.mean,
    }
    groupers = [index_col, "Fueltype", "Technology", "Set", 'Country']
    return merged.groupby(groupers, as_index=False).agg(strategies)

This can be run as:

df = pm.powerplants(from_url=True)

nuts0 = gpd.read_file("nuts/NUTS_RG_01M_2016_4326_LEVL_0.geojson")

df = assign_to_shape(df, nuts0, 'NUTS_ID')

which would output:

	NUTS_ID	Fueltype	Technology	Set	Country	Capacity	Efficiency	Duration	Volume_Mm3	DamHeight_m	lat	lon	DateIn	DateRetrofit
0	AT	Hard Coal	CCGT	PP	Austria	704	nan	0	0	0	48.3269	15.9198	1987	1987
1	AT	Hard Coal	Steam Turbine	CHP	Austria	246	nan	0	0	0	46.9082	15.4922	1986	1986
2	AT	Hard Coal	Steam Turbine	PP	Austria	287.539	nan	0	0	0	48.0034	13.2309	1970	1987
3	AT	Hydro	Pumped Storage	PP	Austria	389	nan	0	0	0	46.9684	10.0599	1943	2018
4	AT	Hydro	Pumped Storage	Store	Austria	3852.3	nan	78.6592	361.35	55.3	47.0987	11.8791	1984.17	1997.5

If merged, it should probably go into https://github.com/FRESNA/powerplantmatching/blob/master/powerplantmatching/export.py

FLomb commented 3 years ago

Hi @fneum, thanks for the quick reply!

At first sight, yes, this seems pretty much what I was looking for! In your example you are still outputting at NUTS0, so I should possibly test it myself and see what happens when applied to some NUTS2/GADM shapefile. I'll try to do so asap

fneum commented 3 years ago

Yes, it would be good if you tested it. I just used NUTS0 to compare with the Country column. You could use NUTS_RG_01M_2016_4326_LEVL_2.geojson from https://gisco-services.ec.europa.eu/distribution/v2/nuts/download/#nuts21

FLomb commented 3 years ago

Ok, I tested it, but it looks like I'm having some issues. I've tried both NUTS0 data for all of the EU and GADM data for a single country. In both cases, it kinda works, but it skips several plants in the aggregation, e.g. all renewables and bioenergy. Not sure if this could be because these techs tend to have a "nan" Technology type, unlike others such as Hydro.

FLomb commented 3 years ago

Ok, quick follow-up: I had some time to debug the problem and I managed to make it work by indeed getting rid of the nan values in the Technology column.

That said, there are still some issues when using different shapefiles. For instance, there is a Hydro-Reservoir power plant in Portugal (id: 5468, name: "Foz tua") whose coordinates (erroneously) lie outside the inland area of Portugal. When using the EU-NUTS0 shapefile, the merge still counts it towards the "PT" total capacity; when using a GADM shapefile of Portugal, instead, the plant is dropped and excluded from the total capacity of the given GADM region and hence of Portugal itself.

While this is one example, there could be tons of similar ones elsewhere; any ideas for fixing this kind of issue?

Thanks
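To see how much capacity an inner spatial join silently drops, one option is to compare the original plant index with the index that survives the join. The following is a minimal sketch with made-up toy data; `matched_index` is a hypothetical stand-in for the index of the intermediate `merged` frame inside `assign_to_shape` (`gpd.sjoin` keeps the index of the left frame).

```python
import pandas as pd

# Toy data; names and capacities are made up for illustration.
plants = pd.DataFrame(
    {
        "Name": ["plant_a", "plant_b", "plant_c"],
        "Country": ["Portugal", "Portugal", "Spain"],
        "Capacity": [100.0, 250.0, 80.0],
    },
    index=[10, 11, 12],
)

# Hypothetical: index labels of the rows that survived the inner spatial join
# (in practice something like `merged.index.unique()`).
matched_index = pd.Index([10, 12])

# Rows of the original table that fell outside every polygon.
dropped = plants.loc[plants.index.difference(matched_index)]

# Capacity lost per country because of the dropped plants.
lost_by_country = dropped.groupby("Country")["Capacity"].sum()

print(dropped[["Name", "Country", "Capacity"]])
print(lost_by_country)
```

Listing the dropped rows this way makes it easier to decide whether the coordinates or the shapefile are at fault before attempting any automatic fix.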

kais-siala commented 3 years ago

You could create a small buffer around each polygon - but then you will have to deal with the issue of power plants lying in more than one region.
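A possible variant of the buffer idea, as a rough sketch only: rather than buffering the polygons, join strictly first and then assign the leftover plants to the nearest region within a distance cap, so that no plant ends up in more than one region. The function name `assign_with_fallback`, the 10 km cap, and the choice of EPSG:3035 as the metric CRS are assumptions for illustration, not part of powerplantmatching; `gpd.sjoin_nearest` requires geopandas 0.10 or newer.

```python
import geopandas as gpd
from shapely.geometry import box

METRIC_CRS = "EPSG:3035"  # assumption: a metric CRS suited to Europe


def assign_with_fallback(points, regions, index_col, max_distance_m=10_000):
    """Assign each point to the region containing it, else to the nearest
    region within max_distance_m; points farther away get NaN."""
    pts = points.to_crs(METRIC_CRS)
    regs = regions[[index_col, regions.geometry.name]].to_crs(METRIC_CRS)

    # Step 1: strict point-in-polygon join; unmatched points get NaN.
    inside = gpd.sjoin(pts, regs, how="left", predicate="within")
    inside = inside[~inside.index.duplicated()]  # overlapping regions: keep first
    result = inside[index_col].copy()

    # Step 2: nearest-region fallback for points that matched nothing.
    missing = result[result.isna()].index
    if len(missing) > 0:
        nearest = gpd.sjoin_nearest(
            pts.loc[missing], regs, how="inner", max_distance=max_distance_m
        )
        nearest = nearest[~nearest.index.duplicated()]  # ties: keep first
        result.loc[nearest.index] = nearest[index_col]

    return points.assign(**{index_col: result})


# Toy example: two adjacent square regions and four made-up plant locations.
regions = gpd.GeoDataFrame(
    {"region": ["A", "B"]},
    geometry=[box(0, 40, 1, 41), box(1, 40, 2, 41)],
    crs="EPSG:4326",
)
plants = gpd.GeoDataFrame(
    {"Name": ["inside_A", "inside_B", "just_off_A", "far_away"]},
    geometry=gpd.points_from_xy([0.5, 1.5, -0.05, -3.0], [40.5, 40.5, 40.5, 40.5]),
    crs="EPSG:4326",
)
assigned = assign_with_fallback(plants, regions, "region")
print(assigned[["Name", "region"]])
```

The distance cap keeps a plant with badly wrong coordinates from being attached to an arbitrary region; such plants stay unassigned and can be inspected by hand.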

fneum commented 3 years ago

Hmm, yes the buffer is an option but could be very fiddly.

Was it a coarse shapefile (e.g. 60M)? Does it also occur with a highly resolved shapefile, e.g. 10M or 1M? I would treat the coordinates as the ground truth rather than the country label.

Could you share how you addressed the Technology nans?

FLomb commented 3 years ago

Yeah, I had also thought of creating a buffer around the polygons. It would work in this case because the plant in question is erroneously placed in the sea by its coordinates, while it should be a hydro plant inland. Yet, for a broader application, this trick might easily lead to problems with neighbouring regions/countries. As for the shapefile, I have tried a couple of different ones, the latest being the official file from the GADM website (not sure what the resolution is; they don't seem to state it explicitly as NUTS does, do they?). None of them changes the outcome.

As for the Technology nans, what I did, as a quick workaround, was to just fill all Technology nans with the Fueltype values. This is because most of the Technology nans (at least in the subset of countries I was considering) were related to Wind/Solar, with a few Hydro plants. In this way, you get Wind / Solar as the Technology for Fueltype Wind / Solar rather than nan. And, for Hydro, you get a generic "Hydro" as a placeholder until you figure out which type of plants those are or how you want to allocate them to the other Hydro plant types.
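The workaround described above can be sketched as follows. The likely cause of the missing plants is that pandas `groupby` drops rows whose grouping key is NaN by default. The toy table below is made up for illustration; the `dropna=False` alternative requires pandas 1.1 or newer and is an addition, not what was used in the thread.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the powerplantmatching table (values are made up).
plants = pd.DataFrame({
    "Fueltype": ["Wind", "Solar", "Hydro", "Hydro"],
    "Technology": [np.nan, np.nan, "Reservoir", np.nan],
    "Capacity": [10.0, 5.0, 100.0, 20.0],
})

# Default behaviour: rows with a NaN Technology vanish from the aggregate.
lost = plants.groupby(["Fueltype", "Technology"])["Capacity"].sum()

# Workaround from the comment above: fall back to Fueltype where Technology is NaN.
filled = plants.assign(Technology=plants["Technology"].fillna(plants["Fueltype"]))
kept = filled.groupby(["Fueltype", "Technology"])["Capacity"].sum()

# Alternative: keep NaN as its own group instead of relabelling it.
kept_nan = plants.groupby(["Fueltype", "Technology"], dropna=False)["Capacity"].sum()

print(lost.sum(), kept.sum(), kept_nan.sum())
```

Filling with Fueltype gives readable labels, while `dropna=False` leaves the data untouched and makes the unknown technologies explicit.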

PyPSA / powerplantmatching

Add options for automated sub-national aggregation #46