geopandas / geopandas

Python tools for geographic data
http://geopandas.org/
BSD 3-Clause "New" or "Revised" License

Plot choropleth with consistent `legend` and bins #1019

Closed tommylees112 closed 5 years ago

tommylees112 commented 5 years ago

How do I set a consistent colorscheme for three axes in the same figure?

The following should be a wholly reproducible example to run the code and get the same figure I have posted below.

Get the shapefile data from the Office for National Statistics. Run this in a terminal as a bash file / commands.

wget --output-document 'LA_authorities_boundaries.zip' 'https://opendata.arcgis.com/datasets/8edafbe3276d4b56aec60991cbddda50_1.zip?outSR=%7B%22latestWkid%22%3A27700%2C%22wkid%22%3A27700%7D&session=850489311.1553456889'

mkdir LA_authorities_boundaries
cd LA_authorities_boundaries
unzip ../LA_authorities_boundaries.zip

The python code that reads the shapefile and creates a dummy GeoDataFrame for reproducing the behaviour.

import geopandas as gpd
import numpy as np  # needed for np.random.normal below
import pandas as pd
import matplotlib.pyplot as plt

gdf = gpd.read_file(
    'LA_authorities_boundaries/Local_Authority_Districts_December_2015_Full_Extent_Boundaries_in_Great_Britain.shp'
)

# 380 values
df = pd.DataFrame([])
df['AREA_CODE'] = gdf.lad15cd.values
df['central_pop'] = np.random.normal(30, 15, size=(len(gdf.lad15cd.values)))
df['low_pop'] = np.random.normal(10, 15, size=(len(gdf.lad15cd.values)))
df['high_pop'] = np.random.normal(50, 15, size=(len(gdf.lad15cd.values)))

Join the shapefile from ONS and create a geopandas.GeoDataFrame

def join_df_to_shp(pd_df, gpd_gdf):
    """"""
    df_ = pd.merge(pd_df, gpd_gdf[['lad15cd','geometry']], left_on='AREA_CODE', right_on='lad15cd', how='left')

    # DROP the NI counties
    df_ = df_.dropna(subset=['geometry'])

    # convert back to a geopandas object (for ease of plotting etc.)
    # reuse the CRS of the source GeoDataFrame (the download requests EPSG:27700)
    # rather than hard-coding EPSG:4326
    gdf_ = gpd.GeoDataFrame(df_, crs=gpd_gdf.crs, geometry='geometry')
    # remove the extra area_code column joined from gdf
    gdf_.drop('lad15cd',axis=1, inplace=True)

    return gdf_

pop_gdf = join_df_to_shp(df, gdf)

Make the plots

fig,(ax1,ax2,ax3,) = plt.subplots(1,3,figsize=(15,6))

pop_gdf.plot(
    column='low_pop', ax=ax1, legend=True,  scheme='quantiles', cmap='OrRd',
)
pop_gdf.plot(
    column='central_pop', ax=ax2, legend=True, scheme='quantiles', cmap='OrRd',
)
pop_gdf.plot(
    column='high_pop', ax=ax3, legend=True,  scheme='quantiles', cmap='OrRd',
)
for ax in (ax1,ax2,ax3,):
    ax.axis('off')

enter image description here

I want all three ax objects to share the same bins (preferably the central_pop scenario quantiles) so that the legend is consistent for the whole figure.

This way I should see darker colors (more red) in the far right ax showing the high_pop scenario.

How can I set the colorscheme bins for the whole figure / each of the ax objects?

The simplest ways I can see this working are either a) providing a set of bins to GeoDataFrame.plot(), or b) extracting the colorscheme / bins from one ax and applying them to another.

knaaptime commented 5 years ago

Under the hood, geopandas uses mapclassify, and the easiest way to achieve what you want would be to just use it directly:

import geopandas as gpd
import numpy as np  # needed for np.random.normal below
import pandas as pd
import matplotlib.pyplot as plt
from mapclassify import Quantiles, User_Defined

# Note you can read directly from the URL
gdf = gpd.read_file('https://opendata.arcgis.com/datasets/8edafbe3276d4b56aec60991cbddda50_1.zip?outSR=%7B%22latestWkid%22%3A27700%2C%22wkid%22%3A27700%7D&session=850489311.1553456889'
)

# 380 values
df = pd.DataFrame([])
df['AREA_CODE'] = gdf.lad15cd.values
df['central_pop'] = np.random.normal(30, 15, size=(len(gdf.lad15cd.values)))
df['low_pop'] = np.random.normal(10, 15, size=(len(gdf.lad15cd.values)))
df['high_pop'] = np.random.normal(50, 15, size=(len(gdf.lad15cd.values)))

def join_df_to_shp(pd_df, gpd_gdf):
    """"""
    df_ = pd.merge(pd_df, gpd_gdf[['lad15cd','geometry']], left_on='AREA_CODE', right_on='lad15cd', how='left')

    # DROP the NI counties
    df_ = df_.dropna(subset=['geometry'])

    # convert back to a geopandas object (for ease of plotting etc.)
    # reuse the CRS of the source GeoDataFrame (the download requests EPSG:27700)
    # rather than hard-coding EPSG:4326
    gdf_ = gpd.GeoDataFrame(df_, crs=gpd_gdf.crs, geometry='geometry')
    # remove the extra area_code column joined from gdf
    gdf_.drop('lad15cd',axis=1, inplace=True)

    return gdf_

pop_gdf = join_df_to_shp(df, gdf)

fig,(ax1,ax2,ax3,) = plt.subplots(1,3,figsize=(15,6))

# define your bins
bins = Quantiles(pop_gdf['central_pop'], 5).bins

# create a new column with the discretized values and plot that col
# repeat for each view
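# classify the columns of pop_gdf itself so the class labels line up with its rows.
# vmin/vmax pin the colour range so a given class gets the same colour in every panel;
# User_Defined may add one extra class when values exceed the top bin, hence vmax=len(bins)
pop_gdf.assign(cl=User_Defined(pop_gdf['low_pop'].dropna(), bins).yb).plot(
    column='cl', ax=ax1, cmap='OrRd', vmin=0, vmax=len(bins),
)
pop_gdf.assign(cl=User_Defined(pop_gdf['central_pop'].dropna(), bins).yb).plot(
    column='cl', ax=ax2, cmap='OrRd', vmin=0, vmax=len(bins),
)
pop_gdf.assign(cl=User_Defined(pop_gdf['high_pop'].dropna(), bins).yb).plot(
    column='cl', ax=ax3, cmap='OrRd', vmin=0, vmax=len(bins),
)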
for ax in (ax1,ax2,ax3,):
    ax.axis('off')

image
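For readers who want to see what "shared bins" means without the plotting layer, here is a minimal NumPy-only sketch. It is not mapclassify's implementation: `quantile_bins` and `classify` are hypothetical helper names, and folding values above the top bound into the top class is a choice made for this sketch. The idea is the same as above: derive the class bounds once from `central_pop`, then assign every scenario to classes using those same bounds.

```python
import numpy as np


def quantile_bins(values, k=5):
    """Upper bounds of k quantile classes (the last bound is the maximum)."""
    probs = np.linspace(0, 1, k + 1)[1:]  # e.g. 0.2, 0.4, 0.6, 0.8, 1.0 for k=5
    return np.quantile(np.asarray(values, dtype=float), probs)


def classify(values, bins):
    """Class index per value: class i holds bins[i-1] < x <= bins[i]."""
    idx = np.digitize(np.asarray(values, dtype=float), bins, right=True)
    # Values above the top bound are folded into the top class
    # (an assumption of this sketch, not necessarily what a library does).
    return np.clip(idx, 0, len(bins) - 1)


# Dummy data shaped like the example in this thread (380 areas, three scenarios).
rng = np.random.default_rng(0)
central = rng.normal(30, 15, size=380)
low = rng.normal(10, 15, size=380)
high = rng.normal(50, 15, size=380)

# Bounds come from the central scenario only and are reused for all three.
bins = quantile_bins(central, k=5)
for name, series in [("low", low), ("central", central), ("high", high)]:
    counts = np.bincount(classify(series, bins), minlength=len(bins))
    print(name, counts)
```

With shared bounds, the low scenario should pile up in the lower classes and the high scenario in the upper ones, which is what makes the three maps comparable.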

tommylees112 commented 5 years ago

That's so great thank you. If you would forgive me - I have 2 questions about your plots.

  1. I didn't get the lovely colorbar you had as a legend
  2. I need the colorbar to be discrete with the bin labels

(btw I have just looked at your research profile. That is some amazing work that you have done!)

knaaptime commented 5 years ago

Ah, sorry about that. I included legend=True in the last plot, which is what shows the colorbar. If you need the other style of legend, I think I would just change the middle plot back to the original

i.e.

pop_gdf.plot(
    column='central_pop', ax=ax2, scheme='quantiles', cmap='OrRd', legend=True, legend_kwds={XXX}
)

and if you play around with changing the legend location in the legend_kwds argument you can probably get it to sit on the far right side of all three plots

and thanks for the kind words about my work! :)

knaaptime commented 5 years ago

fig,(ax1,ax2,ax3,) = plt.subplots(1,3,figsize=(15,6))

bins = Quantiles(pop_gdf['central_pop'], 5).bins

pop_gdf.assign(cl=User_Defined(pop_gdf['low_pop'].dropna(), bins).yb).plot(
    column='cl', ax=ax1, cmap='OrRd'
)
pop_gdf.plot('central_pop', scheme='quantiles',  ax=ax2, cmap='OrRd', legend=True, cax=ax3,
             legend_kwds=dict(loc='upper right', bbox_to_anchor=(3.5, 0.75), title="Legend\n", frameon=False)
)
pop_gdf.assign(cl=User_Defined(pop_gdf['high_pop'].dropna(), list(bins)).yb).plot(
    column='cl', ax=ax3, cmap='OrRd', legend=False
)
for ax in (ax1,ax2,ax3,):
    ax.axis('off')

image

martinfleis commented 5 years ago

Thank you, @tommylees112 for your question and you, @knaaptime for the precise answer. Issue resolved, closing.

robroc commented 5 years ago

Is there a way to force the legend in a map with a binned scheme to be in a colorbar style instead of circles with labels? And to label only the min and max ends? I'm sure there's a way with matplotlib, but if you have an example handy it would be a huge help.

raphmu86 commented 4 years ago

@robroc did you find a solution to your question? I would be interested, to save some time here too! Many thanks!

ShouravBR commented 4 years ago

@raphmu86 I'm not sure if this answers your question.

If the column is numerical, the legend shows up as a colorbar. To get discrete colors instead of a continuous gradient, pass a discretized colormap to the plot function. To modify the ticks, pass parameters through legend_kwds or set vmin/vmax.

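# legend=True is required; legend_kwds has no effect without it
gdf.plot(column=colname,
         cmap=plt.get_cmap('Blues', 10),  # 10 discrete colors
         vmin=0, vmax=1,
         legend=True,
         legend_kwds={'label': 'Coverage', 'ticks': np.arange(0, 1.1, 0.2)})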

image

robroc commented 4 years ago

@ShouravBR Does this work if you pass something into scheme? I know the colorbar is default without it.

ShouravBR commented 4 years ago

@robroc No, the colorbar does not show. Uneven ticks from the scheme make it hard to manually create a colorbar.

image image

gdf.plot('X',scheme='quantiles', legend=True, cmap='Blues')
gdf.plot('X', legend=True, cmap=plt.get_cmap('Blues',5),
         legend_kwds={'ticks': [3,21,43.20,54,72.4,99]})
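One possible way around the uneven ticks, sketched here with plain matplotlib only: build a `BoundaryNorm` from the class edges and draw the colorbar from a `ScalarMappable`, labelling just the two ends. The edge values are copied from the snippet above; everything else is an assumption of this sketch, not something confirmed in this thread. In particular, passing the same `cmap` and `norm` to `GeoDataFrame.plot` without `scheme` is left as an untested suggestion.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import BoundaryNorm

# Class edges taken from the tick list above (uneven on purpose).
edges = [3, 21, 43.2, 54, 72.4, 99]

# One discrete color per class, and a norm that maps each value to its class.
cmap = plt.get_cmap("Blues", len(edges) - 1)
norm = BoundaryNorm(edges, cmap.N)

# Stand-alone horizontal colorbar; in a real figure, cax would be a dedicated axes
# placed next to the map.
fig, cax = plt.subplots(figsize=(6, 1.2))
cbar = fig.colorbar(
    ScalarMappable(norm=norm, cmap=cmap),
    cax=cax,
    orientation="horizontal",
    spacing="proportional",  # segment widths follow the uneven class widths
)
cbar.set_ticks([edges[0], edges[-1]])  # label only the min and max ends
fig.savefig("discrete_colorbar.png", bbox_inches="tight")
```

Using `spacing="uniform"` instead would give every class the same width on the bar, which can read better when one class is much narrower than the rest.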