ONSdigital / SDG_11.2.1

Analysis for the UN Sustainable Development Goal 11.2.1
https://onsdigital.github.io/SDG_11.2.1/
Apache License 2.0

Consider: Clustering stops within 50m of each other #217

Open james-westwood opened 2 years ago

james-westwood commented 2 years ago

The EU team clustered stops that were close together, replacing each group with a single point between them. This reduces the number of stops for which a service area has to be calculated, which matters most when the service areas come from a computationally heavy network/path computation.

james-westwood commented 1 year ago

@nkshaw23

This is the code to cluster the stops, taken from the Python scripts (revised SDG 11.2.1 methodology) zip file on this page.

It uses ArcGIS, but we could do the same in GeoPandas.

# -------------------------------------------------------------------------------------------------------------------------------
# SDG_01_Cluster_Stops.py
# Author: Olivier DRAILY, EC, DG REGIO
# Created on: 2021-10-27
#
# Description: 
# The script uses two input point datasets of stops,
# - one for the low-capacity modes (e.g. bus, tram)
# - one for the high-capacity modes (e.g. train, metro)
# from which it creates two point feature datasets of clustered stops.
#
# All stops located within 50 m from each other are considered as a single cluster of stops.
# Each cluster is represented by a single point, located at the centre of the clustered stops. 
# The output feature dataset contains all original points which do not have any neighbouring
# points within a radius of 50 meters, in addition to the new clustered points.
# -------------------------------------------------------------------------------------------------------------------------------

# Import modules
# -------------------------------------------------------------------------------------------------------------------------------
from __future__ import print_function
import os
import time

import arcpy

# Local variables
# -------------------------------------------------------------------------------------------------------------------------------
low_capacity_stops_to_cluster = r"Sample.gdb\Low_capacity_stops_pt"  # Input low_capacity point feature dataset with stops to cluster.
high_capacity_stops_to_cluster = r"Sample.gdb\High_capacity_stops_pt"  # Input high_capacity point feature dataset with stops to cluster. Use "" if it does not exist.
out_workspace = r"Sample.gdb"   # Output workspace, a geodatabase
cluster_distance = 50   # All stops located within this distance from another stop are considered as a single cluster of stops 

# Environment
# -------------------------------------------------------------------------------------------------------------------------------
arcpy.env.workspace = out_workspace
arcpy.env.overwriteOutput = True

# Routine printMessage
# -------------------------------------------------------------------------------------------------------------------------------
def printMessage(msg):
    print(msg)  # with Python IDLE 3.x or with from __future__ import print_function
    arcpy.AddMessage(msg) # in ArcGIS

# Main Process
# -------------------------------------------------------------------------------------------------------------------------------
for stops_to_cluster in [low_capacity_stops_to_cluster, high_capacity_stops_to_cluster]:  # List of input point feature datasets with stops to cluster
    if arcpy.Exists(stops_to_cluster):
        printMessage("Processing " + os.path.basename(stops_to_cluster))
        out_dataset_name = "Clustered_" + os.path.basename(stops_to_cluster)  

        # Delete the dataset if it already exists
        if arcpy.Exists(out_dataset_name):
            arcpy.Delete_management(out_dataset_name)

        # Make a buffer around each stop
        printMessage(" Make a buffer around each stop")
        # Two stops lie within cluster_distance of each other when their buffers touch, so each buffer radius is half of cluster_distance
        buffer_distance = cluster_distance / 2.0  # float division, also under Python 2
        arcpy.Buffer_analysis(stops_to_cluster, "Buffers", str(buffer_distance) + " Meters")

        # Merge contiguous buffers
        printMessage(" Merge contiguous buffers")
        arcpy.Dissolve_management("Buffers", "BuffersDissolved", "", "", "SINGLE_PART")
        arcpy.Delete_management("Buffers")

        # Assign buffer ID to stops
        printMessage(" Assign buffer ID to stops")
        arcpy.Identity_analysis(stops_to_cluster, "BuffersDissolved", "IdentityFcl", "ALL", "", "NO_RELATIONSHIPS")
        arcpy.Delete_management("BuffersDissolved")

        # Stops with the same buffer ID are clustered
        printMessage(" Stops with the same buffer ID are clustered")
        arcpy.MeanCenter_stats("IdentityFcl", out_dataset_name, "", "FID_BuffersDissolved")
        arcpy.Delete_management("IdentityFcl")

        # Deleting unnecessary fields
        arcpy.DeleteField_management(out_dataset_name, ["XCoord", "YCoord", "FID_BuffersDissolved"])

printMessage("End: " + time.strftime("%H:%M:%S", time.localtime()))
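
A rough GeoPandas sketch of the same steps (buffer by half the distance, dissolve touching buffers, tag each stop with its buffer, take the mean centre per buffer). This is not the EU team's code: `cluster_stops` and the `cluster_id` column are names made up for illustration, and it assumes the stops are points in a projected CRS with metre units (EPSG:27700, British National Grid, in the example). Shapely's `unary_union` stands in for the ArcGIS Dissolve step.

```python
import geopandas as gpd
from shapely.geometry import Point
from shapely.ops import unary_union


def cluster_stops(stops, cluster_distance=50.0):
    """Hypothetical helper: merge stops whose half-distance buffers touch.

    Assumes `stops` is a GeoDataFrame of points in a projected CRS (metres).
    Returns one mean-centre point per cluster; isolated stops are kept as-is.
    """
    # Buffer each stop by half the cluster distance, as the ArcGIS script does
    buffers = stops.geometry.buffer(cluster_distance / 2)

    # Dissolve overlapping buffers and split the result into single-part polygons
    dissolved = unary_union(list(buffers))
    parts = list(dissolved.geoms) if hasattr(dissolved, "geoms") else [dissolved]
    zones = gpd.GeoDataFrame(
        {"cluster_id": range(len(parts))}, geometry=parts, crs=stops.crs
    )

    # Tag each stop with the ID of the dissolved buffer that contains it
    points = stops[[stops.geometry.name]]
    joined = gpd.sjoin(points, zones, how="left", predicate="within")

    # Mean centre of the stops sharing a buffer ID
    xy = joined.assign(x=joined.geometry.x, y=joined.geometry.y)
    centres = xy.groupby("cluster_id")[["x", "y"]].mean().reset_index()
    return gpd.GeoDataFrame(
        centres[["cluster_id"]],
        geometry=gpd.points_from_xy(centres["x"], centres["y"]),
        crs=stops.crs,
    )


# Made-up example: two stops 30 m apart and one isolated stop
example_stops = gpd.GeoDataFrame(
    geometry=[Point(0, 0), Point(30, 0), Point(500, 500)], crs="EPSG:27700"
)
clustered = cluster_stops(example_stops)
```

As with the dissolve in the original script, clustering is transitive: a chain of stops each less than 50 m from the next ends up in one cluster, even if the two ends are further apart than 50 m. Buffers are polygon approximations of circles, so stops almost exactly 50 m apart may or may not be merged.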