Esri / large-network-analysis-tools

Tools and code samples for solving large network analysis problems in ArcGIS Pro
Apache License 2.0

large-network-analysis-tools

The tools and code samples here help you solve large network analysis problems in ArcGIS Pro.

We have provided Python script tools designed to solve large network analysis problems by splitting the input data into chunks and solving the chunks in parallel. You can use these tools as is, modify the provided scripts to suit your needs, or use them as an example when writing your own code.

Features

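The LargeNetworkAnalysisTools.pyt toolbox has three geoprocessing tools: Solve Large OD Cost Matrix, Solve Large Analysis With Known OD Pairs, and Parallel Calculate Locations. Each is described in its own section below.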

Requirements

Instructions

  1. Download the latest release
  2. Modify the code to suit your needs if desired
  3. Run the code in standalone Python, or run the provided geoprocessing tool from within ArcGIS Pro.

Solve Large OD Cost Matrix tool

The Solve Large OD Cost Matrix tool can be used to solve a large origin-destination cost matrix, calculating the travel time and distance from a set of origins to a set of destinations. You can use a time or distance cutoff and a number of destinations to find for each origin to reduce the problem size, and the calculations are optimized by spatially sorting the inputs. The tool can calculate extremely large OD cost matrices by chunking up the problem and solving in parallel. You can choose to save the outputs to a feature class, set of CSV files, or set of Apache Arrow tables.
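As a rough illustration of the chunking idea described above, the sketch below splits the origin and destination rows into ranges of limited size and enumerates every origin-chunk/destination-chunk combination, each of which could be solved as an independent OD Cost Matrix problem. The function names and the half-open range convention are assumptions made for this example; they are not taken from the toolbox's code.

```python
from itertools import product


def chunk_ranges(num_rows, max_per_chunk):
    """Return half-open (start, stop) row ranges holding at most max_per_chunk rows."""
    if max_per_chunk < 1:
        raise ValueError("max_per_chunk must be at least 1")
    return [
        (start, min(start + max_per_chunk, num_rows))
        for start in range(0, num_rows, max_per_chunk)
    ]


def od_chunk_pairs(num_origins, num_destinations, max_per_chunk):
    """Pair every origin chunk with every destination chunk."""
    origin_chunks = chunk_ranges(num_origins, max_per_chunk)
    destination_chunks = chunk_ranges(num_destinations, max_per_chunk)
    return list(product(origin_chunks, destination_chunks))


pairs = od_chunk_pairs(num_origins=2500, num_destinations=1200, max_per_chunk=1000)
print(len(pairs))  # 3 origin chunks x 2 destination chunks = 6 independent problems
```

Because each pair of chunks is independent of the others, the pairs can be handed to separate worker processes.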

Solve Large OD Cost Matrix tool inputs

Note: This tool does not utilize the geoprocessing environments for parallel processing or processor type. The settings for parallel processing are controlled using the tool parameters.

Running the tool from ArcGIS Pro

You can run the tool in ArcGIS Pro just like any other geoprocessing tool. You just need to connect to the provided Python toolbox from the Catalog Pane either in the Toolboxes section or the Folders section.

If you plan to use ArcGIS Online or a portal as your network data source, make sure you're connected to that portal in your current Pro session.

Screenshot of tool dialog

Running the tool from standalone Python

You can call the tool from your own standalone Python script.

As with any custom script tool, you must first import the toolbox within your standalone script: arcpy.ImportToolbox(<full path to LargeNetworkAnalysisTools.pyt>)

Then, you can call the tool in your script: arcpy.LargeNetworkAnalysisTools.SolveLargeODCostMatrix(<tool parameters>)

Here is the full tool signature:

arcpy.LargeNetworkAnalysisTools.SolveLargeODCostMatrix(
    Origins, Destinations, Network_Data_Source, Travel_Mode, Time_Units, Distance_Units,
    Max_Inputs_Per_Chunk, Max_Processes, Output_Updated_Origins, Output_Updated_Destinations,
    Output_Format, Output_OD_Lines_Feature_Class, Output_Folder,
    Cutoff, Num_Destinations, Time_Of_Day, Barriers, Precalculate_Network_Locations
)
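A standalone script might look something like the sketch below. It is only an outline: every path and value is a placeholder, the unit and travel mode strings are guesses, and it assumes that the imported tool accepts its parameters by keyword and that the parameters left out here are optional. Check the tool dialog for the choices your copy of the toolbox actually accepts.

```python
def build_odcm_arguments(origins, destinations, network_data_source, travel_mode, output_folder):
    """Assemble keyword arguments named after the tool signature shown above.

    All values are illustrative placeholders, not documented defaults.
    """
    return {
        "Origins": origins,
        "Destinations": destinations,
        "Network_Data_Source": network_data_source,
        "Travel_Mode": travel_mode,
        "Time_Units": "Minutes",         # assumed unit name
        "Distance_Units": "Miles",       # assumed unit name
        "Max_Inputs_Per_Chunk": 1000,
        "Max_Processes": 4,
        "Output_Updated_Origins": origins + "_Updated",            # placeholder path
        "Output_Updated_Destinations": destinations + "_Updated",  # placeholder path
        "Output_Format": "CSV files",    # option name as written in this README
        "Output_Folder": output_folder,
    }


def run_tool(toolbox_path, arguments):
    """Import the toolbox and call the tool. Requires an ArcGIS Pro Python environment."""
    import arcpy  # imported lazily so the helper above works without ArcGIS Pro

    arcpy.ImportToolbox(toolbox_path)
    return arcpy.LargeNetworkAnalysisTools.SolveLargeODCostMatrix(**arguments)


# Example usage (placeholder paths):
# args = build_odcm_arguments(
#     r"C:\Data\Analysis.gdb\Origins", r"C:\Data\Analysis.gdb\Destinations",
#     r"C:\Data\Network.gdb\Transportation\Streets_ND", "Driving Time", r"C:\Data\ODOutput",
# )
# run_tool(r"C:\Tools\LargeNetworkAnalysisTools.pyt", args)
```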

You can also run the provided scripts by directly calling solve_large_odcm.py from the command line instead of using the geoprocessing tool as the code's gateway. Call python solve_large_odcm.py -h to print the command line help to show you how to do this.

Recommended settings for best performance

The tool includes several settings that can impact the overall run time.

For best performance, use the "CSV files" or "Apache Arrow files" option for the Output OD Cost Matrix Format instead of the "Feature class" option, which is much slower to process.

The other main consideration is what type of network data source is being used for the analysis, and the optimal chunk size and number of parallel processes depend on this choice.

If the network data source is ArcGIS Online, the Maximum Number of Parallel Processes parameter is capped at 4 concurrent processes so as not to overload the service for other users. The ArcGIS Online OD Cost Matrix service also limits the number of origins and destinations allowed in a single problem. As of this writing, that number is 1000, so the Maximum Origins and Destinations per Chunk parameter value cannot be greater than 1000. If you enter a larger number, the tool will automatically reduce the chunk size to the maximum allowed.

If the network data source is an ArcGIS Enterprise service, the service configuration may limit the number of allowed concurrent processes, and the Maximum Number of Parallel Processes parameter should not exceed this number. (If you are the service administrator, you can update the service configuration to increase this number.) You also shouldn't exceed the number of logical processors of your machine (the client), because the client manages the jobs sent to the server and cannot manage more concurrent processes than it has logical cores available. Occasionally, ArcGIS Enterprise services also limit the number of allowed inputs, and in this case the tool will automatically reduce the Maximum Origins and Destinations per Chunk to that limit if the input value is too large. Usually, however, ArcGIS Enterprise services do not include such limits, and the recommended chunk size depends on whether the service's network dataset is in a file geodatabase or a mobile geodatabase, as discussed below.

If the network data source is a network dataset in a file geodatabase, set the Maximum Number of Parallel Processes to the number of logical processors of your machine. A Maximum Origins and Destinations per Chunk value of around 1000 or 2000 typically works best, even for very large input datasets, because these small chunks solve very quickly.

If the network data source is a network dataset in a mobile geodatabase, the internal OD Cost Matrix solver functions a little differently than it does for file geodatabase network datasets. The internal solver does its own multithreaded, parallelized operations spread across your machine's resources, so additional parallelization on the client side will not improve performance. A Maximum Number of Parallel Processes value of 2 to 4 is recommended. Additionally, because of this internal parallelization, larger OD Cost Matrix problems solve more quickly than with file geodatabase data, so you may have better overall tool run times using a Maximum Origins and Destinations per Chunk value around 10,000.
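The guidance above can be condensed into a small helper that suggests starting values for the two parameters. This is a sketch only: the `source_kind` labels are hypothetical names invented for this example, the returned numbers are starting points rather than tuned values, and ArcGIS Enterprise services are left out because their limits depend on the service configuration.

```python
import os


def suggest_settings(source_kind, logical_cores=None):
    """Return (max_processes, max_inputs_per_chunk) starting points.

    source_kind is a hypothetical label: "arcgis_online", "file_gdb", or "mobile_gdb".
    """
    if logical_cores is None:
        logical_cores = os.cpu_count() or 1
    if source_kind == "arcgis_online":
        # The service caps parallel processes at 4 and inputs per problem at 1000.
        return min(4, logical_cores), 1000
    if source_kind == "file_gdb":
        # One process per logical processor, with small chunks of about 1000 to 2000.
        return logical_cores, 1000
    if source_kind == "mobile_gdb":
        # The solver parallelizes internally, so use few processes and larger chunks.
        return min(4, logical_cores), 10000
    raise ValueError(f"Unknown network data source kind: {source_kind!r}")


print(suggest_settings("file_gdb", logical_cores=8))  # (8, 1000)
```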

To some extent, the best chunk size depends on the configuration of your input data. The tool will spatially sort the input data if you have the Advanced license, and sorted data allows for smarter chunking. Before solving the OD Cost Matrix for each chunk of origins and destinations, the tool first applies a quick straight-line filter to remove any destinations that are very far away, and if all destinations are filtered out, it skips the chunk. Smaller chunks are more likely to be skipped than larger chunks, particularly if your data is highly clustered.
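To give a sense of how a straight-line pre-filter can rule out an entire chunk, here is a minimal sketch using planar (x, y) coordinates and Euclidean distance. How the tool actually measures distance, and how it derives a straight-line threshold from a time or distance cutoff, is not described here, so `cutoff` and both function names are assumptions for illustration.

```python
import math


def filter_far_destinations(origins, destinations, cutoff):
    """Keep destinations within straight-line distance `cutoff` of at least one origin."""
    return [
        dest
        for dest in destinations
        if any(math.dist(origin, dest) <= cutoff for origin in origins)
    ]


def should_skip_chunk(origins, destinations, cutoff):
    """A chunk can be skipped when no destination survives the filter."""
    return not filter_far_destinations(origins, destinations, cutoff)


origins = [(0.0, 0.0), (1.0, 1.0)]
destinations = [(0.5, 0.5), (50.0, 50.0)]
print(filter_far_destinations(origins, destinations, cutoff=5.0))  # [(0.5, 0.5)]
print(should_skip_chunk(origins, [(50.0, 50.0)], cutoff=5.0))      # True
```

With spatially sorted inputs, the origins in a small chunk tend to sit close together, which is why a small chunk is more likely to have every destination filtered out.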

Technical explanation of how this tool works

The tool consists of several scripts:

Why do we have both solve_large_odcm.py and parallel_odcm.py? Why do we call parallel_odcm.py as a subprocess? This is necessary to accommodate running this tool from the ArcGIS Pro UI. A script tool running in the ArcGIS Pro UI cannot directly call multiprocessing using concurrent.futures. We must instead spin up a subprocess, and the subprocess must spawn parallel processes for the calculations. Thus, solve_large_odcm.py does all the pre-processing in the main Python process, but it passes the inputs to parallel_odcm.py as a separate subprocess, and that subprocess can, in turn, spin up parallel processes for the OD Cost Matrix calculations.
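The subprocess arrangement described above can be sketched in miniature as follows. This is not the toolbox's code: the worker script is a stand-in written to a temporary file, `solve_chunk` is a placeholder for a real per-chunk solve, and passing the inputs as JSON on the command line is an assumption chosen to keep the example short. It also assumes `sys.executable` points at a Python interpreter, which holds in standalone Python but may not hold inside the ArcGIS Pro application.

```python
import json
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Stand-in for the worker script (the role parallel_odcm.py plays above).
CHILD_SCRIPT = textwrap.dedent('''
    import json
    import sys
    from concurrent.futures import ProcessPoolExecutor

    def solve_chunk(chunk):
        # Placeholder for the real per-chunk calculation.
        return sum(chunk)

    if __name__ == "__main__":
        chunks = json.loads(sys.argv[1])
        max_processes = int(sys.argv[2])
        with ProcessPoolExecutor(max_workers=max_processes) as pool:
            results = list(pool.map(solve_chunk, chunks))
        print(json.dumps(results))
''')


def solve_in_subprocess(chunks, max_processes=2):
    """Run the worker script as a subprocess; it spawns the parallel processes itself."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "parallel_worker.py"
        script.write_text(CHILD_SCRIPT)
        completed = subprocess.run(
            [sys.executable, str(script), json.dumps(chunks), str(max_processes)],
            capture_output=True,
            text=True,
            check=True,
        )
    return json.loads(completed.stdout)
```

Calling `solve_in_subprocess([[1, 2], [3, 4, 5]])` returns one result per chunk, in input order. The calling process never touches the process pool, so it only has to launch the subprocess and read back its output.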

Unit tests are available in the unittests folder and can help identify problems if you're editing the code.

Solve Large Analysis With Known OD Pairs tool

The Solve Large Analysis With Known OD Pairs tool can be used to calculate the travel time and distance and generate routes between preassigned origin-destination pairs. It can calculate many routes simultaneously by chunking up the problem and solving in parallel.

Multiple types of origin-destination pairs are supported:

Solve Large Analysis With Known OD Pairs tool inputs

Note: This tool does not utilize the geoprocessing environments for parallel processing or processor type. The settings for parallel processing are controlled using the tool parameters.

Running the tool from ArcGIS Pro

You can run the tool in ArcGIS Pro just like any other geoprocessing tool. You just need to connect to the provided Python toolbox from the Catalog Pane either in the Toolboxes section or the Folders section.

If you plan to use ArcGIS Online or a portal as your network data source, make sure you're connected to that portal in your current Pro session.

Screenshot of tool dialog

Running the tool from standalone Python

You can call the tool from your own standalone Python script.

As with any custom script tool, you must first import the toolbox within your standalone script: arcpy.ImportToolbox(<full path to LargeNetworkAnalysisTools.pyt>)

Then, you can call the tool in your script: arcpy.LargeNetworkAnalysisTools.SolveLargeAnalysisWithKnownPairs(<tool parameters>)

Here is the full tool signature:

arcpy.LargeNetworkAnalysisTools.SolveLargeAnalysisWithKnownPairs(
    Origins, Origin_Unique_ID_Field, Destinations, Destination_Unique_ID_Field,
    OD_Pair_Type, Assigned_Destination_Field, OD_Pair_Table,
    Pair_Table_Origin_Unique_ID_Field, Pair_Table_Destination_Unique_ID_Field,
    Network_Data_Source, Travel_Mode, Time_Units, Distance_Units,
    Max_Pairs_Per_Chunk, Max_Processes, Output_Routes,
    Time_Of_Day, Barriers, Precalculate_Network_Locations, Sort_Origins, Reverse_Direction
)

You can also run the provided scripts by directly calling solve_large_route_pair_analysis.py from the command line instead of using the geoprocessing tool as the code's gateway. Call python solve_large_route_pair_analysis.py -h to print the command line help to show you how to do this.

Technical explanation of how this tool works

The tool consists of several scripts:

Why do we have both solve_large_route_pair_analysis.py and parallel_route_pairs.py? Why do we call parallel_route_pairs.py as a subprocess? This is necessary to accommodate running this tool from the ArcGIS Pro UI. A script tool running in the ArcGIS Pro UI cannot directly call multiprocessing using concurrent.futures. We must instead spin up a subprocess, and the subprocess must spawn parallel processes for the calculations. Thus, solve_large_route_pair_analysis.py does all the pre-processing in the main Python process, but it passes the inputs to parallel_route_pairs.py as a separate subprocess, and that subprocess can, in turn, spin up parallel processes for the Route calculations.

Unit tests are available in the unittests folder and can help identify problems if you're editing the code.

Parallel Calculate Locations tool

The Parallel Calculate Locations tool can be used to efficiently precalculate network locations for a large dataset by chunking up the input feature class and calculating the network locations in parallel.

Note: This tool is provided in case the only thing you want to do is calculate network locations for a large dataset. If you're going to run the Solve Large OD Cost Matrix or Solve Large Analysis With Known OD Pairs tools, those tools can automatically precalculate network locations when you run them, and they use the same parallelized logic as the Parallel Calculate Locations tool.

Parallel Calculate Locations tool inputs

The tool inputs are similar to those in the core Calculate Locations tool. Please see that tool's official documentation for more details about some of the parameters.

Note: This tool does not utilize the geoprocessing environments for parallel processing or processor type. The settings for parallel processing are controlled using the tool parameters.

Running the tool from ArcGIS Pro

You can run the tool in ArcGIS Pro just like any other geoprocessing tool. You just need to connect to the provided Python toolbox from the Catalog Pane either in the Toolboxes section or the Folders section.

Screenshot of tool dialog

Note: Limitations of arcpy prevented me from using the standard SQL query builder control in the tool UI for the Search Query parameter, so you must specify the SQL query expression manually as a string. The tool does some validation to ensure that the strings are usable, but it doesn't provide any help in constructing them. The easiest way to get the queries right is to do as follows:

  1. Open the core Calculate Locations tool (the standard one in the Network Analyst Tools toolbox).

  2. Set the input features and the network dataset.

  3. Use the Search Query control in the Calculate Locations tool to construct the queries you want using the SQL expression builder.

    Screenshot of Calculate Locations tool query builder

  4. Click the SQL button on the query builder to see the raw SQL syntax and copy it.

    Screenshot of Calculate Locations tool query string

  5. Paste the SQL query string into the Parallel Calculate Locations tool dialog.

    Screenshot of Parallel Calculate Locations tool search query parameter

Running the tool from standalone Python

You can call the tool from your own standalone Python script.

As with any custom script tool, you must first import the toolbox within your standalone script: arcpy.ImportToolbox(<full path to LargeNetworkAnalysisTools.pyt>)

Then, you can call the tool in your script: arcpy.LargeNetworkAnalysisTools.ParallelCalculateLocations(<tool parameters>)

Here is the full tool signature:

arcpy.LargeNetworkAnalysisTools.ParallelCalculateLocations(
    Input_Features, Output_Features, Network_Dataset,
    Max_Features_Per_Chunk, Max_Processes,
    Travel_Mode, Search_Tolerance, Search_Criteria, Search_Query
)

Tool output

The output feature class will be a copy of the input feature class with the network location fields appended. Because the original ObjectIDs may have shifted, the output feature class includes an ORIG_OID field with the values of the original ObjectID. (If the feature class already had an ORIG_OID field, the new field may be called ORIG_OID1, ORIG_OID2, etc.)
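If you need to work out in a script which field is likely to hold the original ObjectIDs, the naming pattern described above can be approximated as below. This is a guess at the pattern based only on the description here (ORIG_OID, then ORIG_OID1, ORIG_OID2, and so on); the function name is hypothetical, and the actual name is chosen by the tool, so confirm it against the output feature class.

```python
def predict_orig_oid_field(input_field_names, base_name="ORIG_OID"):
    """Guess the name of the added field, given the input feature class's field names."""
    # Compare case-insensitively, since geodatabase field names are not case-sensitive.
    taken = {name.upper() for name in input_field_names}
    if base_name.upper() not in taken:
        return base_name
    suffix = 1
    while f"{base_name}{suffix}".upper() in taken:
        suffix += 1
    return f"{base_name}{suffix}"


print(predict_orig_oid_field(["OBJECTID", "Shape", "Name"]))      # ORIG_OID
print(predict_orig_oid_field(["OBJECTID", "Shape", "ORIG_OID"]))  # ORIG_OID1
```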

Technical explanation of how this tool works

The tool consists of two main scripts:

Calling parallel_calculate_locations.py as a subprocess is necessary to accommodate running this tool from the ArcGIS Pro UI. A script tool running in the ArcGIS Pro UI cannot directly call multiprocessing using concurrent.futures. We must instead spin up a subprocess, and the subprocess must spawn parallel processes for the calculations.

Unit tests are available in the unittests folder and can help identify problems if you're editing the code.

Resources

Issues

Find a bug or want to request a new feature? Please let us know by submitting an issue.

Contributing

Esri welcomes contributions from anyone and everyone. Please see our guidelines for contributing.

Licensing

Copyright 2023 Esri

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

A copy of the license is available in the repository's license.txt file.