EdLoach / CheckPublicTransportRelations

Tool to help me compare OSM PTv2 bus route data to TNDS opendata to see what needs updating
GNU General Public License v3.0

CheckPublicTransportRelations

Tool to help me compare OSM PTv2 bus route data to TNDS opendata to see what needs updating. Written in C# using VS2017 and .NET Framework 4.6.1.

When first run, you will need to populate Options, Settings with appropriate values. The Overpass ones shouldn't need changing, but you will need to set the bounding box (which defaults to one that contains the Tendring district of Essex), the local folder you want to use as the main working folder, and your credentials for downloading the Traveline National Dataset (TNDS). If you don't have credentials you can register at https://www.travelinedata.org.uk/traveline-open-data/traveline-national-dataset/ (data is OGLv3 licensed).

You can also maintain the name substitutions used when comparing abbreviated NaPTAN names to the non-abbreviated OSM names. The defaults work for Essex apart from the need to add " Grn" to " Green" and " Shoreditch High St" to " Shoreditch High Street". " St" to " Street" hasn't been added because of the Saint-related issue with doing so.
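As a rough illustration of the substitution idea (not the tool's actual C# code), the sketch below expands abbreviated fragments before comparing names. The table holds only the two examples mentioned above; the function name `expand_name` and the word-boundary rule are assumptions made for this example.

```python
import re

# Hypothetical substitution table: abbreviated fragment -> OSM spelling.
# The leading space stops a fragment matching in the middle of a word.
SUBSTITUTIONS = {
    " Grn": " Green",
    " Shoreditch High St": " Shoreditch High Street",
}


def expand_name(naptan_name, substitutions=SUBSTITUTIONS):
    """Return naptan_name with each abbreviated fragment expanded."""
    expanded = naptan_name
    # Longer fragments first, so a specific rule wins over a shorter one.
    for short in sorted(substitutions, key=len, reverse=True):
        full = substitutions[short]
        # \b requires the fragment to end at a word boundary, so a name that
        # is already spelled out ("... High Street") is left untouched.
        pattern = re.escape(short) + r"\b"
        expanded = re.sub(pattern, lambda _m, full=full: full, expanded)
    return expanded


if __name__ == "__main__":
    print(expand_name("Village Grn"))
    print(expand_name("St Johns Church"))
```

Because there is deliberately no blanket " St" rule, a name such as "St Johns Church" passes through unchanged, which is the Saint problem described above.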

Once populated, use File, Refresh Bus Stops to download all the nodes within the defined bounding box which have the naptan:AtcoCode tag - this was chosen as it is needed for the matching. The file is stored as a .json file in the AppData folder and read at startup if present. If bus routes are up-to-date then new stops are rare, so this won't need refreshing very often. This file is used for determining which routes serve at least one stop within the defined area.
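For readers unfamiliar with Overpass, the bus stop download can be pictured roughly as below. This is a minimal sketch, not the query the tool ships with: the function names, the timeout value, and the coordinates in the usage lines are all illustrative assumptions. The sketch only builds the query text and reads codes out of an already-downloaded response, so it makes no network calls.

```python
def bus_stop_query(south, west, north, east, timeout=60):
    """Build an Overpass QL query for nodes tagged naptan:AtcoCode in a bbox.

    Overpass expects bounding boxes in (south, west, north, east) order.
    """
    bbox = f"{south},{west},{north},{east}"
    return (
        f"[out:json][timeout:{timeout}];"
        f'node["naptan:AtcoCode"]({bbox});'
        "out body;"
    )


def atco_codes(overpass_json):
    """Collect the naptan:AtcoCode values from a parsed Overpass JSON response."""
    codes = set()
    for element in overpass_json.get("elements", []):
        code = element.get("tags", {}).get("naptan:AtcoCode")
        if code:
            codes.add(code)
    return codes


if __name__ == "__main__":
    # Illustrative coordinates only, not the tool's default bounding box.
    print(bus_stop_query(51.7, 0.9, 52.0, 1.3))
    # Made-up response fragment in the Overpass JSON layout.
    sample = {"elements": [{"type": "node", "id": 1,
                            "tags": {"naptan:AtcoCode": "1500EXAMPLE1"}}]}
    print(atco_codes(sample))
```

The resulting set of codes is what the later steps need in order to decide which TNDS routes touch the area.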

The next step is to download the latest TNDS files - use File, Download TNDS to download all files from their FTP site (using the credentials you entered in the Settings table) to the working folder you specified. Originally it just downloaded all files, but it now syncs folders so it only downloads changes. Data is refreshed weekly according to https://data.gov.uk/dataset/traveline-national-dataset/resource/d33aac24-e7bb-4401-997d-1b494f53ebd9, though since the start of the covid-19 restrictions updates seem to be more sporadic. Even before then it seemed to be National Express on a Saturday and everything else on Tuesday-ish. For the file transfer I switched early in development from the .NET FTP classes to WinSCP, which makes resuming downloads much easier (I was doing initial tests on a faulty ADSL connection only getting about 1Mbps), and the newer .NET library added sync support, which I've since used.

You can also download the NaPTAN data, which helps locate missing stops and shows the current data to compare to anything originally imported to OSM. This data is apparently updated daily, and because it comes from a web endpoint which doesn't seem to support the relevant headers, there doesn't seem to be a way to download it only if it has changed. So just download it when you want to update routes, to ensure you are up to date.

Once you have all the files, File, Extract Local Routes will extract all .xml files for routes which contain any ATCO codes matching those from the OSM data downloaded previously. They are extracted to a subfolder of the working folder named after the extract date (this may change to extract to a temporary folder and replace the last set if successful - currently the date-related subfolders need cleaning up manually). This is summarised on the TNDS tab of the application.
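The "does this route file touch my area" test can be sketched as below. This is an illustration under assumptions, not the tool's implementation: it assumes the TNDS files are TransXChange XML in which stops are referenced by `StopPointRef` elements, and the sample document and ATCO codes are invented for the example.

```python
import xml.etree.ElementTree as ET


def stop_refs(xml_text):
    """Return the set of stop codes referenced anywhere in a route file."""
    root = ET.fromstring(xml_text)
    refs = set()
    for elem in root.iter():
        # Drop any "{namespace}" prefix so the check works with or without one.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "StopPointRef" and elem.text:
            refs.add(elem.text.strip())
    return refs


def is_local(xml_text, local_atco_codes):
    """True if the route file references at least one stop in the local set."""
    return not stop_refs(xml_text).isdisjoint(local_atco_codes)


# Invented, heavily trimmed sample; real files contain far more structure.
SAMPLE = """<TransXChange xmlns="http://www.transxchange.org.uk/">
  <StopPoints>
    <AnnotatedStopPointRef><StopPointRef>1500EXAMPLE1</StopPointRef></AnnotatedStopPointRef>
    <AnnotatedStopPointRef><StopPointRef>1500EXAMPLE2</StopPointRef></AnnotatedStopPointRef>
  </StopPoints>
</TransXChange>"""

if __name__ == "__main__":
    print(is_local(SAMPLE, {"1500EXAMPLE2", "1500OTHER"}))
```

A file is kept as soon as one referenced stop is in the local set, which mirrors the "any ATCO codes matching" rule described above.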

Once we know which services intersect our area we can then download the OpenStreetMap data, which is another Overpass query (note: this means there may be a lag between you uploading changes and them being available in a refresh of the downloaded data to see if your fixes have worked - I usually wait 3 full minutes, which under normal operation should be more than enough). This downloads to a .json file which is again loaded at startup if present. It checks for relations intersecting the bbox tagged with route=bus, and does some recursion to get up to the route master relation and down to the member ways and nodes. The route masters are summarised on the OSM tab of the application.

After extracting local routes or downloading OpenStreetMap data, the app compares the two data sets, trying to match first on operator, service number and the number of route variants, then on operator and service number, then just on service number. These results are on the Services tab. For those where operator and service number (at least) match, it then tries further comparison of the route variants and puts the results on the Routes tab.
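The tiered matching described above could be expressed along the following lines. This is a simplified sketch: the `Service` tuple, the numeric levels, and the choice to rank candidates in a single pass (rather than running three separate passes) are assumptions made for the example, not a description of the tool's code.

```python
from collections import namedtuple

# Hypothetical record: operator name, service number, count of route variants.
Service = namedtuple("Service", "operator ref variants")


def match_level(osm, tnds):
    """Score a pairing: 3 = operator + service + variant count, 2 = operator
    + service, 1 = service number only, 0 = no match."""
    if osm.ref != tnds.ref:
        return 0
    if osm.operator != tnds.operator:
        return 1
    return 3 if osm.variants == tnds.variants else 2


def best_match(osm, tnds_services):
    """Return (tnds_service, level) for the strongest match, or None."""
    best = max(tnds_services, key=lambda t: match_level(osm, t), default=None)
    if best is None or match_level(osm, best) == 0:
        return None
    return best, match_level(osm, best)


if __name__ == "__main__":
    osm = Service("Example Buses", "7", 2)
    tnds = [Service("Other Co", "7", 2), Service("Example Buses", "7", 4)]
    print(best_match(osm, tnds))
```

Only pairings at level 2 or above would go on to the variant-by-variant comparison shown on the Routes tab.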

When correcting the data, first make sure that the operator and service number on the route_master relation in OSM match what is in the TNDS data, so you can then go on to check the route variants. On the Routes tab you can see two lists of stops: one for the first selected row that has an OSM list of stops, and one for the first selected row that has a TNDS list of stops. This allows you to highlight an existing OpenStreetMap route and find the changed TNDS route that corresponds (initial use found, for example, extra stops added near a new housing estate, or a different stop being used where there was a choice at railway stations or on a high street). A red font was added to make it clearer where the first non-match is, after I failed to spot a one-character difference when comparing two lists of 44 stops without the visual help.

Click on an AtcoCode in either datagrid and press Ctrl-C to copy it to the clipboard (to allow search by AtcoCode in JOSM, for example), or double-click to load it into JOSM via an Overpass query (doing this too frequently will lead to rate limiting, forcing you to take an editing break). Use the '+' button to add the highlighted TNDS route stop from NaPTAN (if sparse editing, always download the surrounding area in case the stop already exists without the AtcoCode, and add the details to the existing stop if there is one). When comparing an OSM route to a TNDS one you can also click the '*' button to select all the TNDS stops in order in JOSM (if they are already in the downloaded data) to compare the stop sequence more easily.
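Finding where the red font on the Routes tab should start amounts to locating the first position at which the two stop lists differ. A minimal sketch, with a hypothetical function name and invented codes:

```python
from itertools import zip_longest


def first_mismatch(osm_stops, tnds_stops):
    """Return the index of the first differing stop, or None if the lists match.

    If one list is simply longer, the mismatch is the first extra position.
    """
    pairs = zip_longest(osm_stops, tnds_stops)  # pads the shorter list with None
    for index, (osm_code, tnds_code) in enumerate(pairs):
        if osm_code != tnds_code:
            return index
    return None


if __name__ == "__main__":
    osm = ["1500A1", "1500A2", "1500A3"]
    tnds = ["1500A1", "1500A2", "1500A8"]
    print(first_mismatch(osm, tnds))
```

Everything from the returned index onwards would be the part worth checking by eye.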

If the executable is run with a parameter of unattended it downloads the bus stops and OSM data for the last active area, checks for TNDS updates, downloads NaPTAN, and if there were any TNDS changes then extracts local routes, and if any routes no longer match it will send an email. I now have this as a scheduled task covering the Ceremonial Essex area, although at the time of editing this page I haven't had an email, as the TNDS data hasn't changed since I added the option (though it worked on the first test run after adding it).

OSM orphans is supposed to show bus routes in the area that aren't in a route master relation. It isn't 100% reliable, as it queries route relations for the area and removes any that are in a downloaded route master area, leaving the "orphans". Because one is based purely on relations and the other recurses up from bus stop nodes, you sometimes get routes listed that are part of a route master relation which I am guessing is determined to not intersect the area you are working on. An example is London Bus 379 and Essex. Usually, however, you find bus routes that were added before there was a PTv2 schema, and you can either upgrade them to v2, re-use the relation if the service no longer exists (I found some in Essex for routes that ran for less than a year almost 10 years previously), or delete the relation. I tend to keep deletions of a single route to a single changeset in case they need reverting.
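In essence the orphan check is a set difference between the route relations found in the area and those that belong to a downloaded route master. The sketch below shows that idea only; the function name, the data layout, and the relation ids are invented for illustration.

```python
def orphan_routes(area_route_ids, route_masters):
    """Return route relation ids in the area that no known route master contains.

    area_route_ids: iterable of route relation ids found by the area query.
    route_masters:  dict mapping route master id -> iterable of member route ids.
    """
    in_a_master = set()
    for member_ids in route_masters.values():
        in_a_master.update(member_ids)
    return sorted(set(area_route_ids) - in_a_master)


if __name__ == "__main__":
    masters = {9001: [101, 102], 9002: [103]}
    print(orphan_routes([101, 102, 103, 104], masters))
```

A route whose master was never downloaded (because the master was judged not to intersect the area) would also come out of this as an orphan, which is the false positive described above.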

Unfortunately this doesn't make the task of adding services or route variants for the first time any easier. In my initial tests I found a number of services I've added that have since been cancelled (I live near the coast - some are summer only), and a number of new services that intersect the bbox that I've probably missed previously (or I filtered OSM stops based on a boundary rather than a bbox). These are obvious as they appear towards the bottom on the OSM tab.

When using the sparse editing menu option to just load the relevant ways, nodes and relations into JOSM, remember to always download an area before splitting a way or deleting anything. If you want to delete a London Bus route that is no longer running, download parent objects for the route master first - London Buses are part of a network relation.

Mentioning London Buses, they use a non-standard naming convention as documented at https://wiki.openstreetmap.org/wiki/London_public_transport_tagging_scheme#London_Buses - if the route has a network=London Buses tag then the name validation is changed to use that convention. If you amend any London Bus routes remember they are documented at: https://wiki.openstreetmap.org/wiki/Bus_routes_in_London

Once all the routes match, a WikiText button appears - note that if any OSM route masters still show on the Services tab then the text will include a bus service which isn't currently in the TNDS data. This is the text I copy and paste to, for example, https://wiki.openstreetmap.org/wiki/Essex/Bus_Routes#Comparison_results between the section heading and the wiki category at the bottom.

WinSCP

This utility uses WinSCP for FTP transfers. WinSCP.exe is licensed under GPL 3 and WinSCPnet.dll is licensed under MPL 2.0 - details available at https://winscp.net/eng/docs/license