afimb / gtfslib-python

An open source library in python for reading GTFS files and computing various stats and indicators about Public Transport networks
GNU General Public License v3.0

A way of writing Clustered Stops back into the GTFS database #63

Open Emilio105 opened 7 years ago

Emilio105 commented 7 years ago

Is there a way of taking the clustered stops in memory and writing them back to the database, removing duplicate stops? I have tried using the SQL write functions and executing them through sqlite3 to interface with the database, but no matter what I do I always get a reference error when trying to find the stops associated with a cluster. I am not sure whether the stop IDs change when I reference them, because my SQL code works when I find and replace the stop IDs manually. Any idea what's going on?

laurentg commented 7 years ago

Stop "clusters" do not correspond to anything in the database; they are just transient data (a collection of stops). But nothing prevents you from merging stops using the generated clusters. You will have to take care of merging all the fields though (name, description, etc.) and possibly create a new ID (if you do not reuse a pre-existing stop as the base for the merge).

On top of that, if you merge stops, you have to replace the references to the old objects in stop_times and transfers. Normally, within a transaction, modifying a non-detached object that came from the DAO will transparently and automatically issue the corresponding SQL update/delete/insert statements (SQLAlchemy). In this particular case, it's probably much faster to issue a direct update statement to the database to modify the corresponding IDs (UPDATE stop_times SET stop_id=X WHERE stop_id=Y). AFAIK SQLAlchemy provides helper methods to issue such updates, see here.
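For illustration, here is a minimal sketch of the direct-update approach using Python's standard sqlite3 module. The table and column names follow the GTFS files (stops, stop_times, transfers); the schema that gtfslib actually creates may differ, for instance with additional key columns, so treat the schema, the `merge_stop_ids` helper and the sample IDs as assumptions rather than the library's API.

```python
import sqlite3


def merge_stop_ids(conn, old_id, new_id):
    """Repoint every reference from old_id to new_id, then delete the old stop.

    Hypothetical helper: it assumes GTFS-style tables, not gtfslib's real schema.
    """
    with conn:  # single transaction: commit on success, roll back on error
        conn.execute(
            "UPDATE stop_times SET stop_id = ? WHERE stop_id = ?", (new_id, old_id))
        conn.execute(
            "UPDATE transfers SET from_stop_id = ? WHERE from_stop_id = ?",
            (new_id, old_id))
        conn.execute(
            "UPDATE transfers SET to_stop_id = ? WHERE to_stop_id = ?",
            (new_id, old_id))
        # Only remove the duplicate once nothing references it any more.
        conn.execute("DELETE FROM stops WHERE stop_id = ?", (old_id,))


# Tiny in-memory database standing in for the real GTFS database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stops (stop_id TEXT PRIMARY KEY, stop_name TEXT);
    CREATE TABLE stop_times (trip_id TEXT, stop_sequence INTEGER, stop_id TEXT);
    CREATE TABLE transfers (from_stop_id TEXT, to_stop_id TEXT);
    INSERT INTO stops VALUES ('S1', 'Main St'), ('S2', 'Main St'), ('S3', 'Park');
    INSERT INTO stop_times VALUES ('T1', 1, 'S1'), ('T2', 1, 'S2'), ('T2', 2, 'S3');
    INSERT INTO transfers VALUES ('S2', 'S3'), ('S3', 'S2');
""")

# Merge the duplicate stop S2 into S1.
merge_stop_ids(conn, old_id="S2", new_id="S1")

remaining_stops = [r[0] for r in conn.execute("SELECT stop_id FROM stops ORDER BY stop_id")]
print(remaining_stops)
```

Passing the IDs as bound parameters, rather than pasting them into the SQL string, avoids quoting problems with textual stop IDs.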

laurentg commented 7 years ago

Re-reading this issue: stop clusters can take the place of stations, so you could also create one station per cluster in the database and attach all stops of the cluster to it. This is probably the best solution if you do not have pre-existing stations in your feed.
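A rough sketch of the station-per-cluster idea, again with plain sqlite3. It relies on the GTFS convention that a station is a row in stops with location_type = 1 and that member stops point to it through parent_station. The `create_stations` helper, the `cluster_<n>` ID scheme, the choice of the first member's name, and the averaged coordinates are all assumptions made for the example; clusters are represented here as plain lists of stop IDs, not as gtfslib cluster objects.

```python
import sqlite3


def create_stations(conn, clusters):
    """Insert one station per cluster and attach the member stops to it.

    `clusters` is a list of lists of stop IDs (hypothetical representation).
    Returns the IDs of the stations that were created.
    """
    created = []
    with conn:  # single transaction for all clusters
        for n, stop_ids in enumerate(clusters, start=1):
            marks = ",".join("?" * len(stop_ids))
            rows = conn.execute(
                f"SELECT stop_name, stop_lat, stop_lon FROM stops "
                f"WHERE stop_id IN ({marks}) ORDER BY stop_id",
                stop_ids,
            ).fetchall()
            if not rows:
                continue  # no stop of this cluster exists in the database
            station_id = f"cluster_{n}"  # assumed ID scheme
            name = rows[0][0]  # reuse the first member's name
            lat = sum(r[1] for r in rows) / len(rows)  # simple centroid
            lon = sum(r[2] for r in rows) / len(rows)
            conn.execute(
                "INSERT INTO stops (stop_id, stop_name, stop_lat, stop_lon, "
                "location_type, parent_station) VALUES (?, ?, ?, ?, 1, NULL)",
                (station_id, name, lat, lon),
            )
            conn.execute(
                f"UPDATE stops SET parent_station = ? WHERE stop_id IN ({marks})",
                [station_id, *stop_ids],
            )
            created.append(station_id)
    return created


# Tiny in-memory database standing in for the real GTFS database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stops (
        stop_id TEXT PRIMARY KEY, stop_name TEXT,
        stop_lat REAL, stop_lon REAL,
        location_type INTEGER DEFAULT 0, parent_station TEXT);
    INSERT INTO stops (stop_id, stop_name, stop_lat, stop_lon) VALUES
        ('S1', 'Main St', 10.0, 2.0),
        ('S2', 'Main St', 20.0, 4.0),
        ('S3', 'Park', 30.0, 6.0);
""")

station_ids = create_stations(conn, [["S1", "S2"], ["S3"]])
print(station_ids)
```

Because stop_times and transfers keep pointing at the original stops, this variant needs no reference rewriting at all, which is what makes it attractive when the feed has no stations yet.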