CUTR-at-USF / gtfsrdb

GTFSrDB is a tool to archive GTFS-realtime data to a database.

Feeding a list of gtfs-rt into a database #3

Closed XavierPrudent closed 7 years ago

XavierPrudent commented 7 years ago

Dear authors,

The gtfsdb package can be fed either a URL to the GTFS feed or a list of GTFS files. Is this feature also available in gtfsrdb? If I have a directory with a two-week history of GTFS-rt files, can they be inserted into the database using gtfsrdb?

Thanks in advance, regards,
Xavier Prudent

barbeau commented 7 years ago

Whoops! Sorry, didn't mean to close this - I'm re-opening.

@XavierPrudent Right now I don't believe this is possible given the existing tool, but I think we'd certainly be interested in getting this added as a feature - pull requests welcome. Is this something your team would be interested in implementing? I'm not sure yet if this is something we'll be directly working on or not, but I'll talk with @jadorno and see. EDIT - this is possible now - see https://github.com/CUTR-at-USF/gtfsrdb/issues/3#issuecomment-289788790, it's now documented in README.

XavierPrudent commented 7 years ago

Hello Sean,

That would indeed be a good starting point for a project. I will talk to my colleagues this afternoon. I guess one just needs to set up a server and loop.

BTW, in the SELECT statement in the github page https://github.com/CUTR-at-USF/gtfsrdb

are you sure about these ::? Shouldn't they be periods (.)?

WHERE stops.stop_id::text = stop_time_updates.stop_id::text
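
For context, ::text is PostgreSQL's cast syntax, which SQLite does not understand - a quick check with Python's built-in sqlite3 module (using hypothetical two-column stand-ins, not the actual gtfsrdb schema) illustrates the difference:

```python
import sqlite3

# Hypothetical minimal tables standing in for the gtfsrdb schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stops (stop_id TEXT);
    CREATE TABLE stop_time_updates (stop_id TEXT);
    INSERT INTO stops VALUES ('1234');
    INSERT INTO stop_time_updates VALUES ('1234');
""")

# PostgreSQL's ::text cast syntax is rejected by SQLite's parser:
cast_error = None
try:
    conn.execute("SELECT * FROM stops, stop_time_updates "
                 "WHERE stops.stop_id::text = stop_time_updates.stop_id::text")
except sqlite3.Error as e:
    cast_error = e
print("cast syntax rejected:", cast_error)

# A plain comparison is portable across both databases:
rows = conn.execute("SELECT * FROM stops, stop_time_updates "
                    "WHERE stops.stop_id = stop_time_updates.stop_id").fetchall()
print(rows)
```

So the casts are harmless on PostgreSQL but break the query on SQLite.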

Also, looking at a SQLite DB I created using gtfsrdb, there was no trips table. The input data for the creation of the tables were a TripUpdate and a VehiclePosition GTFS-rt feed, as produced by the HART GTFS-realtime generator code.

Regards, Xavier


barbeau commented 7 years ago

I will talk to my colleagues this afternoon. I guess one just needs to set up a server and loop.

Yes, I think we'd want to mirror the behavior of gtfsdb. IIRC the current behavior is just to loop and execute an HTTP request every X seconds. To support pulling from archived feeds, I think you'd add an option to pass a file name via the command line instead of a URL. This file would contain a list of filenames (or just point to a directory, local or online, where the PB files are stored), and the tool would then loop through them as fast as possible, processing each one and inserting it into the DB.
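
A minimal sketch of that file-list mode might look like the following - hedged: iter_feed_files and archive_all are hypothetical helper names, not existing gtfsrdb code, and the single-shot --once -p file:// invocation they rely on is the one described later in this thread:

```python
import os
import subprocess
import sys

def iter_feed_files(source):
    """Yield archived GTFS-rt protobuf paths, given either a directory
    of .pb files or a text file listing one path per line.
    (Hypothetical sketch of the behavior discussed above.)"""
    if os.path.isdir(source):
        # Sorted so timestamp-named files are processed chronologically.
        for name in sorted(os.listdir(source)):
            if name.endswith(".pb"):
                yield os.path.join(source, name)
    else:
        with open(source) as f:
            for line in f:
                line = line.strip()
                if line:
                    yield line

def archive_all(source, db_url, gtfsrdb_path="gtfsrdb.py"):
    """Feed each archived file through gtfsrdb's single-shot mode."""
    for path in iter_feed_files(source):
        subprocess.run([sys.executable, gtfsrdb_path, "--once",
                        "-p", "file://" + os.path.abspath(path),
                        "-d", db_url],
                       check=True)
```

This keeps the per-file parsing and DB insertion entirely inside the existing tool; the wrapper only decides which files to process and in what order.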

are you sure these :: should be points?

Hmmm, no, I'm not. It's been a while since I worked on this, and I know the wiki went through several format conversions. I'm guessing that's an artifact of the format conversion and shouldn't be there. You're suggesting that we should remove ::text from the query everywhere it appears? Can you confirm the query works if you do this?

and looking at a SQLite DB I created using gtfsrdb, there was no trips table.

I believe this assumes that you've loaded the GTFS static zip file data into the same database as the real-time data. IIRC there is a caveat here - when you load GTFS data, it wipes the database. So, you may need to load the GTFS data first, and then archive the GTFS-rt data. It would be good to confirm if this is (or was) an issue, and if it is, fix it.

XavierPrudent commented 7 years ago

Hi Sean,


I am meeting a professor at Polytechnique Montréal next week to discuss the opportunity to involve students: http://www.polymtl.ca/recherche/rc/en/professeurs/details.php?NoProf=190. Such an upgrade of gtfsrdb would be a good starting point for them.

You're suggesting that we should remove ::text from the query everywhere it appears? Can you confirm the query works if you do this?

The query worked fine when replacing the :: with a period. I thought that was an unfortunate copy-and-paste from some Java.

and looking at a SQLite DB I created using gtfsrdb, there was no trips table.

I believe this assumes that you've loaded the GTFS static zip file data into the same database as the real-time data.

By GTFS data, you mean static GTFS, right? The options in gtfsrdb https://github.com/CUTR-at-USF/gtfsrdb only cover archiving trip updates, vehicle positions, and alerts.

You mean creating the DB with gtfsdb, then calling gtfsrdb without the "-c" argument, right?

BTW, the link to gtfsdb could be updated to https://github.com/OpenTransitTools/gtfsdb

regards, Xavier


barbeau commented 7 years ago

the query worked fine when replacing the :: with a period.

Thanks, just fixed this in https://github.com/CUTR-at-USF/gtfsrdb/commit/198938bd31a21e34e7d5cc1d4b53458715304316. Let me know if this doesn't work.

By GTFS data, you mean static GTFS, right? The options in gtfsrdb https://github.com/CUTR-at-USF/gtfsrdb include only the possibilities to include trip updates, vehicle positions and trip alerts.

Yes, static GTFS data would contain the data for the trips table.

You mean creating the DB with gtfsdb, then calling gtfsrdb without the "-c" argument, right?

Yes, you'd want to exclude -c so gtfsrdb doesn't wipe the database.

IIRC, though, there was another issue that would prevent you from simultaneously running gtfsrdb on an ongoing basis to continuously archive GTFS-rt data and re-loading the static GTFS data with gtfsdb each time a new version became available (e.g., four times a year), but I don't recall exactly what the problem was.

BTW, the link to gtfsdb could be updated to https://github.com/OpenTransitTools/gtfsdb

Thanks for the heads up, just fixed that in https://github.com/CUTR-at-USF/gtfsrdb/commit/ef9326355a3a098c315914bf8e7fc6236d3f6d8b.

jadorno commented 7 years ago

Hey guys, just a heads up,

This tool can already be used to read files in its current state. Just run it as follows:

python gtfsrdb.py --once -p file://<path-to-file> -d <db-url>

This can easily be wrapped in a Bash script that iterates over all the files, changing just the filename. I can add something about this to the README if you'd like.

Lastly, the -c parameter will not wipe your database. It simply creates tables if they're missing. Nothing more.

barbeau commented 7 years ago

Thanks @jadorno! Yes, please go ahead and open a PR with the update to the README on file usage. I'll leave this issue open until we update the README, as a reminder.

barbeau commented 7 years ago

Alright, loading data from files using Bash and MySQL is now documented in the README under Example 3 via https://github.com/CUTR-at-USF/gtfsrdb/commit/0980c1668391efbe34b6fa76fc3b1f7e59bed395.

#!/bin/sh
# Loop over each archived GTFS-rt protobuf file and insert it into the database.
for file in /path/to/files/*; do
  python /path/to/gtfsrdb.py --once -p "file://$file" -d "mysql://<username>:<password>@<public_database_server_name>/<database_name>" -c
done

Thanks @jadorno!