alex-mucci opened 3 years ago
How often should the transit travel times be updated over time? Monthly? Quarterly?
There are numerous GTFS files on transitfeeds.com over time. For example, a snip of the transitfeeds.com listing showed 10 files available between 3/14/19 and 5/16/19. I am not sure whether each file is different, but I am pretty sure that transit schedules do not change 10 times within 2 months. Downloading all of these files and adding them to the OTP "graph" would bog down my system and could overload it. So the question becomes: how often do the transit travel times need to change?
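One way to find out whether consecutive downloads actually differ, before adding any of them to the graph, is to hash the schedule-defining files inside each GTFS zip and compare the digests. This is a minimal sketch, assuming the feeds are saved as standard GTFS `.zip` archives with the text files at the archive root; the choice of files to compare and the helper names `feed_fingerprint` and `changed_members` are hypothetical.

```python
import hashlib
import zipfile

# Files that define the schedule; which ones to compare is an assumption.
SCHEDULE_FILES = ("calendar.txt", "calendar_dates.txt", "trips.txt", "stop_times.txt")


def feed_fingerprint(zip_source, members=SCHEDULE_FILES):
    """Return {member name: SHA-256 hex digest} for each listed member present in the zip.

    zip_source can be a path or a binary file-like object. Members are matched at
    the archive root only; feeds that nest files in a folder would need adjusting.
    """
    digests = {}
    with zipfile.ZipFile(zip_source) as zf:
        present = set(zf.namelist())
        for member in members:
            if member in present:
                digests[member] = hashlib.sha256(zf.read(member)).hexdigest()
    return digests


def changed_members(zip_a, zip_b, members=SCHEDULE_FILES):
    """List schedule files whose bytes differ between two feeds (or exist in only one)."""
    a = feed_fingerprint(zip_a, members)
    b = feed_fingerprint(zip_b, members)
    return sorted(m for m in set(a) | set(b) if a.get(m) != b.get(m))
```

For example, `changed_members("cta_20190314.zip", "cta_20190516.zip")` (hypothetical file names) returning an empty list would mean the two downloads carry byte-identical schedules. The comparison is byte-level, so a feed that was only re-exported with reordered rows or renumbered IDs would still show up as changed.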
My initial reaction is to pull one file per month and use it to represent that month's schedule, but that would still be labor- and computing-intensive. One file per month per transit agency equates to 48 GTFS files for Chicago alone. Not every city in Massachusetts will have transit, but I would guess the number of GTFS files to be around 150-200.
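The file counts above are the number of agencies multiplied by the number of update periods. A small sketch of that arithmetic, assuming three Chicago agencies (CTA, Metra, Pace) and a study period of November 2018 through February 2020 (the span the feed dates later in this thread cover), reproduces the 48-file figure and shows what quarterly updates would need instead. The helper names are hypothetical.

```python
import math
from datetime import date


def months_spanned(start: date, end: date) -> int:
    """Count the calendar months touched by the inclusive period start..end."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


def feeds_needed(n_agencies: int, start: date, end: date, months_per_feed: int = 1) -> int:
    """GTFS files needed if each agency gets one feed per block of months_per_feed months."""
    blocks = math.ceil(months_spanned(start, end) / months_per_feed)
    return n_agencies * blocks


# Assumed study period and agency count (CTA, Metra, Pace); not read from the dataset itself.
START, END = date(2018, 11, 1), date(2020, 2, 29)
monthly = feeds_needed(3, START, END)                       # 3 agencies x 16 months
quarterly = feeds_needed(3, START, END, months_per_feed=3)  # 3 agencies x 6 quarters
```

Under these assumptions `monthly` is 48 and `quarterly` is 18; the Massachusetts estimate scales the same way with the number of agencies.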
Is it worth the extra work and computing power/time to have monthly transit travel times?
UPDATE:
The LEHD and ACS data have been processed for all months in the Chicago RH dataset.
After talking with Dr. Erhardt on 12/21/20, we decided to have Alex look into how the GTFS files change seasonally. Most likely, the transit travel times will need to be calculated quarterly.
I'm having an issue selecting data out of the Chicago RH dataset. For some reason it will not let me select by month directly from the file. The month column looks good (and filtering on it works fine) after I pull the data out by year.
I would pull the data out by year and then filter by month, but selecting all of the 2019 trips in a single time-of-day (TOD) period is too large for my desktop on campus. I am holding off on processing the Chicago RH data until I have a virtual machine with more memory (reading in the entire h5 file needs about 50 GB, so the 80 GB you mentioned will be plenty).
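A possible way around the memory limit, sketched under assumptions: if the h5 file was written by pandas in `table` format, `pd.read_hdf` can stream it in chunks (via `chunksize`), and each chunk can be filtered by year and month before anything is kept. One possible, unconfirmed reason the month selection fails is that pandas `where=` queries only work on the index and on columns stored as data columns, so a `month` column written without `data_columns` cannot be queried directly. The column names `year` and `month` and the helper `filter_by_month` are hypothetical.

```python
from typing import Iterable

import pandas as pd


def filter_by_month(chunks: Iterable[pd.DataFrame], year: int, month: int,
                    year_col: str = "year", month_col: str = "month") -> pd.DataFrame:
    """Keep only rows from one year/month, scanning the data chunk by chunk.

    Only matching rows are retained, so memory use depends on the chunk size and
    the size of the result rather than on the whole file. Column names are
    assumptions; adjust them to the actual RH table.
    """
    kept = []
    for chunk in chunks:
        mask = (chunk[year_col] == year) & (chunk[month_col] == month)
        if mask.any():
            kept.append(chunk.loc[mask])
    if not kept:
        return pd.DataFrame()
    return pd.concat(kept, ignore_index=True)
```

For example, `filter_by_month(pd.read_hdf("chicago_rh.h5", key="trips", chunksize=1_000_000), 2019, 3)` (hypothetical file name and key) would return only March 2019 trips. Passing `columns=[...]` to `read_hdf` to load only the needed fields would cut memory further; both `chunksize` and `columns` require a table-format store.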
I have decided which GTFS files to use. They are the following:
CTA:
20181106 - 20190131 = https://transitfeeds.com/p/chicago-transit-authority/165/20181107/file/calendar.txt
20190130 - 20190331 = https://transitfeeds.com/p/chicago-transit-authority/165/20190131/file/calendar.txt
20190513 - 20190731 = https://transitfeeds.com/p/chicago-transit-authority/165/20190516-3/file/calendar.txt
20190801 - 20191031 = https://transitfeeds.com/p/chicago-transit-authority/165/20190805/file/calendar.txt
20191004 - 20191231 = https://transitfeeds.com/p/chicago-transit-authority/165/20191004/file/calendar.txt
20191219 - 20200229 = https://transitfeeds.com/p/chicago-transit-authority/165/20191221/file/calendar.txt
There is still some overlap between files, and there is a gap between March 31 and May 13. I don't expect CTA to change its service much during April and the first half of May, so I decided to go with these files. My thought process is that there is one file for the winter/holiday months (Nov-Jan), one for spring (Feb-Apr), one for summer (May-Jul), and one for fall (Aug-Oct).
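The overlaps and the gap described above can be checked mechanically. A minimal sketch, assuming the ranges are taken as the YYYYMMDD strings listed for CTA; `coverage_issues` is a hypothetical helper that sorts the ranges and reports where a feed starts on or before the last day already covered (overlap) or more than one day after it (gap).

```python
from datetime import date, datetime, timedelta

ONE_DAY = timedelta(days=1)

# CTA service ranges as listed above (start, end), GTFS-style YYYYMMDD strings.
CTA_RANGES = [
    ("20181106", "20190131"),
    ("20190130", "20190331"),
    ("20190513", "20190731"),
    ("20190801", "20191031"),
    ("20191004", "20191231"),
    ("20191219", "20200229"),
]


def parse_gtfs_date(text: str) -> date:
    """GTFS dates are written as YYYYMMDD."""
    return datetime.strptime(text, "%Y%m%d").date()


def coverage_issues(ranges):
    """Return ("overlap" | "gap", first_day, last_day) tuples for a set of feed ranges."""
    spans = sorted((parse_gtfs_date(s), parse_gtfs_date(e)) for s, e in ranges)
    if not spans:
        return []
    issues = []
    covered_to = spans[0][1]  # last day covered by any feed seen so far
    for start, end in spans[1:]:
        if start <= covered_to:
            issues.append(("overlap", start, min(end, covered_to)))
        elif start > covered_to + ONE_DAY:
            issues.append(("gap", covered_to + ONE_DAY, start - ONE_DAY))
        covered_to = max(covered_to, end)
    return issues
```

For the CTA ranges this reports overlaps starting 2019-01-30, 2019-10-04, and 2019-12-19, plus a gap from 2019-04-01 to 2019-05-12 (the April/early May gap). Run on the Pace ranges below, it would flag gaps from 2019-08-10 to 2019-08-11 and from 2019-11-16 to 2019-12-08.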
I will update this comment with similar files for Metra and Pace later.
Metra: Looks like it changes when the fiscal year changes, so there is one file for FY 18-19 and one for FY 19-20.
20180101 - 20190526 = https://transitfeeds.com/p/metra/169/20190530/file/calendar.txt
20190527 - 20200223 = https://transitfeeds.com/p/metra/169/20200228/file/calendar.txt
Pace: Looks like it changes once in the summer and once in the winter. I pulled a new file whenever a change was made, for a total of 3 files.
20181119 - 20190809 = https://transitfeeds.com/p/pace/171/20181114/file/calendar.txt
20190812 - 20191115 = https://transitfeeds.com/p/pace/171/20190815/file/calendar.txt
20191209 - 20200228 = https://transitfeeds.com/p/pace/171/20191204/calendar.txt
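Each range above corresponds to a feed's service period. As a sketch of one way such a period could be read from a feed, assuming the feed defines service in `calendar.txt` (the GTFS reference also allows feeds that rely only on `calendar_dates.txt`, which this does not handle), the hypothetical helper `service_span` returns the earliest `start_date` and latest `end_date`.

```python
import csv
import io
from datetime import date, datetime
from typing import Tuple


def service_span(calendar_txt: str) -> Tuple[date, date]:
    """Earliest start_date and latest end_date across all rows of a GTFS calendar.txt."""
    # Drop a UTF-8 byte-order mark if present so the first header name parses cleanly.
    reader = csv.DictReader(io.StringIO(calendar_txt.lstrip("\ufeff")))
    starts, ends = [], []
    for row in reader:
        starts.append(datetime.strptime(row["start_date"].strip(), "%Y%m%d").date())
        ends.append(datetime.strptime(row["end_date"].strip(), "%Y%m%d").date())
    if not starts:
        raise ValueError("calendar.txt has no service rows")
    return min(starts), max(ends)
```

The span only tells where a feed starts and stops; a feed whose `calendar.txt` contains several service patterns would still report one combined span.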
After our meeting on 12/14/20, we decided to use a pooled cross-sectional model structure. The estimation files require the following tasks to be completed: