a web-based software tool developed for the visualization, analysis, and reporting of regional and statewide transit networks in the state of Oregon
Ensure that there are at least 7 contiguous days during which all feeds are valid #4

Closed antrim closed 6 years ago

antrim commented 6 years ago

For each new GTFS dataset, ensure that there are at least 7 contiguous days during which all feeds are valid, i.e. the days fall between each feed's start and end dates. (A feed can be valid and still have no service for the period in question.) [Trillium is the logical lead on this activity.] Possible tactics include:

This is particularly useful for GTFS-ride, i.e. it is particularly useful to know whether past service dates were valid.
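As a rough sketch of the check described above: the period in which every feed is valid runs from the latest feed start date to the earliest feed end date, and it must span at least 7 days. The function names and the example feeds below are hypothetical and are not part of the TNA tool.

```python
from datetime import date


def common_validity_window(feeds):
    """Return (start, end) of the period in which every feed is valid, or None.

    `feeds` maps a feed name to its (start_date, end_date) pair, both inclusive.
    """
    if not feeds:
        return None
    start = max(s for s, _ in feeds.values())  # the latest start limits the window
    end = min(e for _, e in feeds.values())    # the earliest end limits the window
    return (start, end) if start <= end else None


def has_min_contiguous_days(feeds, min_days=7):
    """True if all feeds are simultaneously valid for at least `min_days` days."""
    window = common_validity_window(feeds)
    if window is None:
        return False
    start, end = window
    return (end - start).days + 1 >= min_days  # both endpoints count as valid days


# Hypothetical feeds, for illustration only.
example = {
    "feed-a": (date(2017, 9, 1), date(2017, 12, 31)),
    "feed-b": (date(2017, 9, 25), date(2017, 10, 31)),
}
print(common_validity_window(example))
print(has_min_contiguous_days(example))
```

Because validity periods are single intervals, intersecting them only needs one max and one min; no day-by-day scan is required.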

antrim commented 6 years ago

@pouyalireza Can you close or delete this?

BenFields22 commented 6 years ago

This task is addressed in the newly added feed duration visualizer.

ODOT-RPTD-mb commented 6 years ago

This is still a Trillium process issue. How does Trillium trigger a build so that, looking back seven days, all feeds are valid? We should consider this task complete when we have a new db in place with 7+ days of contiguous validity, and Trillium understands the issues associated with making this happen. Speed matters for ODOT on this issue; we can use a "solid" snapshot of the network to build some report documents. This db build should include the P&R update.

PPaulsonOregonDOT commented 6 years ago

This may be obvious, but I think that because it needs another db build, this needs to be preceded by an answer to #16, related to evaluating where the tool will be hosted moving forward. @BenFields724, @pouyalireza, do we have an answer to that question? I think that this may also require us to not include services that don't run year round, because those significantly shrink our window. @ed-g, in looking at the overlap of service, it looks like feed_end_date could be adjusted to extend this window while keeping end_date in calendar.txt the same, which I would hope would allow those short-term services to continue showing up correctly in Google Transit, etc.

antrim commented 6 years ago

Trillium should proceed "with an eye to an ongoing, perhaps quarterly build process by Trillium." (via M Barnes, @ODOT-RPTD-mb).

cc: @ed-g

Can someone please assign this to @antrim or @ed-g ? I do not have access to make assignments on this repo.

antrim commented 6 years ago

cc: @thomastrillium

Here is what the Spec says about feed_info.feed_start_date and feed_info.feed_end_date:

| Field Name | Required | Details |
| --- | --- | --- |
| feed_start_date | Optional | The feed provides complete and reliable schedule information for service in the period from the beginning of the feed_start_date day to the end of the feed_end_date day. Both days are given as dates in YYYYMMDD format as for calendar.txt, or left empty if unavailable. The feed_end_date date must not precede the feed_start_date date if both are given. Feed providers are encouraged to give schedule data outside this period to advise of likely future service, but feed consumers should treat it mindful of its non-authoritative status. If feed_start_date or feed_end_date extend beyond the active calendar dates defined in calendar.txt and calendar_dates.txt, the feed is making an explicit assertion that there is no service for dates within the feed_start_date or feed_end_date range but not included in the active calendar dates. |
| feed_end_date | Optional | (see above) |
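A minimal sketch of how a consumer might apply the spec text above: parse the two YYYYMMDD fields, reject an end date that precedes the start date, and test whether a given day falls inside the authoritative period. The helper names are hypothetical. Treating an empty date as "no limit on that side" is an assumption made for this sketch; the spec only says the field is left empty if unavailable.

```python
from datetime import date, datetime


def parse_gtfs_date(value):
    """Parse a YYYYMMDD string; empty or missing values mean 'unavailable'."""
    value = (value or "").strip()
    return datetime.strptime(value, "%Y%m%d").date() if value else None


def feed_validity(feed_info_row):
    """Return (start, end) from a feed_info.txt row given as a dict."""
    start = parse_gtfs_date(feed_info_row.get("feed_start_date"))
    end = parse_gtfs_date(feed_info_row.get("feed_end_date"))
    if start and end and end < start:
        raise ValueError("feed_end_date must not precede feed_start_date")
    return start, end


def is_authoritative_on(feed_info_row, day):
    """True if `day` lies within the feed's stated validity period.

    Assumption: a missing start or end date leaves that side open-ended.
    """
    start, end = feed_validity(feed_info_row)
    return (start is None or start <= day) and (end is None or day <= end)


row = {"feed_start_date": "20170913", "feed_end_date": "20171206"}
print(feed_validity(row))
print(is_authoritative_on(row, date(2017, 9, 20)))
```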


GTFS Best Practices says:

One GTFS dataset should contain current and upcoming service (sometimes called a “merged” dataset). Google transitfeed tool's merge function can be used to create a merged dataset from two different GTFS feeds.

  • At any time, the published GTFS dataset should be valid for at least the next 7 days, and ideally for as long as the operator is confident that the schedule will continue to be operated.
  • If possible, the GTFS dataset should cover at least the next 30 days of service.
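The two thresholds above could be checked along these lines. This is only a sketch: the function name is hypothetical, and it interprets "valid for the next N days" as "the feed end date is at least N days after today", which is one possible reading of the Best Practices text.

```python
from datetime import date


def coverage_status(feed_end_date, today):
    """Classify a feed by how many days of validity remain after `today`."""
    days_left = (feed_end_date - today).days
    if days_left < 7:
        return "below 7-day minimum"
    if days_left < 30:
        return "meets 7-day minimum, under 30 days"
    return "covers at least 30 days"


today = date(2017, 10, 1)
for end in (date(2017, 10, 5), date(2017, 10, 15), date(2017, 10, 31)):
    print(end, coverage_status(end, today))
```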


Proposed next steps:

ODOT-RPTD-mb commented 6 years ago

First attempt to capture various perspectives on GTFS feed start and end date:


ed-g commented 6 years ago

Needed for the reporting document. I was hoping to have it in place by the 15th, but we can survive if it arrives within the next week.

ed-g commented 6 years ago

@ODOT-RPTD-mb @PPaulsonOregonDOT @srinivas13794 @BenFields724 are the GTFS feeds in the Tna tool loaded from the Public+Private gtfs archive site? If not, what is the process for fetching GTFS feeds?

ODOT-RPTD-mb commented 6 years ago

Yes, feeds are captured from private+public Trillium site.

ed-g commented 6 years ago

Uploading feeds from http://archive.oregon-gtfs.com/archive-download-private/Oregon-Private-GTFS-feeds-2017-09-18Z.zip (you'll need a login to access that URL).

Notes to self:

Next steps after GTFS upload are "Run Update Queries" in the admin interface, and then "Activate Database".

It's necessary to make a database "inactive" before uploading GTFS. That part of the admin interface is currently broken due to #64 but you can update database_status directly using psql for the same effect.

ed-g commented 6 years ago

If we create a GIST index on census_blocks.shape, testing for contained points goes much faster.

Similar story for gtfs_trips.shape and census_tracts. In general it's a good idea to index any geometry columns when using PostGIS.

create index gtfs_trips_shape_idx ON gtfs_trips using gist(shape);

create index parknride_geom_idx ON parknride using gist(geom);

create index census_blocks_shape_idx on census_blocks using gist (shape) ;
create index census_tracts_shape_idx on census_tracts using gist (shape) ;
create index census_places_shape_idx on census_places using gist (shape) ;
create index census_congdists_shape_idx on census_congdists  using gist (shape) ;
create index census_states_shape_idx on census_states  using gist (shape) ;
create index census_urbans_shape_idx on census_urbans  using gist (shape) ;
ed-g commented 6 years ago

This will enable parallel queries on the database which could speed things up depending on the exact query.

alter database september17 set max_parallel_workers_per_gather= 8;
ed-g commented 6 years ago

Pretty close, there are a couple agencies which are ruining the party. Klamath seems to have expired a couple years ago, and CCCXPRESS starts in the future.

september17=# select enddate::date, feedname from gtfs_feed_info order by enddate asc limit 10;
  enddate   |          feedname           
 2016-09-05 | klamathshuttle-or-us
 2017-10-31 | washingtonparkshuttle-or-us
 2017-12-02 | trimet-portland-or-us
 2017-12-06 | cccxpress-or-us
 2017-12-25 | valleyretriever-or-us
 2017-12-30 | salem-or-us
 2018-01-01 | cascadespoint-or-us
 2018-01-01 | albanytransit-or-us
 2018-01-01 | cooscounty-or-us
 2018-01-01 | amtrakcascades-or-us
(10 rows)

september17=# select startdate::date, feedname from gtfs_feed_info order by startdate desc limit 10;
 startdate  |          feedname           
 2017-09-25 | cccxpress-or-us
 2017-09-10 | ctran-wa-us
 2017-09-03 | salem-or-us
 2017-09-03 | trimet-portland-or-us
 2017-08-01 | northwestpoint-or-us
 2017-08-01 | pacifictransit-wa-us
 2017-06-18 | lanetransitdistrict-or-us
 2017-05-11 | highdesertpoint-or-us
 2017-04-01 | washingtonparkshuttle-or-us
 2017-02-01 | hut-or-us
(10 rows)
ed-g commented 6 years ago

CCCXpress from 14 September looks like it starts on the 13th. I wonder why tna tool thinks it starts on the 25th.

oregon-gtfs.com also shows it as starting on the 13th.

http://www.trilliumtransit.com,"Trillium Solutions, Inc.",en,UTC: 14-Sep-2017 00:38,,support+cccxpress-or-us@trilliumtransit.com,http://support.trilliumtransit.com,20170913,20171206

UPDATE: it's probably because cccxpress is a college commuter bus and doesn't start until term does.

september17=# select * from gtfs_calendars where serviceid_agencyid = '256';
 serviceid_agencyid |    serviceid_id    | gid | monday | tuesday | wednesday | thursday | friday | saturday | sunday | startdate | enddate  
 256                | c_842_b_2858_d_31  |   0 |      1 |       1 |         1 |        1 |      1 |        0 |      0 | 20170925  | 20171206
 256                | c_842_b_2858_d_15  |   0 |      1 |       1 |         1 |        1 |      0 |        0 |      0 | 20170925  | 20171206
 256                | c_2279_b_2859_d_15 |   0 |      1 |       1 |         1 |        1 |      0 |        0 |      0 | 20170925  | 20171206
(3 rows)
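To illustrate the difference between the two dates, here is a small sketch comparing the feed_info.txt dates quoted above (20170913 to 20171206) with the service span in the gtfs_calendars rows. It ignores calendar_dates.txt, and it only assumes, rather than confirms, that the tool derives its start date from the calendar rows.

```python
from datetime import datetime


def ymd(value):
    """Parse a YYYYMMDD string into a date."""
    return datetime.strptime(value, "%Y%m%d").date()


# Dates from the feed_info.txt line quoted in this comment.
feed_start, feed_end = ymd("20170913"), ymd("20171206")

# startdate / enddate values from the gtfs_calendars query output.
calendar_rows = [
    {"serviceid_id": "c_842_b_2858_d_31", "startdate": "20170925", "enddate": "20171206"},
    {"serviceid_id": "c_842_b_2858_d_15", "startdate": "20170925", "enddate": "20171206"},
    {"serviceid_id": "c_2279_b_2859_d_15", "startdate": "20170925", "enddate": "20171206"},
]

first_service = min(ymd(r["startdate"]) for r in calendar_rows)
last_service = max(ymd(r["enddate"]) for r in calendar_rows)

# Days at the start of the feed period that have no scheduled service.
lead_days = (first_service - feed_start).days

print("feed period:   ", feed_start, "to", feed_end)
print("service period:", first_service, "to", last_service)
print("days in feed period before first service:", lead_days)
```

Under the spec wording quoted earlier in the thread, those leading days are ones for which the feed asserts there is no service, so the feed is still valid on them.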
ed-g commented 6 years ago

Klamath Shuttle (aka Crater Lake Trolley) should run this year until October 10th or 11th, but we're checking with them to make sure.

UPDATE: Crater Lake Trolley is a loop route where you can't disembark the trolley around the lake. It is not included in google transit for that reason. The Klamath Shuttle from the Amtrak station to the lake is active only during summer months, next from 7/1/2018 - 8/3/2018.

UPDATE 2: The schedule information was outdated from 2016, and I've loaded the current version of the feed with updated schedule for 2018.

ed-g commented 6 years ago

Protip: send the header accept: application/json when fetching data from the Daterange backend. A browser confuses the backend by asking for HTML and XML.

curl -H 'accept: application/json' http://tna.trilliumtransit.com:8080/TNAtoolAPI-Webapp/queries/transit/Daterange?dbindex=10
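A rough Python equivalent of the curl call, using only the standard library. The function names are hypothetical, the shape of the JSON response is not documented in this thread, and the server may not be reachable, so the sketch only builds the request and leaves the fetch to the caller.

```python
import json
import urllib.request

URL = ("http://tna.trilliumtransit.com:8080/TNAtoolAPI-Webapp"
       "/queries/transit/Daterange?dbindex=10")


def build_daterange_request(url=URL):
    """Build a GET request that explicitly asks the backend for JSON."""
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def fetch_daterange(url=URL, timeout=30):
    """Fetch and decode the response; requires network access to the server."""
    with urllib.request.urlopen(build_daterange_request(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


request = build_daterange_request()
print(request.get_method(), request.full_url)
print(request.header_items())
```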
ed-g commented 6 years ago

@ODOT-RPTD-mb @PPaulsonOregonDOT it looks like there is a little over a month of contiguous validity (September 25th through October 31st), with the single exception of the Klamath shuttle which runs only seasonally and is done for the year.


Let me know if this gives you what you need for the report?

september17=# select enddate::date, feedname from gtfs_feed_info order by enddate asc limit 5;  select startdate::date, feedname from gtfs_feed_info order by startdate desc limit 5;
  enddate   |          feedname           
 2017-10-31 | washingtonparkshuttle-or-us
 2017-12-02 | trimet-portland-or-us
 2017-12-06 | cccxpress-or-us
 2017-12-25 | valleyretriever-or-us
 2017-12-30 | salem-or-us
(5 rows)

 startdate  |       feedname        
 2018-07-01 | klamathshuttle-or-us
 2017-09-25 | cccxpress-or-us
 2017-09-10 | ctran-wa-us
 2017-09-03 | trimet-portland-or-us
 2017-09-03 | salem-or-us
(5 rows)
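For reference, the window can be recomputed from the two query results above, with and without the seasonal feed. This is a sketch, not tool code: the dictionaries hold only the five rows each query returned, which is enough because the window is bounded by the earliest end date and the latest start date.

```python
from datetime import date

# Rows from "order by enddate asc limit 5".
earliest_ends = {
    "washingtonparkshuttle-or-us": date(2017, 10, 31),
    "trimet-portland-or-us": date(2017, 12, 2),
    "cccxpress-or-us": date(2017, 12, 6),
    "valleyretriever-or-us": date(2017, 12, 25),
    "salem-or-us": date(2017, 12, 30),
}

# Rows from "order by startdate desc limit 5".
latest_starts = {
    "klamathshuttle-or-us": date(2018, 7, 1),
    "cccxpress-or-us": date(2017, 9, 25),
    "ctran-wa-us": date(2017, 9, 10),
    "trimet-portland-or-us": date(2017, 9, 3),
    "salem-or-us": date(2017, 9, 3),
}

SEASONAL = frozenset({"klamathshuttle-or-us"})


def window(starts, ends, exclude=frozenset()):
    """Return (start, end) of common validity, skipping excluded feeds, or None."""
    start = max(d for name, d in starts.items() if name not in exclude)
    end = min(d for name, d in ends.items() if name not in exclude)
    return (start, end) if start <= end else None


all_feeds = window(latest_starts, earliest_ends)
year_round = window(latest_starts, earliest_ends, exclude=SEASONAL)
days = (year_round[1] - year_round[0]).days + 1  # inclusive day count

print("all feeds:        ", all_feeds)
print("without seasonal: ", year_round, f"({days} days)")
```

With the Klamath shuttle included there is no common window at all; with it excluded the window is September 25th through October 31st, 37 days counting both endpoints.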
ODOT-RPTD-mb commented 6 years ago

Potentially this works for a short term fix. I would expect Columbia Gorge Express to show up (or not show up) in the same way that the Klamath Shuttle does in that it is also a seasonal summer service that is not currently operating?

For the longer term we probably want to:


ed-g commented 6 years ago

Sorry just a quick response for now

PPaulsonOregonDOT commented 6 years ago

@ed-g Generally, we know which feeds are seasonal (at this point, I think it's just Klamath Shuttle and Columbia Gorge Express).

PPaulsonOregonDOT commented 6 years ago

One observation that I just made related to this is that the floating box with all of the Oregon Transportation Agencies (OTA) currently shows no agencies with expired feeds, which would suggest that at least today should be considered a valid day, and I believe this entire week should qualify. However, in looking at the data graph, there are no days when all feeds are valid. This might be related to issue #32, but it's unclear where in the process the date information is getting corrupted/broken.

ed-g commented 6 years ago

@antrim I'll let you make a time estimate here

ODOT-RPTD-mb commented 6 years ago

@PPaulsonOregonDOT Phil comment ported from closed issue: This points to a bigger conversation that we need to have about what the tool looks at in terms of "up to date" data. RVTD's feed doesn't expire until 2025, but their calendar.txt end date is 9/1 on the old feed and the same file has a start date of 9/25 in the newest feed, which is the same date as it was refreshed on oregon-gtfs.com. The feed validator didn't throw an issue with this feed, which would suggest it's not out of spec, but it causes some problematic behavior in the tool.
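One way to surface this kind of problem is to measure the gap between the last service day in the old feed and the first service day in the new one. The sketch below uses the 9/1 and 9/25 dates from the comment above; the year 2017 and the function name are assumptions for illustration.

```python
from datetime import date


def service_gap_days(old_calendar_end, new_calendar_start):
    """Count the days strictly between two feed versions' service periods."""
    return max(0, (new_calendar_start - old_calendar_end).days - 1)


# Assumed year: the comment gives only month and day.
old_end = date(2017, 9, 1)     # calendar.txt end date in the old feed
new_start = date(2017, 9, 25)  # calendar.txt start date in the newest feed

gap = service_gap_days(old_end, new_start)
print("days with no service in either feed version:", gap)
```

A nonzero gap means a database built from either version alone shows no service on those days, even though the feed_info dates say the feed is valid.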

ODOT-RPTD-mb commented 6 years ago

@antrim @PPaulsonOregonDOT @ed-g - word file with some additional notes on GTFS feed set updates and archiving. TNAST GTFS DATABASE UPDATE NOTES.docx