cal-itp / data-infra

Cal-ITP data infrastructure
https://docs.calitp.org/data-infra
GNU Affero General Public License v3.0

Pipeline handling of feeds that are deleted & replaced with a gap before next feed begins #1300

Closed lauriemerrell closed 2 years ago

lauriemerrell commented 2 years ago

I am creating this issue to record something that we are aware of but have not yet decided how to handle. For now we are in a "monitoring" stance, tracking how prevalent the issue is.

We occasionally encounter a case where an agency handles a feed transition as shown below. The dates are in chronological order: A, B, C, D, E (= D + 1 day), F.

flowchart LR
subgraph f2[Feed 2, uploaded date C]
cal2[calendar.txt covers date E = D+1 to date F]
fi2[feed_info.txt has feed_start_date E = D+1 and feed_end_date F]
end

subgraph f1[Feed 1, uploaded date A]
cal1[calendar.txt covers date B to date D]
fi1[feed_info.txt has feed_start_date B and feed_end_date D]
end

So feed 1 is deleted on the date that feed 2 is uploaded (date C), even though, from the agency's perspective, feed 1 is not supposed to expire until date D.

GTFS Best Practices say:

At any time, the published GTFS dataset should be valid for at least the next 7 days, and ideally for as long as the operator is confident that the schedule will continue to be operated.

@e-lo has submitted https://github.com/MobilityData/GTFS_Schedule_Best-Practices/issues/48 to clarify the best practices and expectations around this case in general.

However, we are still left with a question of how to handle this scenario in our pipeline. At present, we mark feed 1 as deleted on date C (as soon as feed 2 is uploaded), and the agency will show as having no service between dates C and D, until feed 2 takes effect on date E.
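To make the size of that gap concrete, here is a minimal sketch of the date arithmetic. It is not pipeline code: the function name `reported_service_gap` and its parameters are hypothetical, and it assumes the gap runs from the upload date C (inclusive) through the day before feed 2's `feed_start_date`.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def reported_service_gap(
    feed2_uploaded: date, feed2_start: date
) -> Optional[Tuple[date, date]]:
    """Return the (first, last) dates with no reported service, or None.

    Assumption: feed 1 is marked deleted on the day feed 2 is uploaded
    (date C), and feed 2 contributes no service before its start date
    (date E). Both names are illustrative, not pipeline identifiers.
    """
    if feed2_start <= feed2_uploaded:
        # Feed 2 is already in effect when uploaded, so there is no gap.
        return None
    return (feed2_uploaded, feed2_start - timedelta(days=1))


# Dates from the SacRT example: uploaded 3/3/22, in effect 4/3/22.
gap = reported_service_gap(date(2022, 3, 3), date(2022, 4, 3))
gap_days = (gap[1] - gap[0]).days + 1  # inclusive count of days
```

With the SacRT dates, the gap under these assumptions runs from 2022-03-03 through 2022-04-02.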

I believe that this handling is defensible, but it can lead to our reports and tables displaying 0 service for an agency during a period where the agency believes that feed 1's coverage should have persisted (based, perhaps, on feed_end_date in feed_info). We have been told that app consumers keep using the old feed until the new one takes effect.

No validation is currently produced when these situations occur, at least in the case of 273.0 (SacRT) for the month of March (the feed uploaded 3/3/22 didn't take effect until 4/3/22). There is a validation for cases where the current feed expires in less than 7 or 30 days, but there is no validation for a feed that has not yet taken effect.
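As a rough illustration of what a "not yet in effect" check could look like, here is a minimal sketch. It is not an existing validator rule: the function name `future_start_notice` and the message text are hypothetical, and it assumes the feed's `feed_start_date` has already been parsed into a `date`.

```python
from datetime import date
from typing import Optional


def future_start_notice(feed_start_date: date, today: date) -> Optional[str]:
    """Return a notice if the feed has not yet taken effect, else None.

    Hypothetical helper: this is neither the validator's API nor its
    message format, only a sketch of the missing check.
    """
    if feed_start_date > today:
        days_until_start = (feed_start_date - today).days
        return (
            f"feed does not take effect for {days_until_start} day(s) "
            f"(feed_start_date={feed_start_date.isoformat()})"
        )
    return None


# SacRT example: checked on the upload date, 3/3/22, with a 4/3/22 start.
notice = future_start_notice(date(2022, 4, 3), date(2022, 3, 3))
```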

A few considerations:

cc @edasmalchi @o-ram @Nkdiaz for awareness

lauriemerrell commented 2 years ago

I'm going to close this ticket. Per a conversation just now with @e-lo and @o-ram, this situation is explicitly covered in the Cal-ITP FAQ. Our recommendation is to publish "future" service in parallel (at a separate link) to the current active feed.

Conversation about the best practice is occurring at https://github.com/MobilityData/GTFS_Schedule_Best-Practices/issues/48