Open dancesWithCycles opened 2 years ago
Another possible metric would be the ratio of trips running vs. those "cancelled" by calendar_dates.txt.
👍 If calculated per service day, this would also help catch the common mistake where an agency cancels service for a holiday but forgets or incorrectly configures the replacement service, resulting in no scheduled service (as far as the GTFS dataset shows) on the holiday.
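A minimal sketch of such a per-service-day count, assuming the three files have already been read into lists of dicts (for example with `csv.DictReader`, so all values are strings). The helper names `parse_gtfs_date` and `daily_trip_counts` are made up for illustration and are not part of any validator:

```python
from collections import Counter
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def parse_gtfs_date(value):
    """Parse a GTFS YYYYMMDD date string."""
    return date(int(value[:4]), int(value[4:6]), int(value[6:8]))

def daily_trip_counts(calendar, calendar_dates, trips):
    """Return {day: (running, cancelled)} trip counts per service day."""
    trips_per_service = Counter(t["service_id"] for t in trips)
    removed, added = set(), set()
    for r in calendar_dates:
        key = (r["service_id"], parse_gtfs_date(r["date"]))
        # exception_type 1 adds service on that date, 2 removes it
        (added if r["exception_type"] == "1" else removed).add(key)

    counts = {}
    regular = set()  # (service_id, day) pairs scheduled by calendar.txt
    for row in calendar:
        sid = row["service_id"]
        day, end = parse_gtfs_date(row["start_date"]), parse_gtfs_date(row["end_date"])
        while day <= end:
            if row[WEEKDAYS[day.weekday()]] == "1":
                regular.add((sid, day))
                running, cancelled = counts.get(day, (0, 0))
                if (sid, day) in removed:
                    cancelled += trips_per_service[sid]
                else:
                    running += trips_per_service[sid]
                counts[day] = (running, cancelled)
            day += timedelta(days=1)

    # Dates added via calendar_dates.txt that calendar.txt does not already cover
    for sid, day in added - regular:
        running, cancelled = counts.get(day, (0, 0))
        counts[day] = (running + trips_per_service[sid], cancelled)
    return counts

# Made-up feed: a weekday service whose Monday holiday is removed without replacement
calendar = [{**{d: "1" for d in WEEKDAYS[:5]}, "saturday": "0", "sunday": "0",
             "service_id": "WD", "start_date": "20240101", "end_date": "20240107"}]
calendar_dates = [{"service_id": "WD", "date": "20240101", "exception_type": "2"}]
trips = [{"trip_id": "t1", "service_id": "WD"}, {"trip_id": "t2", "service_id": "WD"}]

counts = daily_trip_counts(calendar, calendar_dates, trips)
no_service = [d for d, (run, canc) in sorted(counts.items()) if run == 0 and canc > 0]
print(no_service)  # [datetime.date(2024, 1, 1)]
```

Days where everything is cancelled and nothing runs are the "holiday without replacement service" case; the ratio `running / (running + cancelled)` is available for every other day.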
Personally, I have in mind a calculation per service day, following trip_id -> service_id -> the service interval from start_date to end_date of calendar.txt. I would like to know not just a single day but the number of days on which the ratio of trips with normal service vs. removed service indicates a mistake or low quality in the GTFS data.
Any thoughts on whether this validator is suited to be extended with such a rule?
Cheers!
I don't quite understand what exactly you're describing here. The amount/ratio of running (as in non-"cancelled") trips over the whole (start_date, end_date) period of the service, for each service/trip combination?
@derhuerst You are right, the more time I spend on this matter, the more use cases come up.
- cancelled days on which trips are not offered during the service period. This might be an interesting investigation.
- cancelled days that cause trips to effectively run out of service before the overall end of the service period stated in calendar.txt.
The transit authority I have in mind might be more interested in the latter case. When you know the service period of the current GTFS data ends at the end of next month, you might think you have plenty of time to get hold of a new GTFS data feed.
However, when you learn that the current GTFS data does not provide schedule information for certain trips starting tomorrow (due to exceptions in calendar_dates), you might panic about how to get the missing data today already, as the feed does not correspond with the official and public schedule. This situation arises from a poorly created GTFS data feed.
Does this example make the use case clearer?
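To make the second case concrete, here is a rough sketch that walks backwards from end_date to find the last day a service actually runs. It assumes one calendar.txt row as a dict of strings and a set of removed dates (exception_type 2) for that service_id; `effective_end_date` is a hypothetical helper name, and dates added via exception_type 1 are deliberately ignored:

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def parse_gtfs_date(value):
    """Parse a GTFS YYYYMMDD date string."""
    return date(int(value[:4]), int(value[4:6]), int(value[6:8]))

def effective_end_date(calendar_row, removed_dates):
    """Last day the service really runs, or None if every day is removed.

    Only looks at calendar.txt weekdays minus removed dates; added dates
    (exception_type 1) are not considered in this sketch.
    """
    start = parse_gtfs_date(calendar_row["start_date"])
    day = parse_gtfs_date(calendar_row["end_date"])
    while day >= start:
        if calendar_row[WEEKDAYS[day.weekday()]] == "1" and day not in removed_dates:
            return day
        day -= timedelta(days=1)
    return None

# Made-up example: a daily service for January whose last week is removed
row = {**{d: "1" for d in WEEKDAYS},
       "service_id": "DAILY", "start_date": "20240101", "end_date": "20240131"}
removed = {date(2024, 1, d) for d in range(25, 32)}

last_day = effective_end_date(row, removed)
gap_days = (parse_gtfs_date(row["end_date"]) - last_day).days
print(last_day, gap_days)  # 2024-01-24 7
```

A large gap between the declared end_date and the effective end would be the signal that a new feed is needed earlier than calendar.txt suggests.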
Labeling as a "community rule" because we don't have an explicit mention in the spec or best practices. This validator contains a few rules that aren't clearly mentioned in the spec or best practice, because the community sees them as highly valuable (fast travel, for example). We are in favor of modifying the specification first before adding this type of check in the validator, in order to keep both aligned.
Hi folks, Thank you so much for providing and maintaining this repository. Chapeau!
A transport authority is looking for a validator, or even better the extension of an existing validator, that is best suited to add one of their use cases. What validator is funded best to suit their purpose? This use case is explained here now but is also related to issue 1117.
What problem in GTFS datasets does this new rule address? Please describe.
A passenger information system (PIS) is based on GTFS for static transit data. On a regular basis, the PIS does not provide most of the trips close to end_date of calendar.txt. The reason is a great number of exceptions from calendar_dates.txt close to end_date of calendar.txt. As a consequence, the authority is asked to create another GTFS file based not on end_date of calendar.txt but on when the number of trips per day falls under a certain threshold.
For the mentioned transport authority, a GTFS file with a certain number of days on which the number of trips is lower than the threshold indicates low data quality. That observation would trigger the creation of another GTFS file.
Describe the new validation rule
A GTFS file is invalid when one day, or a configurable number of days, between start_date and end_date of calendar.txt has a number of trips from trips.txt smaller than a configurable threshold.
Error vs warning
I am neither an expert on the GTFS spec nor on the best practices. Whether the result of this rule is an error, info, or warning might depend on perspective and is a topic for discussion. It would probably be an error from the perspective of the mentioned transit authority.
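The rule as described could be sketched roughly like this. It assumes the number of running trips per day has already been computed for every date between start_date and end_date (including days with zero trips); the names `low_service_days`, `is_invalid`, `min_trips` and `min_low_days` are placeholders for illustration, not existing validator options:

```python
from datetime import date

def low_service_days(trips_per_day, min_trips):
    """Days whose number of running trips is below the threshold."""
    return sorted(d for d, n in trips_per_day.items() if n < min_trips)

def is_invalid(trips_per_day, min_trips, min_low_days=1):
    """True when at least min_low_days days fall below min_trips."""
    return len(low_service_days(trips_per_day, min_trips)) >= min_low_days

# Made-up daily counts near the end of a feed's service period
trips_per_day = {
    date(2024, 1, 29): 120,
    date(2024, 1, 30): 8,
    date(2024, 1, 31): 0,
}

print(low_service_days(trips_per_day, min_trips=10))
# [datetime.date(2024, 1, 30), datetime.date(2024, 1, 31)]
print(is_invalid(trips_per_day, min_trips=10, min_low_days=2))  # True
print(is_invalid(trips_per_day, min_trips=10, min_low_days=3))  # False
```

Keeping both values configurable leaves the error vs. warning decision to whoever runs the check.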
I appreciate any hint in the right direction!
Cheers!