MobilityData / gtfs-validator

Canonical GTFS Validator project for schedule (static) files.
https://gtfs-validator.mobilitydata.org/
Apache License 2.0

Timepoint clarification #1795

Closed mobilitydataio closed 3 weeks ago

mobilitydataio commented 3 months ago

Requirements

Datasets that have all departure/arrival_time populated and that use a mix of specified and empty timepoint values (sample 1 https://github.com/google/transit/pull/474#issue-2339968072):

  empty timepoint values used to inform that times are exact (example) -> should be replaced with values 1

  empty timepoint values used to inform that times are approximate (example) -> should be replaced with values 0

Datasets with omitted times and timepoint values 1 for records with times (sample 2 https://github.com/google/transit/pull/474#issue-2339968072, example): they won't trigger a WARNING anymore.

Flex datasets that don't use departure/arrival_time and don't have timepoint defined: they won't trigger a WARNING anymore.

If this PR gets merged, we will modify the canonical validator, as its logic is currently to give a WARNING in all cases of timepoint="".
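To make the difference concrete, here is a minimal Python sketch contrasting the two rules for a single stop_times record. It is not the validator's actual code (the validator is written in Java); the function names `warns_current` and `warns_proposed` are hypothetical, and treating a record as "having times" when either arrival_time or departure_time is non-empty is an assumption.

```python
def warns_current(timepoint: str) -> bool:
    """Current behavior as described above: warn whenever timepoint is empty."""
    return timepoint.strip() == ""


def warns_proposed(arrival_time: str, departure_time: str, timepoint: str) -> bool:
    """Proposed behavior: warn only if the record has a time but no timepoint.

    Assumption: a record "has times" if either time field is non-empty.
    """
    has_time = bool(arrival_time.strip() or departure_time.strip())
    return has_time and timepoint.strip() == ""


# (arrival_time, departure_time, timepoint) for three illustrative records
records = [
    ("8:30:00", "8:30:00", ""),   # times present, timepoint empty
    ("", "", ""),                 # no times, no timepoint (e.g. a Flex record)
    ("8:45:00", "8:45:00", "1"),  # times present, timepoint set
]

for arrival, departure, timepoint in records:
    print(warns_current(timepoint), warns_proposed(arrival, departure, timepoint))
```

Only the second record changes outcome: it warns under the current rule and stays silent under the proposed one.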

Commit Nº 1

https://github.com/google/transit/commit/b2ee3c857e2ac0216e5ef4c8a6447901c4e16b71

isabelle-dr commented 2 months ago

Here is the effect of this spec change on the validator.

Current validator behavior

New validator behavior

With this new behavior, we don't need to distinguish between datasets that have the column header and those that don't. We could remove missing_recommended_column altogether because I don't believe it's used for anything else, and modify missing_timepoint_value. We could also use the more generic missing_recommended_field, if we are able to generate it with a condition.

Expected effect on production data

The number of datasets that trigger the new notice should be slightly smaller than the combined number of datasets that trigger missing_recommended_column and missing_timepoint_value today (note that no dataset can trigger both; it's one or the other). This is because datasets with the following modeling will not trigger warnings anymore:

stop_sequence  arrival_time  departure_time  timepoint
1              8:30:00       8:30:00         1
2              8:31:01       8:31:01         1
3
4
5
6
7
8
9
10             8:45:00       8:45:00         1
11             8:55:00       8:55:00         1
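As a rough check of this expectation, the sample can be written out as a stop_times.txt fragment and run through a sketch of the new rule. This is an illustration only: `rows_missing_timepoint` is a hypothetical helper, not a validator API, and it assumes a record counts as having times when either time field is non-empty. Because the lookup falls back to an empty string, a missing timepoint column and an empty timepoint value are handled the same way.

```python
import csv
import io

# The sample above, written as CSV. Records 3 to 9 have no times and no timepoint.
SAMPLE = """stop_sequence,arrival_time,departure_time,timepoint
1,8:30:00,8:30:00,1
2,8:31:01,8:31:01,1
3,,,
4,,,
5,,,
6,,,
7,,,
8,,,
9,,,
10,8:45:00,8:45:00,1
11,8:55:00,8:55:00,1
"""

# Illustrative counter-example: times are present but there is no timepoint column.
NO_TIMEPOINT_COLUMN = """stop_sequence,arrival_time,departure_time
1,8:30:00,8:30:00
2,,
3,8:45:00,8:45:00
"""


def rows_missing_timepoint(stop_times_csv: str) -> list[str]:
    """Return stop_sequence values of records that have a time but no timepoint."""
    flagged = []
    for row in csv.DictReader(io.StringIO(stop_times_csv)):
        has_time = bool(
            (row.get("arrival_time") or "").strip()
            or (row.get("departure_time") or "").strip()
        )
        # A missing column yields None, an empty cell yields "": both count as empty.
        timepoint = (row.get("timepoint") or "").strip()
        if has_time and not timepoint:
            flagged.append(row["stop_sequence"])
    return flagged


print(rows_missing_timepoint(SAMPLE))               # []
print(rows_missing_timepoint(NO_TIMEPOINT_COLUMN))  # ['1', '3']
```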

Additional note

Note that the logic that triggers the error stop_time_timepoint_without_times is unchanged: it's triggered for datasets that have no times defined for records with timepoint == 1.
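For contrast with the warning discussed above, the unchanged error condition could be sketched as follows. The helper name `missing_times_for_timepoint` is hypothetical, and reporting arrival_time and departure_time separately is an assumption about how the rule is applied, not something stated in this thread.

```python
def missing_times_for_timepoint(
    arrival_time: str, departure_time: str, timepoint: str
) -> list[str]:
    """List the time fields left empty on a record with timepoint == 1.

    An empty list means the record would not trigger the error in this sketch.
    """
    if timepoint.strip() != "1":
        return []
    fields = {"arrival_time": arrival_time, "departure_time": departure_time}
    return [name for name, value in fields.items() if not value.strip()]


print(missing_times_for_timepoint("", "", "1"))                # both fields missing
print(missing_times_for_timepoint("8:30:00", "8:30:00", "1"))  # nothing missing
print(missing_times_for_timepoint("", "", ""))                 # timepoint not 1: no error
```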

cc @tzujenchanmbd

emmambd commented 2 months ago

Tasks: