Closed mcplanner-zz closed 2 years ago
Thanks @mcplanner for opening.
The StopTimeTimepointWithoutTimesNotice error is displayed when the stop_times.timepoint field is blank. This is unexpected. I would expect this to be interpreted as having no timepoints.
The GTFS specification says that if stop_times.timepoint = 1 or empty
then the times are considered exact. From my understanding a stop time with no value for timepoint
and no value for stop_times.arrival_time
or stop_times.departure_time
should issue a StopTimeTimepointWithoutTimesNotice
.
If the timepoint field isn't being used, I expect an error about it's being missing.
I am not sure if a notice should be generated here, the specification handles the case where no value is attributed to stop_times.timepoint
: default value is 1.
Please feel free to close this issue, if that answers your interrogations.
Here's example data from the above GTFS file:
trip_id | arrival_time | departure_time | stop_id | stop_sequence | stop_headsign | pickup_type | drop_off_type | shape_dist_traveled |
---|---|---|---|---|---|---|---|---|
784678 | 07:25:00 | 07:25:00 | 22256 | 1 | ||||
784678 | 22239 | 2 | ||||||
784678 | 07:28:00 | 07:28:00 | 22361 | 3 | ||||
784678 | 22225 | 4 |
tl;dr - This data is fine and the validator shouldn't be logging any errors. The reason is that this data conforms to the original GTFS spec prior to the timepoint
field existing (timepoint
is optional). The original GTFS spec said that you should only provide times for records that were timepoints. So in the above data stop_sequence 1 and 3 are timepoints, and 2 and 4 are not.
IMHO the timepoint definition really needs to be clarified in the spec as it currently doesn't communicate this concept clearly - see https://github.com/google/transit/issues/61.
@MobilityData/transit-specs
If the timepoint field isn't being used, I expect an error about it's being missing.
The GTFS Best Practices (http://gtfs.org/best-practices/#stop_timestxt) say:
The timepoint field should be provided. It specifies which stop_times the operator will attempt to strictly adhere to (timepoint=1), and that other stop times are estimates (timepoint=0).
Sticking to the canonical validator definition, timepoint field missing couldn't be an error, but could certainly be a warning.
Thanks for the clarification @barbeau! Following your suggestion, I agree that the spec is not clear on this particular point. I'll keep this issue open so that we can fix this bug.
Does the following seems reasonable?
stop_times.txt
comes from a legacy dataset: does the header contain column timepoint
or not? This means that we'll have to modify the core
module to include the boolean field hasTimepointColumn
(or similar) boolean to GtfsStopTimeTableContainer
. 1
from GtfsStopTimeSchema
for field timepoint
:
https://github.com/MobilityData/gtfs-validator/blob/464ff36316162e71d7f154e64e00d87b21f91dd4/main/src/main/java/org/mobilitydata/gtfsvalidator/table/GtfsStopTimeSchema.java#L77stop_times.txt
has value timepoint=1
and no value for stop_times.arrival_time
and/or stop_times.departure_time
.Here are the cases to cover.
For all stop_times.txt files:
If timepoint column does not exist:
If timepoint column exists:
There is also the case where a stop_times.txt file does not have the timepoint column, but all records have arrival and departure times populated. Technically this was against the original GTFS spec, but it became a common practice, and therefore we shouldn't consider it an error because it will invalidate a lot of old datasets. In this case we'll emit a warning anyway that timepoint should be added.
Does that make sense?
Here are the cases to cover.
For all stop_times.txt files:
- First and last record for each trip must have arrival and departure time, else ERROR
✅ Covered in MissingTripEdgeValidator
.
- If arrival time or departure time is populated for a record, then both arrival time AND departure time must be populated for that record, else ERROR
✅ Covered in StopTimeArrivalAndDepartureTimeValidator
.
If timepoint column does not exist:
- Emit WARNING that timepoint column should be added
If timepoint column exists:
- All records must have 0 or 1 value in timepoint column, else ERROR
- If timepoint==1, arrival and departure time must both be populated, else ERROR
There is also the case where a stop_times.txt file does not have the timepoint column, but all records have arrival and departure times populated. Technically this was against the original GTFS spec, but it became a common practice, and therefore we shouldn't consider it an error because it will invalidate a lot of old datasets. In this case we'll emit a warning anyway that timepoint should be added.
⏳ TimepointTimeValidator
should be modified to this extent.
Does that make sense?
Yes, @isabelle-dr should the specification be clarified first or could these changes be implemented before?
Hey all, thanks for flagging and looking into this.
The case of legacy data with no timepoint column (prior to the timepoint field existing) was flagged when we initially introduced that rule.
I agree that the root problem is that the specification contains two contradictory statements about the meaning of an "empty" value in stop_times.timepoint
stop_times.timepoint
:1 or empty - Times are considered exact.
stop_times.departure_time
and stop_times.arrival_time
:Conditionally Required: Required for timepoint=1. Optional otherwise.
The former could mean thatstop_times.timepoint
== 1 is equivalent than stop_times.timepoint
== "".
The latter could mean that the times are required when stop_times.timepoint
== 1 and are not required when stop_times.timepoint
== "".
If we consider that an empty timepoint column is equivalent to empty values in all stops_times.timepoint
fields, all legacy datasets become invalid, which is how it was coded in the validator and what @mcplanner-zz is flagging here. She also points out how counter-intuitive it is that having no values for timepoint is considered a timepoint.
This is extremely well described in google/transit/issues/61. The "empty" word in the first statement seems to actually refer to an empty timepoint column which maintains the backwards compatibility.
How should the validator work?
If timepoint column does not exist:
Emit WARNING that timepoint column should be added
Absolutely, and this will be a new notice based on the Best Practices, that we should add at some point.
If timepoint column exists:
All records must have 0 or 1 value in timepoint column, else ERROR If timepoint==1, arrival and departure time must both be populated, else ERROR
I would only implement the second statement proposed here (at least for now) because the first is stricter than the specification: it would make the following data invalid, and there is nothing in spec currently that would justify that. stop_sequence | arrival_time | departure_time | timepoint |
---|---|---|---|
1 | 00:00:00 | 00:00:00 | 1 |
2 | |||
3 | |||
4 | 00:10:00 | 00:10:00 | 1 |
@lionel-nj I'd say we're good to make the changes in the validator to maintain the backwards compatibility (second statement proposed by Sean, and your initial idea), but not upgrading this rule to error, this will be done in the PR we already have.
Here are some examples of what would change, @barbeau does this seem right?
Current behaviour: The notice is emitted for stops 2 & 3 Expected behaviour: No notice stop_sequence | arrival_time | departure_time |
---|---|---|
1 | 00:00:00 | 00:00:00 |
2 | ||
3 | ||
4 | 00:10:00 | 00:10:00 |
Current behaviour: No notice Expected behaviour: No notice stop_sequence | arrival_time | departure_time | timepoint |
---|---|---|---|
1 | 00:00:00 | 00:00:00 | 1 |
2 | 00:02:00 | 00:02:00 | 0 |
3 | 00:08:00 | 00:08:00 | 0 |
4 | 00:10:00 | 00:10:00 | 1 |
Current behaviour: The StopTimeTimepointWithoutTimesNotice is emitted for stop 2 Expected behaviour: The StopTimeTimepointWithoutTimesNotice is emitted for stop 2 stop_sequence | arrival_time | departure_time | timepoint |
---|---|---|---|
1 | 00:00:00 | 00:00:00 | 1 |
2 | 1 | ||
3 | 00:08:00 | 00:08:00 | 0 |
4 | 00:10:00 | 00:10:00 | 1 |
Current behaviour: No notice Expected behaviour: No notice stop_sequence | arrival_time | departure_time | timepoint |
---|---|---|---|
1 | 00:00:00 | 00:00:00 | 1 |
2 | 00:02:00 | 00:02:00 | |
3 | 00:08:00 | 00:08:00 | |
4 | 00:10:00 | 00:10:00 | 1 |
Current behaviour: The StopTimeTimepointWithoutTimesNotice is emitted for stops 2 and 3 Expected behaviour: No notice stop_sequence | arrival_time | departure_time | timepoint |
---|---|---|---|
1 | 00:00:00 | 00:00:00 | 1 |
2 | |||
3 | |||
4 | 00:10:00 | 00:10:00 | 1 |
@isabelle-dr Cases 1, 2, and 3 LGTM 👍
I would only implement the second statement proposed here (at least for now) because the first is stricter than the specification: it would make the following data invalid, and there is nothing in spec currently that would justify that.
So Cases 4 and 5 fall into this category. I agree that the current language in the spec is fuzzy around this, although the original intent when introducing the timepoint
column was to specify all or no values - obviously in hindsight it would have been better if it said that explicitly.
Do we know if there are real datasets in the wild that are partially populating timepoint
like Case 4 and 5? If so, then I agree that the spec probably needs to be clarified before classifying this as an error. If not, IMHO it could go either way, and IMHO it should be an error.
@barbeau here are some interesting edge cases with partial implementations, with some real data samples.
stop_sequence | arrival_time | departure_time | timepoint |
---|---|---|---|
0 | 13:15:00 | 13:15:00 | |
1 | 13:30:00 | 13:30:00 | 0 |
2 | 13:40:00 | 13:40:00 | 0 |
3 | 13:50:00 | 13:50:00 | |
4 | 14:00:00 | 14:00:00 | 0 |
They seem to have assumed that since 1 or empty - Times are considered exact. (stop_times.timepoint
description), the empty values mean the times are exact. This data doesn't contain any 1's in the timepoint column.
stop_sequence | arrival_time | departure_time | timepoint |
---|---|---|---|
1 | 8:30:00 | 8:30:00 | 1 |
2 | 8:31:01 | 8:31:01 | 1 |
3 | |||
4 | |||
5 | |||
6 | |||
7 | |||
8 | |||
9 | |||
10 | 8:45:00 | 8:45:00 | 1 |
11 | 8:55:00 | 8:55:00 | 1 |
Here, they seem to have assumed that since times are Conditionally Required: Required for timepoint=1. Optional otherwise.(stop_times.departure_time
and stop_times.arrival_time
description), the empty values mean the times aren't exact. This data doesn't contain any 0's in the timepoint column.
trip_id | stop_sequence | arrival_time | departure_time | timepoint |
---|---|---|---|---|
1167963915753602:T1218:39:00 | 1 | 18:39:00 | 18:39:00 | 1 |
1167963915753602:T1218:39:00 | 2 | 18:42:27 | 18:42:27 | 0 |
1167963915753602:T1218:39:00 | 3 | 18:43:21 | 18:43:21 | 0 |
1167963915753602:T1218:39:00 | 4 | 18:44:36 | 18:44:36 | 0 |
1167963915753602:T1218:39:00 | 5 | 18:45:38 | 18:45:38 | 0 |
1167963915753602:T1218:39:00 | 6 | 18:47:34 | 18:47:34 | 0 |
1167963915753602:T1218:39:00 | 7 | 18:48:00 | 18:48:00 | 1 |
1167963915753602:T1218:39:00 | 8 | 18:50:37 | 18:50:37 | 0 |
1167963915753602:T1218:39:00 | 9 | 18:52:00 | 18:52:00 | 1 |
kcc_toCleElum_weekday_T01 | 1 | 5:30:00 | 5:30:00 | |
kcc_toCleElum_weekday_T01 | 2 | |||
kcc_toCleElum_weekday_T01 | 3 | |||
kcc_toCleElum_weekday_T01 | 4 | |||
kcc_toCleElum_weekday_T01 | 5 | |||
kcc_toCleElum_weekday_T01 | 6 | |||
kcc_toCleElum_weekday_T01 | 7 | 6:10:00 | 6:15:00 | |
kcc_toCleElum_weekday_T01 | 8 | |||
kcc_toCleElum_weekday_T01 | 9 | 6:20:00 | 6:20:00 |
This part of the data is at the end of the dataset. The timepoint values have been added for most of the lines, but after this point, all timepoint values are blank. Does this mean all times are exact? Or approximate? I believe neither, the values just haven't been added.
There are also datasets that added the timepoint column but all values are blank, and it seems obvious that it doesn't mean all values are timepoints, see the Via Rail Canada dataset below:
stop_sequence | arrival_time | departure_time | timepoint |
---|---|---|---|
1 | 3:30:00 | 3:30:00 | |
2 | |||
3 | 4:07:00 | 4:07:00 | |
4 | |||
5 | |||
6 | |||
7 | 5:01:00 | 5:01:00 | |
8 | 5:12:00 | 5:12:00 | |
9 | 5:20:00 | 5:20:00 | |
10 |
😱
Yikes - so there is real-world data all over the place. It would be nice to get this clarified in the spec ASAP so we can implement the original intent in the validator and get these datasets cleaned up. There seemed to be clear agreement on how the data was intended to be represented in the original proposal IIRC, the wording just apparently wasn't clear enough.
As agreed with @isabelle-dr and @barbeau, we will implement the following solutions:
timepoint = 1
AND times are missing.This rule will consider legacy datasets as valid, and the validator will behave as follows:
stop_sequence | arrival_time | departure_time |
---|---|---|
1 | 00:00:00 | 00:00:00 |
2 | ||
3 | ||
4 | 00:10:00 | 00:10:00 |
No notice
stop_sequence | arrival_time | departure_time | timepoint |
---|---|---|---|
1 | 00:00:00 | 00:00:00 | 1 |
2 | 00:02:00 | 00:02:00 | 0 |
3 | 00:08:00 | 00:08:00 | 0 |
4 | 00:10:00 | 00:10:00 | 1 |
No notice
stop_sequence | arrival_time | departure_time | timepoint |
---|---|---|---|
1 | 00:00:00 | 00:00:00 | 1 |
2 | 1 | ||
3 | 00:08:00 | 00:08:00 | 0 |
4 | 00:10:00 | 00:10:00 | 1 |
A notice will be generated for the 2nd row.
stop_sequence | arrival_time | departure_time | timepoint |
---|---|---|---|
1 | 00:00:00 | 00:00:00 | 1 |
2 | 00:02:00 | 00:02:00 | |
3 | 00:08:00 | 00:08:00 | |
4 | 00:10:00 | 00:10:00 | 1 |
No notice
stop_sequence | arrival_time | departure_time | timepoint |
---|---|---|---|
1 | 00:00:00 | 00:00:00 | 1 |
2 | |||
3 | |||
4 | 00:10:00 | 00:10:00 | 1 |
No notice
⚠️ Note that the notice will be considered as a WARNING
, the severity will be changed in a separate PR to better assess the potential breaking changes.
WARNING
) when stop_times.timepoint
value is missing for a single record.WARNING
) when column stop_times.timepoint
is missing (based on best practices)@mcplanner-zz could you provide an URL to download the dataset you mentionned in the issue description please? I tried download the dataset linked to the URL you provided but could not: the operation timed out with error code ERR_SOCKET_NOT_CONNECTED
.
Bug report
Describe the bug The StopTimeTimepointWithoutTimesNotice error is displayed when the stop_times.timepoint field is blank. This is unexpected. I would expect this to be interpreted as having no timepoints.
How we reproduce the bug Steps to reproduce the behaviour: This happened with this field: http://iportal.sacrt.com/GTFS/Unitrans/google_transit.zip
Expected behaviour If the timepoint field isn't being used, I expect an error about it's being missing.
Observed behaviour I was alerted that every timepoint was lacking an arrival and departure time.
Screenshots:
Environment versions Java 11, v2.0.0 validator, https://github.com/cal-itp/gtfs-validator