Closed mobilitydataio closed 1 month ago
Here is the effect of this spec change on the validator.
Current validator behavior
- `missing_recommended_column` is triggered if `stop_times.txt` doesn't have the `timepoint` column header.
- `missing_timepoint_value` is triggered for records in `stop_times.txt` with `timepoint == ""` (i.e. no value provided), given that the `timepoint` header is provided in the file. This happens regardless of whether times are present in `departure_time` and `arrival_time`.

New validator behavior
- A warning is triggered for records in `stop_times.txt` that have values for at least one of the `departure_time` and `arrival_time` fields (i.e. they have times defined) AND `timepoint == ""` (i.e. no value provided).

With this new behavior, we don't need to distinguish between datasets that have the column header and those that don't.
We could remove `missing_recommended_column` altogether, because I don't believe it's used for anything else, and modify `missing_timepoint_value`. We could also use the more generic `missing_recommended_field`, if we are able to generate it with a condition.
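
To make the difference concrete, here is a minimal Python sketch of the two rules as described above. It is not the validator's actual code: the function names and the dict-per-record representation are assumptions for illustration, and the new rule reuses the `missing_timepoint_value` name only because that is one of the options discussed.

```python
def _is_empty(value) -> bool:
    """True for a missing column (None) or an empty / whitespace-only string."""
    return not (value or "").strip()


def has_times(record: dict) -> bool:
    """True if at least one of arrival_time / departure_time is provided."""
    return not (_is_empty(record.get("arrival_time"))
                and _is_empty(record.get("departure_time")))


def current_notices(header: list, records: list) -> list:
    """Current behavior: the result depends on the timepoint header."""
    if "timepoint" not in header:
        # One column-level notice; individual records are not inspected.
        return ["missing_recommended_column"]
    return ["missing_timepoint_value"
            for r in records if _is_empty(r.get("timepoint"))]


def new_notices(records: list) -> list:
    """New behavior: header presence is irrelevant; only records with times count."""
    return ["missing_timepoint_value"  # notice name is an assumption
            for r in records
            if has_times(r) and _is_empty(r.get("timepoint"))]


# A file without the timepoint column: one record with times, one without.
header = ["stop_sequence", "arrival_time", "departure_time"]
records = [
    {"stop_sequence": "1", "arrival_time": "8:30:00", "departure_time": "8:30:00"},
    {"stop_sequence": "2", "arrival_time": "", "departure_time": ""},
]
print(current_notices(header, records))  # ['missing_recommended_column']
print(new_notices(records))              # ['missing_timepoint_value']
```

In this sketch the new rule works per record, so a file without the `timepoint` column is handled the same way as a file where the column exists but is left empty.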
Expected effect on production data
The number of datasets that trigger the new notice should be slightly smaller than the combined number of datasets that trigger missing_recommended_column and missing_timepoint_value today (note that no dataset can trigger both; it's one or the other). This is because datasets modeled as follows will no longer trigger warnings:
| stop_sequence | arrival_time | departure_time | timepoint |
|---|---|---|---|
| 1 | 8:30:00 | 8:30:00 | 1 |
| 2 | 8:31:01 | 8:31:01 | 1 |
| 3 | | | |
| 4 | | | |
| 5 | | | |
| 6 | | | |
| 7 | | | |
| 8 | | | |
| 9 | | | |
| 10 | 8:45:00 | 8:45:00 | 1 |
| 11 | 8:55:00 | 8:55:00 | 1 |
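
As a worked check, the sample above can be run through both rules. This is only a sketch: the CSV text mirrors the table (other `stop_times.txt` columns are left out for brevity), and the two list comprehensions are a simplified reading of the current and new rules, not the validator's code.

```python
import csv
import io

# The table above, written the way it might appear in stop_times.txt.
SAMPLE = """stop_sequence,arrival_time,departure_time,timepoint
1,8:30:00,8:30:00,1
2,8:31:01,8:31:01,1
3,,,
4,,,
5,,,
6,,,
7,,,
8,,,
9,,,
10,8:45:00,8:45:00,1
11,8:55:00,8:55:00,1
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))


def is_empty(value):
    """True for None or an empty / whitespace-only string."""
    return not (value or "").strip()


# Current rule (timepoint header present): every empty timepoint is flagged.
old_flagged = [r["stop_sequence"] for r in rows if is_empty(r["timepoint"])]

# New rule: flag only records that have a time but no timepoint value.
new_flagged = [
    r["stop_sequence"]
    for r in rows
    if is_empty(r["timepoint"])
    and not (is_empty(r["arrival_time"]) and is_empty(r["departure_time"]))
]

print(old_flagged)  # ['3', '4', '5', '6', '7', '8', '9']
print(new_flagged)  # []
```

Records 3 to 9 have no times, so under the new rule their empty `timepoint` is no longer reported.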
Additional note
Note that the logic that triggers the error stop_time_timepoint_without_times is unchanged: it's triggered for records with timepoint == 1 that have no times defined.
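
For contrast with the warning discussed above, the unchanged error can be sketched as follows. The function name and record representation are hypothetical, and the sketch reads "no times defined" as both `arrival_time` and `departure_time` being empty; the real validator may instead report each missing field separately.

```python
def timepoint_without_times(records):
    """Return the stop_sequence of each record with timepoint == 1 and no times.

    Assumption: "no times" means arrival_time and departure_time are both empty.
    """
    flagged = []
    for r in records:
        is_timepoint = (r.get("timepoint") or "").strip() == "1"
        has_time = any((r.get(key) or "").strip()
                       for key in ("arrival_time", "departure_time"))
        if is_timepoint and not has_time:
            flagged.append(r.get("stop_sequence"))
    return flagged


records = [
    {"stop_sequence": "1", "arrival_time": "8:30:00", "departure_time": "8:30:00", "timepoint": "1"},
    {"stop_sequence": "2", "arrival_time": "", "departure_time": "", "timepoint": "1"},
    {"stop_sequence": "3", "arrival_time": "", "departure_time": "", "timepoint": ""},
]
print(timepoint_without_times(records))  # ['2']
```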
cc @tzujenchanmbd
Tasks:
- Remove `missing_recommended_column` notice (need to validate no other fields/files are affected)
- `missing_timepoint_value` functionality change to remove conditional on timepoint header presence AND add check if departure_time and arrival_time values exist
Requirements
- Datasets that have all departure/arrival_time populated and that use a mix of specified and empty timepoint values (sample 1 https://github.com/google/transit/pull/474#issue-2339968072).
  - empty timepoint values used to inform that times are exact (example) -> should be replaced with values 1
  - empty timepoint values used to inform that times are approximate (example) -> should be replaced with values 0
- Datasets with omitted times and timepoint values 1 for records with times (sample 2 https://github.com/google/transit/pull/474#issue-2339968072, example): they won't trigger a WARNING anymore.
- Flex datasets that don't use departure/arrival_time and don't have timepoint defined: they won't trigger a WARNING anymore.
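
For producers in the first group, the replacement described above could look like the sketch below. The helper `fill_empty_timepoints` is hypothetical and not part of any existing tool. Whether to fill with `"1"` (exact) or `"0"` (approximate) depends on what the empty values meant in that dataset, so the caller has to choose explicitly.

```python
def fill_empty_timepoints(records, value):
    """Return copies of records with an empty timepoint set to `value`.

    Only records that have at least one time are changed; records with no
    times are left as they are, since they do not trigger the warning.
    """
    if value not in ("0", "1"):
        raise ValueError("timepoint must be '0' (approximate) or '1' (exact)")
    result = []
    for record in records:
        updated = dict(record)  # do not mutate the caller's data
        has_time = any((updated.get(key) or "").strip()
                       for key in ("arrival_time", "departure_time"))
        if has_time and not (updated.get("timepoint") or "").strip():
            updated["timepoint"] = value
        result.append(updated)
    return result


records = [
    {"stop_sequence": "1", "arrival_time": "8:30:00", "departure_time": "8:30:00", "timepoint": "1"},
    {"stop_sequence": "2", "arrival_time": "8:35:00", "departure_time": "8:35:00", "timepoint": ""},
    {"stop_sequence": "3", "arrival_time": "", "departure_time": "", "timepoint": ""},
]
fixed = fill_empty_timepoints(records, "0")
print([r["timepoint"] for r in fixed])  # ['1', '0', '']
```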
If this PR gets merged, we will modify the canonical validator, as its current logic is to give a WARNING in all cases of timepoint="".