MobilityData / gtfs-validator

Canonical GTFS Validator project for schedule (static) files.
https://gtfs-validator.mobilitydata.org/
Apache License 2.0
275 stars 100 forks source link

Implement calendar has active days (GTFS rule) #134

Open maximearmstrong opened 4 years ago

maximearmstrong commented 4 years ago

Is your feature request related to a problem? Please describe. A calendar/service must be at least once a week. This is a GTFS rule implemented in Google Python validator and featured in Google Type Error as TYPE_CALENDAR_HAS_NO_ACTIVE_DAYS_OF_WEEK.

Describe the solution you'd like Actual Google GTFS validator behaviour : verifies if service happens at least once a week, minimum 1 time for each sunday-saturday timeframe.

Describe alternatives you've considered

Additional context Line 41 in Error support priorities https://docs.google.com/spreadsheets/d/1vqe6wq7ctqk1EhYkgtZ0_TbcQ91vccfs2daSjn20BLE/edit#gid=0

lionel-nj commented 4 years ago

@barbeau Do you think that this rule should be checked before entity is created, or be the subject of a new use case?

To keep consistency, it would be a new use case, however this would imply that invalid Calendar entities will be present in the data repository; which could complicate further cross verification ( I do not have precise example in mind, but I imagine that this is the kind of error that could prevent us from effectively checking rules on services and calendars).

barbeau commented 4 years ago

Looks like this is the ValidateHasServiceAtLeastOnceAWeek rule in the Python validator?

https://github.com/google/transitfeed/blob/master/transitfeed/serviceperiod.py#L300

From a quick read of the Python code, I think this is a cross check of both calendar.txt and calendar_dates.txt - some pseudocode:

IF calendar.txt service_id entry has monday AND tuesday AND wednesday AND thursday AND friday AND saturday AND sunday = 0
AND
there is no calendar_dates.txt entry for that service_id that has exception_type=1
THEN output an error

I'd welcome other interpretations of the Python code.

If I'm right, then you need to have read in all the calendar.txt and calendar_dates.txt records before this can be executed.

lionel-nj commented 4 years ago

Indeed, I missed the fact that this is linked to calendar.txt. In this case, we should have a use case to verify that. After going through our rule prioritization spreadsheet, I do not identity other rules where Calendar entities would be used in cross-validation.

So I guess that for now it could be okay to have "invalid" Calendar entities inside our data repository.

Also, do we have the option to remove an "invalid" Calendar entity from the data repository? If so, these could be removed if the related use case is executed very early in the process. Cross validation would be executed on the remaining "valid" Calendar entities.

barbeau commented 4 years ago

Also, do we have the option to remove an "invalid" Calendar entity from the data repository? If so, these could be removed if the related use case is executed very early in the process. Cross validation would be executed on the remaining "valid" Calendar entities.

In our validation rules we're moving mainly into semantic validation and out of data type validation. In this process, there will be a number of things that we can't validate looking at one file, but will become obviously wrong when we look at combining multiple files and the semantic meaning of GTFS.

I would suggest for any rules that can't be run on a single file (which strongly suggests semantic validation), we should leave the records in the repository if they generate errors. Otherwise we are going to have a lot of downstream effects - for example, after we semantically validate calendar and calendar_dates and we find a bad record, do we then remove all records related to those service_ids from the repo? There will also be interdependencies between the rules and files that will be hard to resolve if we remove records and try to continue with validation.

isabelle-dr commented 1 year ago

We usually don't add rules that are not clearly specified in the spec or best practices (this is the case here), but we make exceptions for what the community thinks is valuable to check (point_near_origin, fast_travel_between_consecutive_stops, unused_trip, just to name a few). This is done on a case-by-case basis, by discussing with the community. We are in favor of having the spec modified first before adding the check in the validator.

emmambd commented 1 year ago

This has been flagged as a valuable check by several community members. Although it is not explicit in the spec, I am wondering if this is implicit from the Remove old services (expired calendars) from the feed best practice and should be a warning without being added directly to the spec.

isabelle-dr commented 9 months ago

@emmambd IMO this issue is related to the "unused" type of notices:

What do you think should be the rationale behind having these types of checks as INFO or WARNING (or something else)?