cal-itp / data-infra

Cal-ITP data infrastructure
https://docs.calitp.org/data-infra
GNU Affero General Public License v3.0

GTFS Schedule Validator is not throwing whitespace notices for SacRT service_id #945

Closed: lauriemerrell closed this issue 2 years ago

lauriemerrell commented 2 years ago

We identified a problem with how the GTFS validator handles the SacRT whitespace issue (#914).

I ran version 2.0.0 of the GTFS validator locally and confirmed that it does not raise the LeadingOrTrailingWhitespacesNotice for this feed, even though the whitespace is present (results attached for reference). So I suspect this is a bug in the validator rather than an issue with our pipeline. Attachment: sacrt_local_validation_results.json.zip
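One way to confirm that the whitespace is really in the feed, independently of the validator, is to scan the raw files directly. The following is a minimal Python sketch, not part of the cal-itp pipeline: the function name find_whitespace_fields and the sample calendar.txt rows are hypothetical. Python's csv module keeps whitespace in unquoted fields by default, so padded values are reported whether or not they are quoted.

```python
import csv
import io


def find_whitespace_fields(csv_text):
    """Return (record_number, column, value) for every field with leading or
    trailing whitespace. Record 1 is the header, so data starts at 2.

    Python's csv module does not strip whitespace by default, so this catches
    padded values whether or not they were quoted in the file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    hits = []
    for record_number, row in enumerate(reader, start=2):
        for column, value in row.items():
            # Skip overflow fields (key None) and missing fields (value None).
            if column is None or value is None:
                continue
            if value != value.strip():
                hits.append((record_number, column, value))
    return hits


# Hypothetical calendar.txt excerpt with a padded service_id (not real SacRT data).
sample = (
    "service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date\n"
    "WKDY ,1,1,1,1,1,0,0,20220101,20221231\n"
    "WKND,0,0,0,0,0,1,1,20220101,20221231\n"
)

for hit in find_whitespace_fields(sample):
    print(hit)
```

Running it against the sample prints (2, 'service_id', 'WKDY '). Pointing it at a real calendar.txt or calendar_dates.txt would list each padded value with its record number.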

Impact: This bug means that our validation results are missing expected notices: validations that should have been triggered were not. As a result, data integrity issues occurred silently, with no notices raised. At this point we suspect the bug is in the validator itself, not in our pipeline.

We would like to look into this further. If more investigation confirms that the problem is in the external validator, we could then file an issue with the validator's maintainers.

Other notes:

Interestingly, this query:

SELECT DISTINCT filename FROM `cal-itp-data-infra.views.validation_fact_daily_feed_notices` WHERE code = 'leading_or_trailing_whitespaces'

indicates that the only files for which this notice has been raised are stop_times.txt, stops.txt, trips.txt, and routes.txt. In other words, it has never been raised for calendar.txt or calendar_dates.txt, which is where the whitespace appears in the SacRT feed.

Originally posted by @lauriemerrell in https://github.com/cal-itp/data-infra/issues/914#issuecomment-1010063819

lauriemerrell commented 2 years ago

After investigating, it appears that this behavior is actually expected: the notice only checks for leading or trailing whitespace in fields enclosed in quotes. See #946 for more discussion of related issues. I am closing this ticket because the behavior is technically expected.
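A toy model may help show why unquoted whitespace can slip through. The sketch below assumes (this is not confirmed in this thread) that the validator's CSV parser trims whitespace around unquoted fields before the whitespace check runs, so only quoted values still carry their padding when the check sees them. The function names are hypothetical, and the parser handles only simple, well-formed lines where quotes wrap an entire field and no field spans multiple lines.

```python
def _finish_field(chars, was_quoted):
    # Assumed behavior: unquoted fields are trimmed, quoted fields kept verbatim.
    value = "".join(chars)
    return value if was_quoted else value.strip()


def parse_line_trimming_unquoted(line):
    """Hypothetical toy parser: split one CSV line, trimming unquoted fields only."""
    fields, chars = [], []
    in_quotes = was_quoted = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    chars.append('"')  # doubled quote is an escaped quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                chars.append(ch)
        elif ch == '"':
            in_quotes = was_quoted = True  # opening quote
        elif ch == ",":
            fields.append(_finish_field(chars, was_quoted))
            chars, was_quoted = [], False
        else:
            chars.append(ch)
        i += 1
    fields.append(_finish_field(chars, was_quoted))
    return fields


def whitespace_notices(lines):
    """Flag padded values AFTER parsing, as a post-parse check would."""
    header = parse_line_trimming_unquoted(lines[0])
    notices = []
    for record_number, line in enumerate(lines[1:], start=2):
        for column, value in zip(header, parse_line_trimming_unquoted(line)):
            if value != value.strip():
                notices.append((record_number, column, value))
    return notices


lines = [
    "service_id,start_date",
    "WKDY ,20220101",    # unquoted padding: trimmed by the parser, never seen
    '"WKND ",20220101',  # quoted padding: survives parsing, gets flagged
]
print(whitespace_notices(lines))
```

Under that assumption, the unquoted 'WKDY ' is trimmed during parsing and never flagged, while the quoted 'WKND ' is reported, so the script prints [(3, 'service_id', 'WKND ')].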