emmambd closed this issue 2 months ago.
In the code, we encountered UNPARSABLE_ROWS due to validation errors while processing the rows of GTFS files. For example, stop_times.txt had errors such as unknown_column and missing_required_field, and agency.txt had invalid_timezone and invalid_url errors.
Based on the investigation in #1770, it's missing_required_field, invalid_url, and invalid_timezone that cause validation errors and make a GTFS file unparsable.
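To make the mechanism concrete, here is a minimal Python sketch of how row-level notices could roll up into a per-file status. It is not the validator's implementation. `Notice`, `table_status`, `ROW_BLOCKING_CODES`, and the `PARSABLE` label are hypothetical, and the set of blocking codes simply mirrors the findings from #1770.

```python
from dataclasses import dataclass

# Hypothetical: notice codes that, per #1770, make a row unparsable.
# unknown_column is deliberately absent: it is reported but does not block parsing.
ROW_BLOCKING_CODES = {"missing_required_field", "invalid_url", "invalid_timezone"}


@dataclass
class Notice:
    code: str      # e.g. "missing_required_field"
    filename: str  # e.g. "stop_times.txt"


def table_status(filename: str, notices: list[Notice]) -> str:
    """Return a simplified status for one GTFS file based on its notices."""
    for notice in notices:
        if notice.filename == filename and notice.code in ROW_BLOCKING_CODES:
            return "UNPARSABLE_ROWS"
    return "PARSABLE"  # hypothetical label for a file that parsed cleanly


notices = [
    Notice("unknown_column", "stop_times.txt"),
    Notice("invalid_timezone", "agency.txt"),
]
print(table_status("stop_times.txt", notices))  # PARSABLE
print(table_status("agency.txt", notices))      # UNPARSABLE_ROWS
```

This matches the browncounty-mn-us--flex-v2 experiment: an extra unknown column alone leaves stop_times.txt parsable.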
Moving @qcdyx's findings from #1770 here:
It's the missing_required_field for 'stop_id' that leads to a validation error, giving stop_times.txt a status of UNPARSABLE_ROWS. After adding an 'UNKNOWN_COLUMN' to stop_times.txt in the browncounty-mn-us--flex-v2 dataset and running the GTFS validator, there were no UNPARSABLE_ROWS for stop_times.txt.
We're only planning to modify the logic of missing_required_field for Flex feeds, not invalid_url or invalid_timezone. I think we should proceed by continuing the work in #1721 and, by completing #1775, see how often these feeds fail to parse files. cc @davidgamez @qcdyx
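One way the Flex-specific missing_required_field change could look, sketched in Python. It assumes the rule adopted in the GTFS Schedule reference, where stop_times.stop_id is required only when neither location_id nor location_group_id is set. The function names and the `is_flex_feed` flag are hypothetical, and this is not the validator's code.

```python
def _is_set(row: dict[str, str], column: str) -> bool:
    """True if the column is present and non-blank in this row."""
    return bool(row.get(column, "").strip())


def stop_id_missing(row: dict[str, str], is_flex_feed: bool) -> bool:
    """Decide whether a stop_times.txt row should raise missing_required_field for stop_id."""
    if _is_set(row, "stop_id"):
        return False
    if is_flex_feed:
        # Assumed Flex rule: the row may reference a location or a location group instead of a stop.
        return not (_is_set(row, "location_id") or _is_set(row, "location_group_id"))
    # Non-Flex feeds keep the original behavior: stop_id is always required.
    return True


print(stop_id_missing({"trip_id": "t1", "location_id": "zone_1"}, is_flex_feed=True))   # False
print(stop_id_missing({"trip_id": "t1", "location_id": "zone_1"}, is_flex_feed=False))  # True
```

Keeping invalid_url and invalid_timezone out of this function reflects the decision above to leave those checks unchanged.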
To clarify: when an unparsable error is triggered, it only affects the single-file validators for that file. In this case, only the agency.txt validators are affected.
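A small sketch of that scoping, with hypothetical names: validators are grouped by file, and only the group for a file whose status is UNPARSABLE_ROWS is skipped. Modeling validators as plain functions that take a filename and return notice codes is an assumption for illustration only.

```python
from typing import Callable, Optional

# Hypothetical single-file validator: receives a filename, returns notice codes.
Validator = Callable[[str], list[str]]


def run_single_file_validators(
    statuses: dict[str, str],
    validators: dict[str, list[Validator]],
) -> dict[str, Optional[list[str]]]:
    """Run each file's validators, skipping only files whose rows are unparsable.

    A value of None means that file's validators were skipped.
    """
    results: dict[str, Optional[list[str]]] = {}
    for filename, checks in validators.items():
        if statuses.get(filename) == "UNPARSABLE_ROWS":
            results[filename] = None  # skipped; other files are unaffected
            continue
        notices: list[str] = []
        for check in checks:
            notices.extend(check(filename))
        results[filename] = notices
    return results


statuses = {"agency.txt": "UNPARSABLE_ROWS", "stops.txt": "PARSABLE"}
validators = {
    "agency.txt": [lambda f: ["example_notice"]],
    "stops.txt": [lambda f: []],
}
print(run_single_file_validators(statuses, validators))
# {'agency.txt': None, 'stops.txt': []}
```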
Based on the findings in #1749, this no longer appears to be an issue now that missing_required_field has been modified. cc @jcpitre
What's the problem?
Of the 4 Flex feeds we have for testing #1721, 3 fail to run through the validator without parsing issues.
To better understand this problem, I looked at 51 Flex v2 feeds, including some that don't yet conform to the official spec. 50% fail to fully parse, and all but one of the failing feeds have an issue parsing stop_times.txt.
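To repeat this survey on more feeds, a tally like the one below could be computed from each feed's validation results. The input format (feed name mapped to the set of files that failed to parse) is an assumption, and the feed names in the example are placeholders, not the 51 feeds surveyed.

```python
def summarize(failed_files_by_feed: dict[str, set[str]]) -> tuple[float, int, int]:
    """Return (share of feeds failing to parse, failing feeds, failing feeds involving stop_times.txt)."""
    total = len(failed_files_by_feed)
    failing = {feed: files for feed, files in failed_files_by_feed.items() if files}
    with_stop_times = sum(1 for files in failing.values() if "stop_times.txt" in files)
    share = len(failing) / total if total else 0.0
    return share, len(failing), with_stop_times


example = {
    "feed_a": {"stop_times.txt"},
    "feed_b": set(),
    "feed_c": {"agency.txt"},
    "feed_d": set(),
}
print(summarize(example))  # (0.5, 2, 1)
```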
Outstanding questions
This is a critical set of questions to answer before we pursue more work on #1721.