MobilityData / gtfs-validator

Canonical GTFS Validator project for schedule (static) files.
https://gtfs-validator.mobilitydata.org/
Apache License 2.0
290 stars 101 forks source link

locations.geojson POC phase2: End-to-end partial support of json data #1810

Closed jcpitre closed 1 month ago

jcpitre commented 2 months ago

Summary:

Builds on the poc written by @davidgamez (#1805), related to locations.geojson

What it does:

Not covered:

To be done:

This is NOT ready to merge yet.

I modified an existing dataset that has flex data to create an error where the id of a feature in locations.geojson is the same as an id in stops.txt. browncounty-mn-us--flex-v2-broken.zip

Here is a capture of the sample notice in the report: image

Please make sure these boxes are checked before submitting your pull request - thanks!

github-actions[bot] commented 2 months ago

This contribution does not follow the conventions set by the Google Java style guide. Please run the following command line at the root of the project to fix formatting errors: ./gradlew goJF.

github-actions[bot] commented 2 months ago

📝 Acceptance Test Report

📋 Summary

✅ The rule acceptance has passed for commit 4f9e3dbdf516fab592a657442a02525a3edbc24d Download the full acceptance test report here (report will disappear after 90 days).

📊 Notices Comparison

New Errors (0 out of 1575 datasets, ~0%) ✅

No changes were detected due to the code change.

Dropped Errors (0 out of 1575 datasets, ~0%) ✅

No changes were detected due to the code change.

New Warnings (0 out of 1575 datasets, ~0%) ✅

No changes were detected due to the code change.

Dropped Warnings (0 out of 1575 datasets, ~0%) ✅

No changes were detected due to the code change.

🛡️ Corruption Check

0 out of 1575 sources (~0 %) are corrupted.

⏱️ Performance Assessment

📈 Validation Time

Assess the performance in terms of seconds taken for the validation process.

| Time Metric | Dataset ID | Reference (s) | Latest (s) | Difference (s) | |-----------------------------|-------------------|----------------|----------------|----------------| | Average | -- | 3.91 | 3.98 | ⬆️+0.08 | | Median | -- | 1.38 | 1.46 | ⬆️+0.07 | | Standard Deviation | -- | 11.28 | 11.21 | ⬇️-0.08 | | Minimum in References Reports | us-florida-citrus-county-transit-gtfs-630 | 0.51 | 0.59 | ⬆️+0.09 | | Maximum in Reference Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 326.83 | 321.61 | ⬇️-5.22 | | Minimum in Latest Reports | us-california-catalina-express-gtfs-299 | 0.52 | 0.56 | ⬆️+0.03 | | Maximum in Latest Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 326.83 | 321.61 | ⬇️-5.22 |
emmambd commented 2 months ago

@jcpitre One note re: the test notice of unique_location_id_violation. We won't actually be able to merge this one until the schema for location_groups.txt in #1749 is added, since the rule requires checking location id across all 3 files.

So there are two options:

1) Leave the notice is purely for testing purposes and then take it out of this PR. (Moving it to the other issue/PR for #1749) 2) Add the location_groups.txt schema within this PR, which seems to make the PR extra big.

jcpitre commented 2 months ago

@jcpitre One note re: the test notice of unique_location_id_violation. We won't actually be able to merge this one until the schema for location_groups.txt in #1749 is added, since the rule requires checking location id across all 3 files.

So there are two options:

  1. Leave the notice is purely for testing purposes and then take it out of this PR. (Moving it to the other issue/PR for Flex: Add location group schema and update stop time schema #1749)
  2. Add the location_groups.txt schema within this PR, which seems to make the PR extra big.

Yes, I put anything related to location_groups.txt in comments in the validator class so we hopefully don't forget. I thought though that there is some value in the validation that is done. After all, having the duplication of location_id in stops.txt and locations.geojson is an error in itself. And until all validators are implemented for Flex we will have false negatives, be it this one or other rules not yet implemented.

emmambd commented 2 months ago

@jcpitre I wouldn't want to release with it in the present state (meaning #1749 would block release, which it already does), but I'm okay with that. False positives are the main focus of this release, not negatives.

emmambd commented 2 months ago

@jcpitre I tested duplicate_key with this PR using flex.zip to try to generate a duplicate_key notice.

It worked! But I realized that we don't have the csvRowNumber lines...because it's not a csv.

Screenshot 2024-09-09 at 3 30 34 PM

Do we have a plan or need to discuss how to modify these rowNumber columns for generic notices that can affect either a geojson file or a csv file? cc @davidgamez

emmambd commented 2 months ago

@emmambd Try to break/generate other generic notices since we weren't expecting duplicate_key!

jcpitre commented 2 months ago

@emmambd This PR is ready for a bit of QA

github-actions[bot] commented 1 month ago

📝 Acceptance Test Report

📋 Summary

✅ The rule acceptance has passed for commit 78eee2770a3bc5f2ed7becc01caa261e6724ad2e Download the full acceptance test report here (report will disappear after 90 days).

📊 Notices Comparison

New Errors (0 out of 1588 datasets, ~0%) ✅

No changes were detected due to the code change.

Dropped Errors (0 out of 1588 datasets, ~0%) ✅

No changes were detected due to the code change.

New Warnings (0 out of 1588 datasets, ~0%) ✅

No changes were detected due to the code change.

Dropped Warnings (0 out of 1588 datasets, ~0%) ✅

No changes were detected due to the code change.

🛡️ Corruption Check

0 out of 1588 sources (~0 %) are corrupted.

⏱️ Performance Assessment

📈 Validation Time

Assess the performance in terms of seconds taken for the validation process.

| Time Metric | Dataset ID | Reference (s) | Latest (s) | Difference (s) | |-----------------------------|-------------------|----------------|----------------|----------------| | Average | -- | 4.06 | 4.12 | ⬆️+0.06 | | Median | -- | 1.43 | 1.47 | ⬆️+0.04 | | Standard Deviation | -- | 11.50 | 11.58 | ⬆️+0.08 | | Minimum in References Reports | us-oregon-hut-airport-shuttle-gtfs-635 | 0.51 | 0.63 | ⬆️+0.12 | | Maximum in Reference Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 290.58 | 293.92 | ⬆️+3.34 | | Minimum in Latest Reports | us-oregon-high-desert-point-gtfs-636 | 0.54 | 0.51 | ⬇️-0.03 | | Maximum in Latest Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 290.58 | 293.92 | ⬆️+3.34 |
github-actions[bot] commented 1 month ago

📝 Acceptance Test Report

📋 Summary

✅ The rule acceptance has passed for commit 3dcfc5c2246c5dd503b8ddb3cdbc530e6a019ab8 Download the full acceptance test report here (report will disappear after 90 days).

📊 Notices Comparison

New Errors (0 out of 1588 datasets, ~0%) ✅

No changes were detected due to the code change.

Dropped Errors (0 out of 1588 datasets, ~0%) ✅

No changes were detected due to the code change.

New Warnings (0 out of 1588 datasets, ~0%) ✅

No changes were detected due to the code change.

Dropped Warnings (0 out of 1588 datasets, ~0%) ✅

No changes were detected due to the code change.

🛡️ Corruption Check

0 out of 1588 sources (~0 %) are corrupted.

⏱️ Performance Assessment

📈 Validation Time

Assess the performance in terms of seconds taken for the validation process.

| Time Metric | Dataset ID | Reference (s) | Latest (s) | Difference (s) | |-----------------------------|-------------------|----------------|----------------|----------------| | Average | -- | 4.08 | 4.14 | ⬆️+0.06 | | Median | -- | 1.43 | 1.48 | ⬆️+0.05 | | Standard Deviation | -- | 11.90 | 11.75 | ⬇️-0.15 | | Minimum in References Reports | us-california-flex-v2-developer-test-feed-1-gtfs-1817 | 0.51 | 0.60 | ⬆️+0.09 | | Maximum in Reference Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 316.26 | 300.69 | ⬇️-15.56 | | Minimum in Latest Reports | us-california-city-of-wasco-gtfs-1788 | 0.77 | 0.52 | ⬇️-0.25 | | Maximum in Latest Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 316.26 | 300.69 | ⬇️-15.56 |