MobilityData / gtfs-validator

Canonical GTFS Validator project for schedule (static) files.
https://gtfs-validator.mobilitydata.org/
Apache License 2.0

feat: 1840 invalid characters #1892

Closed qcdyx closed 1 month ago

qcdyx commented 1 month ago

Summary:

Closes #1840

Expected behavior:

image

github-actions[bot] commented 1 month ago

📝 Acceptance Test Report

📋 Summary

✅ The rule acceptance has passed for commit aefbc02abe7a3ddcc1ace7ab0c1075b4ea1683f8. Download the full acceptance test report here (report will disappear after 90 days).

📊 Notices Comparison

New Errors (5 out of 1602 datasets, ~0%) ✅

Details of new errors due to code change, which is less than the provided threshold of 1%.

| Dataset | Notice Code |
|---------|-------------|
| br-rio-grande-do-sul-empresa-publica-de-transportes-e-circulacao-eptc-gtfs-7 | invalid_characters |
| ch-unknown-lk2-gtfs-914 | invalid_characters |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | invalid_characters |
| nl-unknown-allgo-keolis-gtfs-1077 | invalid_characters |
| pt-setubal-carris-metropolitana-gtfs-1874 | invalid_characters |

Dropped Errors (1 out of 1602 datasets, ~0%) ✅

Details of dropped errors due to code change, which is less than the provided threshold of 1%.

| Dataset | Notice Code |
|---------|-------------|
| pt-setubal-carris-metropolitana-gtfs-1874 | trip_distance_exceeds_shape_distance |

New Warnings (0 out of 1602 datasets, ~0%) ✅

No changes were detected due to the code change.

Dropped Warnings (4 out of 1602 datasets, ~0%) ✅

Details of dropped warnings due to code change, which is less than the provided threshold of 1%.

| Dataset | Notice Code |
|---------|-------------|
| ch-unknown-lk2-gtfs-914 | duplicate_route_name |
| ch-unknown-lk2-gtfs-914 | fast_travel_between_consecutive_stops |
| nl-unknown-allgo-keolis-gtfs-1077 | fast_travel_between_consecutive_stops |
| ch-unknown-lk2-gtfs-914 | fast_travel_between_far_stops |
| nl-unknown-allgo-keolis-gtfs-1077 | fast_travel_between_far_stops |
| ch-unknown-lk2-gtfs-914 | missing_bike_allowance |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | missing_timepoint_value |
| ch-unknown-lk2-gtfs-914 | stop_has_too_many_matches_for_shape |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_has_too_many_matches_for_shape |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_has_too_many_matches_for_shape |
| ch-unknown-lk2-gtfs-914 | stop_too_far_from_shape |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | stop_too_far_from_shape |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_too_far_from_shape |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_too_far_from_shape |
| ch-unknown-lk2-gtfs-914 | stop_too_far_from_shape_using_user_distance |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_too_far_from_shape_using_user_distance |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_too_far_from_shape_using_user_distance |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | stop_without_stop_time |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_without_stop_time |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_without_stop_time |
| ch-unknown-lk2-gtfs-914 | stops_match_shape_out_of_order |
| nl-unknown-allgo-keolis-gtfs-1077 | stops_match_shape_out_of_order |
| pt-setubal-carris-metropolitana-gtfs-1874 | stops_match_shape_out_of_order |
| pt-setubal-carris-metropolitana-gtfs-1874 | trip_distance_exceeds_shape_distance_below_threshold |

🛡️ Corruption Check

0 out of 1602 sources (~0 %) are corrupted.

⏱️ Performance Assessment

📈 Validation Time

Assess the performance in terms of seconds taken for the validation process.

| Time Metric | Dataset ID | Reference (s) | Latest (s) | Difference (s) |
|-----------------------------|-------------------|----------------|----------------|----------------|
| Average | -- | 4.02 | 4.06 | ⬆️+0.05 |
| Median | -- | 1.40 | 1.46 | ⬆️+0.06 |
| Standard Deviation | -- | 11.53 | 11.32 | ⬇️-0.21 |
| Minimum in Reference Reports | us-california-flex-v2-developer-test-feed-3-gtfs-1819 | 0.50 | 0.73 | ⬆️+0.23 |
| Maximum in Reference Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 297.19 | 287.84 | ⬇️-9.36 |
| Minimum in Latest Reports | us-california-catalina-express-gtfs-299 | 0.60 | 0.55 | ⬇️-0.06 |
| Maximum in Latest Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 297.19 | 287.84 | ⬇️-9.36 |

📜 Memory Consumption

| Metric | Dataset ID | Reference | Latest | Difference |
|-----------------------------|-------------------|----------------|----------------|----------------|
| Average | -- | 486.19 MiB | 480.76 MiB | ⬇️-5.44 MiB |
| Median | -- | 245.95 MiB | 246.85 MiB | ⬆️+922.84 KiB |
| Standard Deviation | -- | 877.41 MiB | 874.83 MiB | ⬇️-2.58 MiB |
| Minimum in Reference Reports | us-oregon-hut-airport-shuttle-gtfs-635 | 34.05 MiB | 34.09 MiB | ⬆️+40.00 KiB |
| Maximum in Reference Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 9.96 GiB | 10.12 GiB | ⬆️+161.15 MiB |
| Minimum in Latest Reports | us-virginia-jaunt-inc-gtfs-1324 | 34.06 MiB | 34.05 MiB | ⬇️-16.00 KiB |
| Maximum in Latest Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 9.96 GiB | 10.12 GiB | ⬆️+161.15 MiB |

emmambd commented 1 month ago

@tzujenchanmbd Curious about your thoughts on the acceptance tests. In cases where this is happening, it looks like it's because of how the producer is encoding accents (examples below).

[Screenshots: four images captured 2024-10-16, 1:47 PM]

Is there some kind of guidance it would make sense for us to provide in the notice about how to encode these to prevent the issue from occurring?

github-actions[bot] commented 1 month ago

📝 Acceptance Test Report

📋 Summary

✅ The rule acceptance has passed for commit 477dbddad8ac9902d1d297fd0dd794e21883044e. Download the full acceptance test report here (report will disappear after 90 days).

📊 Notices Comparison

New Errors (5 out of 1602 datasets, ~0%) ✅

Details of new errors due to code change, which is less than the provided threshold of 1%.

| Dataset | Notice Code |
|---------|-------------|
| br-rio-grande-do-sul-empresa-publica-de-transportes-e-circulacao-eptc-gtfs-7 | invalid_characters |
| ch-unknown-lk2-gtfs-914 | invalid_characters |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | invalid_characters |
| nl-unknown-allgo-keolis-gtfs-1077 | invalid_characters |
| pt-setubal-carris-metropolitana-gtfs-1874 | invalid_characters |

Dropped Errors (1 out of 1602 datasets, ~0%) ✅

Details of dropped errors due to code change, which is less than the provided threshold of 1%.

| Dataset | Notice Code |
|---------|-------------|
| pt-setubal-carris-metropolitana-gtfs-1874 | trip_distance_exceeds_shape_distance |

New Warnings (0 out of 1602 datasets, ~0%) ✅

No changes were detected due to the code change.

Dropped Warnings (4 out of 1602 datasets, ~0%) ✅

Details of dropped warnings due to code change, which is less than the provided threshold of 1%.

| Dataset | Notice Code |
|---------|-------------|
| ch-unknown-lk2-gtfs-914 | duplicate_route_name |
| ch-unknown-lk2-gtfs-914 | fast_travel_between_consecutive_stops |
| nl-unknown-allgo-keolis-gtfs-1077 | fast_travel_between_consecutive_stops |
| ch-unknown-lk2-gtfs-914 | fast_travel_between_far_stops |
| nl-unknown-allgo-keolis-gtfs-1077 | fast_travel_between_far_stops |
| ch-unknown-lk2-gtfs-914 | missing_bike_allowance |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | missing_timepoint_value |
| ch-unknown-lk2-gtfs-914 | stop_has_too_many_matches_for_shape |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_has_too_many_matches_for_shape |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_has_too_many_matches_for_shape |
| ch-unknown-lk2-gtfs-914 | stop_too_far_from_shape |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | stop_too_far_from_shape |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_too_far_from_shape |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_too_far_from_shape |
| ch-unknown-lk2-gtfs-914 | stop_too_far_from_shape_using_user_distance |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_too_far_from_shape_using_user_distance |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_too_far_from_shape_using_user_distance |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | stop_without_stop_time |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_without_stop_time |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_without_stop_time |
| ch-unknown-lk2-gtfs-914 | stops_match_shape_out_of_order |
| nl-unknown-allgo-keolis-gtfs-1077 | stops_match_shape_out_of_order |
| pt-setubal-carris-metropolitana-gtfs-1874 | stops_match_shape_out_of_order |
| pt-setubal-carris-metropolitana-gtfs-1874 | trip_distance_exceeds_shape_distance_below_threshold |

🛡️ Corruption Check

0 out of 1602 sources (~0 %) are corrupted.

⏱️ Performance Assessment

📈 Validation Time

Assess the performance in terms of seconds taken for the validation process.

| Time Metric | Dataset ID | Reference (s) | Latest (s) | Difference (s) |
|-----------------------------|-------------------|----------------|----------------|----------------|
| Average | -- | 4.02 | 4.14 | ⬆️+0.12 |
| Median | -- | 1.38 | 1.43 | ⬆️+0.05 |
| Standard Deviation | -- | 11.61 | 11.86 | ⬆️+0.25 |
| Minimum in Reference Reports | us-california-flex-v2-developer-test-feed-3-gtfs-1819 | 0.51 | 0.62 | ⬆️+0.11 |
| Maximum in Reference Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 300.20 | 291.45 | ⬇️-8.76 |
| Minimum in Latest Reports | ar-buenos-aires-subterraneos-de-buenos-aires-subte-gtfs-6 | 0.53 | 0.54 | ⬆️+0.01 |
| Maximum in Latest Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 300.20 | 291.45 | ⬇️-8.76 |

📜 Memory Consumption

| Metric | Dataset ID | Reference | Latest | Difference |
|-----------------------------|-------------------|----------------|----------------|----------------|
| Average | -- | 487.60 MiB | 476.31 MiB | ⬇️-11.30 MiB |
| Median | -- | 248.03 MiB | 245.48 MiB | ⬇️-2.56 MiB |
| Standard Deviation | -- | 863.99 MiB | 843.71 MiB | ⬇️-20.28 MiB |
| Minimum in Reference Reports | us-california-flex-v2-developer-test-feed-1-gtfs-1817 | 34.05 MiB | 34.06 MiB | ⬆️+8.00 KiB |
| Maximum in Reference Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 10.15 GiB | 9.79 GiB | ⬇️-366.70 MiB |
| Minimum in Latest Reports | tr-kocaeli-metro-izmir-gtfs-1824 | 34.07 MiB | 34.05 MiB | ⬇️-16.00 KiB |
| Maximum in Latest Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 10.15 GiB | 9.79 GiB | ⬇️-366.70 MiB |

tzujenchanmbd commented 1 month ago

Some examples of the correct names in the screenshots:

  • Rotterdam, Selma Lagerlöfweg -> Rotterdam, Selma Lagerl�fweg
  • Rotterdam, Port Saïdstraat -> Rotterdam, Port Sa�dstraat
  • Estación Washington -> Estaci��n Washington

Problem example on maps: https://maps.app.goo.gl/YqnS2Gj9goeWN8GT9

So it seems the issue usually happens with accented characters like "ó", "ö", "ï" in Western European languages.

Perhaps the dev team can help confirm, but I guess this is probably caused by an encoding/decoding mismatch during the data production process. For example, the text may originally have been saved in a specific encoding (e.g. ISO-8859-1 or Windows-1252, "legacy" encodings covering characters commonly used in Western European languages, such as accented characters (é, ñ, ü) and special symbols) and then read using a different encoding (e.g. UTF-8). In that case, characters outside the ASCII range (like accented characters) will probably not decode correctly, leading to errors like the replacement character (�).
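The mismatch described above can be reproduced in a few lines. This is a minimal Python sketch for illustration only, not the validator's code; the helper name `misread` and the choice of codecs are assumptions about how such feeds might have been produced.

```python
# Illustration only: reproduce the garbled stop names from the examples above.

def misread(text: str, saved_as: str, read_as: str) -> str:
    """Encode `text` with one codec, then decode it with another,
    substituting U+FFFD for every byte sequence the reader rejects."""
    return text.encode(saved_as).decode(read_as, errors="replace")

# Saved as ISO-8859-1, read as UTF-8: each accented letter is a single byte
# that is not valid UTF-8 on its own, so it becomes one U+FFFD.
print(ascii(misread("Rotterdam, Selma Lagerl\u00f6fweg", "iso-8859-1", "utf-8")))
print(ascii(misread("Rotterdam, Port Sa\u00efdstraat", "iso-8859-1", "utf-8")))

# Saved as UTF-8, read by an ASCII-only step: the accented "o" is two bytes
# in UTF-8 and each byte is rejected separately, giving two U+FFFD in a row.
print(ascii(misread("Estaci\u00f3n Washington", "utf-8", "ascii")))
```

The doubled replacement character in the "Estación" example suggests that more than one kind of mismatch may be present across the affected feeds, although the thread does not confirm which tools were involved.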

davidgamez commented 1 month ago

> Some examples of the correct names in the screenshots:
>
>   • Rotterdam, Selma Lagerlöfweg -> Rotterdam, Selma Lagerl�fweg
>   • Rotterdam, Port Saïdstraat -> Rotterdam, Port Sa�dstraat
>   • Estación Washington -> Estaci��n Washington
>
> Problem example on maps: https://maps.app.goo.gl/YqnS2Gj9goeWN8GT9
>
> So it seems the issue usually happens with accented characters like "ó", "ö", "ï" in Western European languages.
>
> Perhaps the dev team can help confirm, but I guess this is probably caused by an encoding/decoding mismatch during the data production process. For example, the text may originally have been saved in a specific encoding (e.g. ISO-8859-1 or Windows-1252, "legacy" encodings covering characters commonly used in Western European languages, such as accented characters (é, ñ, ü) and special symbols) and then read using a different encoding (e.g. UTF-8). In that case, characters outside the ASCII range (like accented characters) will probably not decode correctly, leading to errors like the replacement character (�).

We assumed that the feeds are in UTF-8; replacement characters and other variations might appear because the text is not in proper UTF-8. The legacy Google validator replaced non-UTF-8-compatible characters with the replacement character. Maybe they have this implemented somewhere in their data pipeline to guarantee that the text is properly rendered in the UI, even with some characters "replaced" (see the legacy validator code).

emmambd commented 1 month ago

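Revisions after discussion with @tzujenchanmbd:

@davidgamez @qcdyx Is it feasible for us to parse the non-UTF-8 characters too? Ideally we could show them to the user in the notice table description, highlighted in bold so they know which characters are causing the problem.

  • Notice name: Invalid character (not plural, to match the style of our other notices)
  • Notice description: This field contains invalid characters, marked in bold. Text must be encoded in UTF-8 in order to be valid. When reading text, use the same encoding that was used to save.

I also want to note that, judging from the acceptance tests above, feeds with this error appear to have unparseable files, which means other notices are dropped.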
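As a rough illustration of the "highlighted in bold" idea, the sketch below wraps each run of replacement characters in `<b>` tags and HTML-escapes the rest of the value. The function name `highlight_invalid` and the use of HTML tags are assumptions made for this example; the thread does not say how the report would render the highlighting.

```python
import html
import re

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

def highlight_invalid(value: str) -> str:
    """Return `value` as HTML with every run of U+FFFD wrapped in <b>...</b>."""
    # Splitting on a capturing group keeps the matched runs in the result list.
    parts = re.split(f"({REPLACEMENT}+)", value)
    rendered = []
    for part in parts:
        if part.startswith(REPLACEMENT):
            rendered.append(f"<b>{part}</b>")
        else:
            # Escape ordinary text so field values cannot inject markup.
            rendered.append(html.escape(part))
    return "".join(rendered)

print(ascii(highlight_invalid("Estaci\ufffd\ufffdn Washington")))
```

This only works for characters that survive decoding as U+FFFD; bytes that are not valid UTF-8 at all would have to be handled before this step.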

davidgamez commented 1 month ago

> Revisions after discussion with @tzujenchanmbd:
>
> @davidgamez @qcdyx Is it feasible for us to parse the non-UTF-8 characters too? Ideally we could show them to the user in the notice table description, highlighted in bold so they know which characters are causing the problem.
>
>   • Notice name: Invalid character (not plural, to match the style of our other notices)
>   • Notice description: This field contains invalid characters, marked in bold. Text must be encoded in UTF-8 in order to be valid. When reading text, use the same encoding that was used to save.
>
> I also want to note that, judging from the acceptance tests above, feeds with this error appear to have unparseable files, which means other notices are dropped.

I suggest creating a different notice for non-UTF-8 text. There are two distinct situations: the first is the presence of a replacement character, which is itself a valid UTF-8 character, and the second is the presence of an invalid UTF-8 character. I suspect that if we have a replacement character, it is due to a tool that transformed the feed and potentially replaced the invalid UTF-8 characters (or characters from any other encoding) when converting to UTF-8 or to a different target encoding (violating the spec in the latter case).
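The two situations can be told apart at the byte level. The following Python sketch is only an illustration under that assumption; `TextIssue` and `classify` are hypothetical names and do not come from the validator.

```python
from enum import Enum

class TextIssue(Enum):
    NONE = "none"
    # Valid UTF-8 that already contains U+FFFD (damage done upstream).
    REPLACEMENT_CHARACTER = "replacement_character"
    # Bytes that cannot be decoded as UTF-8 at all.
    INVALID_UTF8 = "invalid_utf8"

def classify(raw: bytes) -> TextIssue:
    """Classify raw field bytes into one of the two situations, or NONE."""
    try:
        text = raw.decode("utf-8")  # strict decoding: raises on invalid bytes
    except UnicodeDecodeError:
        return TextIssue.INVALID_UTF8
    if "\ufffd" in text:
        return TextIssue.REPLACEMENT_CHARACTER
    return TextIssue.NONE

print(classify("Estaci\u00f3n".encode("utf-8")))
print(classify("Estaci\ufffdn".encode("utf-8")))
print(classify("Estaci\u00f3n".encode("iso-8859-1")))
```

Note that a reader which decodes leniently (substituting U+FFFD on errors) collapses both cases into the first one, so the distinction has to be made before or during decoding.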

davidgamez commented 1 month ago

Regarding the dropped notices, they are expected because of the severity of the notice (Error).

emmambd commented 1 month ago

@davidgamez Makes sense. How's this for a revised notice description then, so it's more suggestive and less prescriptive that the issue is non-UTF-8 encoding:

  • Notice name: Invalid character (not plural, to match the style of our other notices)
  • Notice description: This field contains invalid characters, such as the replacement character ("�"). Check that text was properly encoded in UTF-8 as required by GTFS.

davidgamez commented 1 month ago

> @davidgamez Makes sense. How's this for a revised notice description then, so it's more suggestive and less prescriptive that the issue is non-UTF-8 encoding:
>
>   • Notice name: Invalid character (not plural, to match the style of our other notices)
>   • Notice description: This field contains invalid characters, such as the replacement character ("�"). Check that text was properly encoded in UTF-8 as required by GTFS.

The notice name and description make sense to me.
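For readers who want to see what such a check might look like, here is a small Python sketch that scans CSV text for the replacement character and emits one record per affected field. The notice code `invalid_character` and the Error severity come from this thread; the function name and the record keys (`filename`, `csv_row_number`, `field_name`, `field_value`) are hypothetical and not taken from the validator's implementation.

```python
import csv
import io
from typing import Dict, List

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

def find_invalid_character_notices(filename: str, csv_text: str) -> List[Dict]:
    """Return one notice-like dict per field that contains U+FFFD."""
    notices = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Row 1 is the header, so data rows are numbered from 2.
    for row_number, row in enumerate(reader, start=2):
        for field_name, value in row.items():
            if value and REPLACEMENT in value:
                notices.append({
                    "code": "invalid_character",
                    "severity": "ERROR",
                    "filename": filename,
                    "csv_row_number": row_number,
                    "field_name": field_name,
                    "field_value": value,
                })
    return notices

sample = "stop_id,stop_name\n1,Estaci\ufffd\ufffdn Washington\n2,Central\n"
for notice in find_invalid_character_notices("stops.txt", sample):
    print(ascii(notice))
```

Reporting the file, row, and field alongside the value should make it easier for a producer to locate the text that needs to be re-exported in UTF-8.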

qcdyx commented 1 month ago

@davidgamez @emmambd image

github-actions[bot] commented 1 month ago

📝 Acceptance Test Report

📋 Summary

✅ The rule acceptance has passed for commit 2cf9ad281f4f98de2f3dfef34ac0a903ee8b2bc2. Download the full acceptance test report here (report will disappear after 90 days).

📊 Notices Comparison

New Errors (5 out of 1602 datasets, ~0%) ✅

Details of new errors due to code change, which is less than the provided threshold of 1%.

| Dataset | Notice Code |
|---------|-------------|
| br-rio-grande-do-sul-empresa-publica-de-transportes-e-circulacao-eptc-gtfs-7 | invalid_character |
| ch-unknown-lk2-gtfs-914 | invalid_character |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | invalid_character |
| nl-unknown-allgo-keolis-gtfs-1077 | invalid_character |
| pt-setubal-carris-metropolitana-gtfs-1874 | invalid_character |

Dropped Errors (1 out of 1602 datasets, ~0%) ✅

Details of dropped errors due to code change, which is less than the provided threshold of 1%.

| Dataset | Notice Code |
|---------|-------------|
| pt-setubal-carris-metropolitana-gtfs-1874 | trip_distance_exceeds_shape_distance |

New Warnings (0 out of 1602 datasets, ~0%) ✅

No changes were detected due to the code change.

Dropped Warnings (4 out of 1602 datasets, ~0%) ✅

Details of dropped warnings due to code change, which is less than the provided threshold of 1%.

| Dataset | Notice Code |
|---------|-------------|
| ch-unknown-lk2-gtfs-914 | duplicate_route_name |
| ch-unknown-lk2-gtfs-914 | fast_travel_between_consecutive_stops |
| nl-unknown-allgo-keolis-gtfs-1077 | fast_travel_between_consecutive_stops |
| ch-unknown-lk2-gtfs-914 | fast_travel_between_far_stops |
| nl-unknown-allgo-keolis-gtfs-1077 | fast_travel_between_far_stops |
| ch-unknown-lk2-gtfs-914 | missing_bike_allowance |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | missing_timepoint_value |
| ch-unknown-lk2-gtfs-914 | stop_has_too_many_matches_for_shape |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_has_too_many_matches_for_shape |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_has_too_many_matches_for_shape |
| ch-unknown-lk2-gtfs-914 | stop_too_far_from_shape |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | stop_too_far_from_shape |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_too_far_from_shape |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_too_far_from_shape |
| ch-unknown-lk2-gtfs-914 | stop_too_far_from_shape_using_user_distance |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_too_far_from_shape_using_user_distance |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_too_far_from_shape_using_user_distance |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | stop_without_stop_time |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_without_stop_time |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_without_stop_time |
| ch-unknown-lk2-gtfs-914 | stops_match_shape_out_of_order |
| nl-unknown-allgo-keolis-gtfs-1077 | stops_match_shape_out_of_order |
| pt-setubal-carris-metropolitana-gtfs-1874 | stops_match_shape_out_of_order |
| pt-setubal-carris-metropolitana-gtfs-1874 | trip_distance_exceeds_shape_distance_below_threshold |

🛡️ Corruption Check

0 out of 1602 sources (~0 %) are corrupted.

⏱️ Performance Assessment

📈 Validation Time

Assess the performance in terms of seconds taken for the validation process.

| Time Metric | Dataset ID | Reference (s) | Latest (s) | Difference (s) |
|-----------------------------|-------------------|----------------|----------------|----------------|
| Average | -- | 4.02 | 4.05 | ⬆️+0.03 |
| Median | -- | 1.39 | 1.44 | ⬆️+0.05 |
| Standard Deviation | -- | 11.59 | 11.40 | ⬇️-0.19 |
| Minimum in Reference Reports | us-california-flex-v2-developer-test-feed-2-gtfs-1818 | 0.52 | 0.58 | ⬆️+0.06 |
| Maximum in Reference Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 301.68 | 289.98 | ⬇️-11.70 |
| Minimum in Latest Reports | us-massachusetts-massachusetts-area-express-max-gtfs-431 | 0.54 | 0.54 | ⬇️-0.00 |
| Maximum in Latest Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 301.68 | 289.98 | ⬇️-11.70 |

📜 Memory Consumption

| Metric | Dataset ID | Reference | Latest | Difference |
|-----------------------------|-------------------|----------------|----------------|----------------|
| Average | -- | 494.48 MiB | 479.71 MiB | ⬇️-14.76 MiB |
| Median | -- | 247.23 MiB | 245.94 MiB | ⬇️-1.29 MiB |
| Standard Deviation | -- | 894.05 MiB | 850.47 MiB | ⬇️-43.58 MiB |
| Minimum in Reference Reports | ph-unknown-hm-transport-inc-and-robinsons-malls-gtfs-1105 | 34.05 MiB | 34.07 MiB | ⬆️+24.00 KiB |
| Maximum in Reference Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 10.18 GiB | 10.04 GiB | ⬇️-146.33 MiB |
| Minimum in Latest Reports | us-oregon-high-desert-point-gtfs-636 | 34.05 MiB | 34.05 MiB | ⬇️-8.00 KiB |
| Maximum in Latest Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 10.18 GiB | 10.04 GiB | ⬇️-146.33 MiB |

github-actions[bot] commented 1 month ago

📝 Acceptance Test Report

📋 Summary

✅ The rule acceptance has passed for commit 738de05579ef57c740d35b49fccbc2a09394470b. Download the full acceptance test report here (report will disappear after 90 days).

📊 Notices Comparison

New Errors (5 out of 1602 datasets, ~0%) ✅

Details of new errors due to code change, which is less than the provided threshold of 1%.

| Dataset | Notice Code |
|---------|-------------|
| br-rio-grande-do-sul-empresa-publica-de-transportes-e-circulacao-eptc-gtfs-7 | invalid_character |
| ch-unknown-lk2-gtfs-914 | invalid_character |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | invalid_character |
| nl-unknown-allgo-keolis-gtfs-1077 | invalid_character |
| pt-setubal-carris-metropolitana-gtfs-1874 | invalid_character |

Dropped Errors (1 out of 1602 datasets, ~0%) ✅

Details of dropped errors due to code change, which is less than the provided threshold of 1%.

| Dataset | Notice Code |
|---------|-------------|
| pt-setubal-carris-metropolitana-gtfs-1874 | trip_distance_exceeds_shape_distance |

New Warnings (0 out of 1602 datasets, ~0%) ✅

No changes were detected due to the code change.

Dropped Warnings (4 out of 1602 datasets, ~0%) ✅

Details of dropped warnings due to code change, which is less than the provided threshold of 1%.

| Dataset | Notice Code |
|---------|-------------|
| ch-unknown-lk2-gtfs-914 | duplicate_route_name |
| ch-unknown-lk2-gtfs-914 | fast_travel_between_consecutive_stops |
| nl-unknown-allgo-keolis-gtfs-1077 | fast_travel_between_consecutive_stops |
| ch-unknown-lk2-gtfs-914 | fast_travel_between_far_stops |
| nl-unknown-allgo-keolis-gtfs-1077 | fast_travel_between_far_stops |
| ch-unknown-lk2-gtfs-914 | missing_bike_allowance |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | missing_timepoint_value |
| ch-unknown-lk2-gtfs-914 | stop_has_too_many_matches_for_shape |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_has_too_many_matches_for_shape |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_has_too_many_matches_for_shape |
| ch-unknown-lk2-gtfs-914 | stop_too_far_from_shape |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | stop_too_far_from_shape |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_too_far_from_shape |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_too_far_from_shape |
| ch-unknown-lk2-gtfs-914 | stop_too_far_from_shape_using_user_distance |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_too_far_from_shape_using_user_distance |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_too_far_from_shape_using_user_distance |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | stop_without_stop_time |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_without_stop_time |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_without_stop_time |
| ch-unknown-lk2-gtfs-914 | stops_match_shape_out_of_order |
| nl-unknown-allgo-keolis-gtfs-1077 | stops_match_shape_out_of_order |
| pt-setubal-carris-metropolitana-gtfs-1874 | stops_match_shape_out_of_order |
| pt-setubal-carris-metropolitana-gtfs-1874 | trip_distance_exceeds_shape_distance_below_threshold |

🛡️ Corruption Check

0 out of 1602 sources (~0 %) are corrupted.

⏱️ Performance Assessment

📈 Validation Time

Assess the performance in terms of seconds taken for the validation process.

| Time Metric | Dataset ID | Reference (s) | Latest (s) | Difference (s) |
|-----------------------------|-------------------|----------------|----------------|----------------|
| Average | -- | 3.98 | 4.03 | ⬆️+0.04 |
| Median | -- | 1.39 | 1.43 | ⬆️+0.04 |
| Standard Deviation | -- | 11.35 | 11.20 | ⬇️-0.15 |
| Minimum in Reference Reports | us-california-catalina-express-gtfs-299 | 0.54 | 0.64 | ⬆️+0.10 |
| Maximum in Reference Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 292.13 | 290.70 | ⬇️-1.42 |
| Minimum in Latest Reports | us-california-santa-clarita-transit-gtfs-812 | 0.63 | 0.55 | ⬇️-0.09 |
| Maximum in Latest Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 292.13 | 290.70 | ⬇️-1.42 |

📜 Memory Consumption

| Metric | Dataset ID | Reference | Latest | Difference |
|-----------------------------|-------------------|----------------|----------------|----------------|
| Average | -- | 486.24 MiB | 476.25 MiB | ⬇️-9.99 MiB |
| Median | -- | 245.94 MiB | 245.94 MiB | ⬆️+3.99 KiB |
| Standard Deviation | -- | 884.90 MiB | 827.25 MiB | ⬇️-57.65 MiB |
| Minimum in Reference Reports | us-oregon-hut-airport-shuttle-gtfs-635 | 34.05 MiB | 34.05 MiB | ⬇️-8.00 KiB |
| Maximum in Reference Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 10.24 GiB | 9.84 GiB | ⬇️-400.14 MiB |
| Minimum in Latest Reports | tr-kocaeli-metro-izmir-gtfs-1824 | 34.07 MiB | 34.05 MiB | ⬇️-24.00 KiB |
| Maximum in Latest Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 10.24 GiB | 9.84 GiB | ⬇️-400.14 MiB |

emmambd commented 1 month ago

LGTM!

github-actions[bot] commented 1 month ago

📝 Acceptance Test Report

📋 Summary

✅ The rule acceptance has passed for commit 44108c32a0a630cc263909bf2cc9c27e1cf02224. Download the full acceptance test report here (report will disappear after 90 days).

📊 Notices Comparison

New Errors (5 out of 1602 datasets, ~0%) ✅

Details of new errors due to code change, which is less than the provided threshold of 1%.

| Dataset | Notice Code |
|---------|-------------|
| br-rio-grande-do-sul-empresa-publica-de-transportes-e-circulacao-eptc-gtfs-7 | invalid_character |
| ch-unknown-lk2-gtfs-914 | invalid_character |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | invalid_character |
| nl-unknown-allgo-keolis-gtfs-1077 | invalid_character |
| pt-setubal-carris-metropolitana-gtfs-1874 | invalid_character |

Dropped Errors (1 out of 1602 datasets, ~0%) ✅

Details of dropped errors due to code change, which is less than the provided threshold of 1%.

| Dataset | Notice Code |
|---------|-------------|
| pt-setubal-carris-metropolitana-gtfs-1874 | trip_distance_exceeds_shape_distance |

New Warnings (0 out of 1602 datasets, ~0%) ✅

No changes were detected due to the code change.

Dropped Warnings (4 out of 1602 datasets, ~0%) ✅

Details of dropped warnings due to code change, which is less than the provided threshold of 1%.

| Dataset | Notice Code |
|---------|-------------|
| ch-unknown-lk2-gtfs-914 | duplicate_route_name |
| ch-unknown-lk2-gtfs-914 | fast_travel_between_consecutive_stops |
| nl-unknown-allgo-keolis-gtfs-1077 | fast_travel_between_consecutive_stops |
| ch-unknown-lk2-gtfs-914 | fast_travel_between_far_stops |
| nl-unknown-allgo-keolis-gtfs-1077 | fast_travel_between_far_stops |
| ch-unknown-lk2-gtfs-914 | missing_bike_allowance |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | missing_timepoint_value |
| ch-unknown-lk2-gtfs-914 | stop_has_too_many_matches_for_shape |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_has_too_many_matches_for_shape |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_has_too_many_matches_for_shape |
| ch-unknown-lk2-gtfs-914 | stop_too_far_from_shape |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | stop_too_far_from_shape |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_too_far_from_shape |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_too_far_from_shape |
| ch-unknown-lk2-gtfs-914 | stop_too_far_from_shape_using_user_distance |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_too_far_from_shape_using_user_distance |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_too_far_from_shape_using_user_distance |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | stop_without_stop_time |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_without_stop_time |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_without_stop_time |
| ch-unknown-lk2-gtfs-914 | stops_match_shape_out_of_order |
| nl-unknown-allgo-keolis-gtfs-1077 | stops_match_shape_out_of_order |
| pt-setubal-carris-metropolitana-gtfs-1874 | stops_match_shape_out_of_order |
| pt-setubal-carris-metropolitana-gtfs-1874 | trip_distance_exceeds_shape_distance_below_threshold |

🛡️ Corruption Check

0 out of 1602 sources (~0 %) are corrupted.

⏱️ Performance Assessment

📈 Validation Time

Assess the performance in terms of seconds taken for the validation process.

| Time Metric | Dataset ID | Reference (s) | Latest (s) | Difference (s) |
|-----------------------------|-------------------|----------------|----------------|----------------|
| Average | -- | 4.00 | 4.03 | ⬆️+0.04 |
| Median | -- | 1.39 | 1.43 | ⬆️+0.04 |
| Standard Deviation | -- | 11.54 | 11.37 | ⬇️-0.18 |
| Minimum in Reference Reports | us-massachusetts-massachusetts-area-express-max-gtfs-431 | 0.51 | 0.64 | ⬆️+0.13 |
| Maximum in Reference Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 300.37 | 297.15 | ⬇️-3.22 |
| Minimum in Latest Reports | tr-kocaeli-metro-izmir-gtfs-1824 | 0.56 | 0.54 | ⬇️-0.02 |
| Maximum in Latest Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 300.37 | 297.15 | ⬇️-3.22 |

📜 Memory Consumption

| Metric | Dataset ID | Reference | Latest | Difference |
|-----------------------------|-------------------|----------------|----------------|----------------|
| Average | -- | 478.72 MiB | 478.05 MiB | ⬇️-682.76 KiB |
| Median | -- | 246.71 MiB | 246.50 MiB | ⬇️-222.88 KiB |
| Standard Deviation | -- | 852.26 MiB | 862.92 MiB | ⬆️+10.66 MiB |
| Minimum in Reference Reports | us-massachusetts-massachusetts-area-express-max-gtfs-431 | 34.49 MiB | 34.49 MiB | ⬇️0 bytes |
| Maximum in Reference Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 10.21 GiB | 10.08 GiB | ⬇️-139.33 MiB |
| Minimum in Latest Reports | us-california-flex-v2-developer-test-feed-3-gtfs-1819 | 34.50 MiB | 34.48 MiB | ⬇️-24.00 KiB |
| Maximum in Latest Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 10.21 GiB | 10.08 GiB | ⬇️-139.33 MiB |

github-actions[bot] commented 1 month ago

📝 Acceptance Test Report

📋 Summary

✅ The rule acceptance has passed for commit 3963b995d0f9858d83c30848479581cd3214c7a7. Download the full acceptance test report here (report will disappear after 90 days).

📊 Notices Comparison

New Errors (5 out of 1602 datasets, ~0%) ✅

Details of new errors due to code change, which is less than the provided threshold of 1%.

| Dataset | Notice Code |
|---------|-------------|
| br-rio-grande-do-sul-empresa-publica-de-transportes-e-circulacao-eptc-gtfs-7 | invalid_character |
| ch-unknown-lk2-gtfs-914 | invalid_character |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | invalid_character |
| nl-unknown-allgo-keolis-gtfs-1077 | invalid_character |
| pt-setubal-carris-metropolitana-gtfs-1874 | invalid_character |

Dropped Errors (1 out of 1602 datasets, ~0%) ✅

Details of dropped errors due to code change, which is less than the provided threshold of 1%.

| Dataset | Notice Code |
|---------|-------------|
| pt-setubal-carris-metropolitana-gtfs-1874 | trip_distance_exceeds_shape_distance |

New Warnings (0 out of 1602 datasets, ~0%) ✅

No changes were detected due to the code change.

Dropped Warnings (4 out of 1602 datasets, ~0%) ✅

Details of dropped warnings due to code change, which is less than the provided threshold of 1%.

| Dataset | Notice Code |
|---------|-------------|
| ch-unknown-lk2-gtfs-914 | duplicate_route_name |
| ch-unknown-lk2-gtfs-914 | fast_travel_between_consecutive_stops |
| nl-unknown-allgo-keolis-gtfs-1077 | fast_travel_between_consecutive_stops |
| ch-unknown-lk2-gtfs-914 | fast_travel_between_far_stops |
| nl-unknown-allgo-keolis-gtfs-1077 | fast_travel_between_far_stops |
| ch-unknown-lk2-gtfs-914 | missing_bike_allowance |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | missing_timepoint_value |
| ch-unknown-lk2-gtfs-914 | stop_has_too_many_matches_for_shape |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_has_too_many_matches_for_shape |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_has_too_many_matches_for_shape |
| ch-unknown-lk2-gtfs-914 | stop_too_far_from_shape |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | stop_too_far_from_shape |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_too_far_from_shape |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_too_far_from_shape |
| ch-unknown-lk2-gtfs-914 | stop_too_far_from_shape_using_user_distance |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_too_far_from_shape_using_user_distance |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_too_far_from_shape_using_user_distance |
| mx-jalisco-secretaria-de-movilidad-del-estado-de-jalisco-gtfs-1926 | stop_without_stop_time |
| nl-unknown-allgo-keolis-gtfs-1077 | stop_without_stop_time |
| pt-setubal-carris-metropolitana-gtfs-1874 | stop_without_stop_time |
| ch-unknown-lk2-gtfs-914 | stops_match_shape_out_of_order |
| nl-unknown-allgo-keolis-gtfs-1077 | stops_match_shape_out_of_order |
| pt-setubal-carris-metropolitana-gtfs-1874 | stops_match_shape_out_of_order |
| pt-setubal-carris-metropolitana-gtfs-1874 | trip_distance_exceeds_shape_distance_below_threshold |

🛡️ Corruption Check

0 out of 1602 sources (~0 %) are corrupted.

⏱️ Performance Assessment

📈 Validation Time

Assess the performance in terms of seconds taken for the validation process.

| Time Metric | Dataset ID | Reference (s) | Latest (s) | Difference (s) |
|-----------------------------|-------------------|----------------|----------------|----------------|
| Average | -- | 4.08 | 4.07 | ⬇️-0.01 |
| Median | -- | 1.41 | 1.42 | ⬆️+0.02 |
| Standard Deviation | -- | 11.78 | 11.57 | ⬇️-0.21 |
| Minimum in Reference Reports | us-massachusetts-massachusetts-area-express-max-gtfs-431 | 0.50 | 0.54 | ⬆️+0.03 |
| Maximum in Reference Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 302.57 | 298.73 | ⬇️-3.83 |
| Minimum in Latest Reports | us-massachusetts-massachusetts-area-express-max-gtfs-431 | 0.50 | 0.54 | ⬆️+0.03 |
| Maximum in Latest Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 302.57 | 298.73 | ⬇️-3.83 |

📜 Memory Consumption

| Metric | Dataset ID | Reference | Latest | Difference |
|-----------------------------|-------------------|----------------|----------------|----------------|
| Average | -- | 475.68 MiB | 461.37 MiB | ⬇️-14.31 MiB |
| Median | -- | 248.48 MiB | 244.63 MiB | ⬇️-3.85 MiB |
| Standard Deviation | -- | 828.31 MiB | 783.94 MiB | ⬇️-44.37 MiB |
| Minimum in Reference Reports | us-massachusetts-massachusetts-area-express-max-gtfs-431 | 34.48 MiB | 34.50 MiB | ⬆️+24.00 KiB |
| Maximum in Reference Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 10.20 GiB | 10.11 GiB | ⬇️-99.86 MiB |
| Minimum in Latest Reports | us-michigan-detroit-people-mover-gtfs-417 | 34.49 MiB | 34.48 MiB | ⬇️-8.00 KiB |
| Maximum in Latest Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 10.20 GiB | 10.11 GiB | ⬇️-99.86 MiB |