MobilityData / gtfs-validator

Canonical GTFS Validator project for schedule (static) files.
https://gtfs-validator.mobilitydata.org/
Apache License 2.0

Performance issues with trip_distance_exceeds_shape_distance #1611

Closed emmambd closed 7 months ago

emmambd commented 10 months ago

Describe the bug

While generating analytics for the 4.2 release, we found that several datasets failed to validate. We assume the cause is the new trip_distance_exceeds_shape_distance error.

These datasets failed with the message "The configured memory limit was reached":

  1. us-oregon-trimet-portland-streetcar-gtfs-247
  2. us-washington-sound-transit-metro-transit-city-of-seattle-king-county-metro-gtfs-267
  3. us-illinois-chicago-transit-authority-cta-gtfs-389
  4. us-new-jersey-new-jersey-transit-nj-transit-gtfs-508
  5. at-wien-wiener-lokalbahnen-wlb-gtfs-648
  6. be-vlaams-gewest-de-lijn-gtfs-684
  7. ca-alberta-edmonton-transit-system-gtfs-714
  8. ca-ontario-toronto-transit-commission-gtfs-732
  9. ca-quebec-reseau-de-transport-de-la-capitale-gtfs-757
  10. de-berlin-verkehrsverbund-berlin-brandenburg-gtfs-782
  11. ie-dublin-dublin-bus-gtfs-947
  12. fr-auvergne-rhone-alpes-cars-region-auvergne-rhone-alpes-transisere-gtfs-985
  13. es-madrid-cercanias-madrid-gtfs-993
  14. nl-unknown-allgo-keolis-gtfs-1077
  15. ee-unknown-abuss-ou-gtfs-1095
  16. gb-unknown-transport-for-greater-manchester-arriva-in-the-north-west-gtfs-1103
  17. fi-unknown-porvoon-museorautatie-gtfs-1102
  18. be-unknown-societe-regionale-wallonne-du-transport-gtfs-1212
  19. it-lombardia-agenzia-mobilita-ambiente-territorio-gtfs-1231
  20. gr-attiki-athens-urban-transport-organisation-organismos-astikon-sugkoinonion-oasa-gtfs-1228
  21. ru-sankt-peterburg-peterburgskii-metropoliten-petersburg-metro-gtfs-1186
  22. tw-unknown-taichung-gtfs-1277
  23. dk-unknown-rejseplanen-gtfs-1292
  24. gb-unknown-chiltern-railways-gtfs-1311
  25. pt-lisboa-carris-metropolitana-gtfs-1873

These datasets failed with the message "Timeout of 1800 seconds exceeded":

  1. us-unknown-amtrak-gtfs-11
  2. tn-unknown-uabs-banlieue-sahel-gtfs-1016

This dataset failed with the message "connection broken":

  1. us-minnesota-metro-transit-metro-transit-met-council-maple-grove-plymouth-southwest-transit-airport-mac-university-of-minnesota-catch-the-link-gtfs-205

We should investigate further and identify possible performance improvements for this notice. Relates to #1589.

Steps/Code to Reproduce

Expected Results

Actual Results

Feed fails to parse.

Screenshots

No response

Files used

No response

Validator version

4.2

Operating system

macOS

Java version

No response

Additional notes

No response

emmambd commented 8 months ago

The memory reduction design & analysis document mentions that the TripAndShapeDistanceValidator accounts for a significant amount of memory usage. This should be investigated further.
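
For discussion, here is a minimal sketch of one memory-lean way such a check could be structured. It is not the TripAndShapeDistanceValidator's actual code. It assumes the notice compares the largest shape_dist_traveled recorded for a trip in stop_times.txt against the largest shape_dist_traveled of the trip's shape in shapes.txt. The class, record, and method names are all hypothetical. The idea is to keep a single running maximum per shape and per trip, rather than holding grouped lists of shape points or stop times in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TripShapeDistanceSketch {

  // Hypothetical row types for illustration; not the validator's table classes.
  record ShapePoint(String shapeId, double shapeDistTraveled) {}
  record StopTime(String tripId, double shapeDistTraveled) {}
  record Trip(String tripId, String shapeId) {}

  // One pass over shapes.txt rows, keeping only one double per shape_id.
  static Map<String, Double> maxDistByShape(Iterable<ShapePoint> points) {
    Map<String, Double> max = new HashMap<>();
    for (ShapePoint p : points) {
      max.merge(p.shapeId(), p.shapeDistTraveled(), Math::max);
    }
    return max;
  }

  // One pass over stop_times.txt rows, keeping only one double per trip_id.
  static Map<String, Double> maxDistByTrip(Iterable<StopTime> stopTimes) {
    Map<String, Double> max = new HashMap<>();
    for (StopTime st : stopTimes) {
      max.merge(st.tripId(), st.shapeDistTraveled(), Math::max);
    }
    return max;
  }

  // Returns the trip_ids whose maximum distance exceeds their shape's maximum.
  // Trips without a shape or without any stop_times rows are skipped.
  static List<String> tripsExceedingShape(
      List<Trip> trips, Iterable<ShapePoint> points, Iterable<StopTime> stopTimes) {
    Map<String, Double> shapeMax = maxDistByShape(points);
    Map<String, Double> tripMax = maxDistByTrip(stopTimes);
    List<String> offending = new ArrayList<>();
    for (Trip t : trips) {
      Double shapeDist = shapeMax.get(t.shapeId());
      Double tripDist = tripMax.get(t.tripId());
      if (shapeDist != null && tripDist != null && tripDist > shapeDist) {
        offending.add(t.tripId());
      }
    }
    return offending;
  }
}
```

With this layout, memory grows with the number of distinct shapes and trips, not with the number of rows in shapes.txt or stop_times.txt. Whether that would address the failures listed above depends on where the validator's memory actually goes, which this issue does not establish.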