CUTR-at-USF / transit-feed-quality-calculator

A tool that uses the gtfs-realtime-validator to calculate the quality of a large number of GTFS-realtime feeds

Analyze process currently hangs on Netherlands huge GTFS feed #1

Closed barbeau closed 6 years ago

barbeau commented 6 years ago

For the 194-The Netherlands feed/folder, the analyzer currently gets stuck when trying to validate it. The GTFS file is approximately 261 MB, which always results in an out-of-memory error in the gtfs-realtime-validator (see https://github.com/CUTR-at-USF/gtfs-realtime-validator/issues/123).

We need a way to skip feeds that fail due to memory constraints and continue with the analysis. To my knowledge, the Netherlands feed is the only real-world GTFS file that the GTFS-realtime validator currently can't handle.
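
One way the "skip and continue" behavior could look is sketched below. This is a minimal illustration, not code from this tool: `SkippingFeedRunner` and `FeedValidator` are hypothetical names, and the validator is a stand-in for whatever runs the analysis on a single feed folder. Catching `OutOfMemoryError` is a best-effort measure, since the heap may still be under pressure afterwards.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: run a per-feed validation step and keep going when one feed fails. */
final class SkippingFeedRunner {

    /** Hypothetical stand-in for the code that validates a single feed folder. */
    interface FeedValidator {
        void validate(String feedName) throws Exception;
    }

    /** Validates each feed in order and returns the names of the feeds that were skipped. */
    static List<String> validateAll(List<String> feedNames, FeedValidator validator) {
        List<String> skipped = new ArrayList<>();
        for (String feedName : feedNames) {
            try {
                validator.validate(feedName);
            } catch (OutOfMemoryError | Exception e) {
                // Best effort: the heap may still be under pressure after an OutOfMemoryError.
                System.err.println("Skipping " + feedName + ": " + e);
                skipped.add(feedName);
            }
        }
        return skipped;
    }
}
```

The returned list makes it possible to report which feeds were left out of the analysis instead of dropping them silently.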

barbeau commented 6 years ago

Here's the final error when the process terminates:

[main] INFO edu.usf.cutr.gtfsrtvalidator.batch.BatchProcessor - gtfs.zip read in 216.409 seconds
[main] INFO edu.usf.cutr.gtfsrtvalidator.background.GtfsMetadata - Building GtfsMetadata for E:\Git Projects\transit-feed-quality-calculator\feeds\194-The Netherlands\gtfs.zip...
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at com.vividsolutions.jts.geom.impl.CoordinateArraySequence.<init>(CoordinateArraySequence.java:113)
    at com.vividsolutions.jts.geom.impl.CoordinateArraySequenceFactory.create(CoordinateArraySequenceFactory.java:91)
    at com.vividsolutions.jts.geom.GeometryFactory.createMultiPoint(GeometryFactory.java:382)
    at com.vividsolutions.jts.geom.GeometryFactory.createMultiPoint(GeometryFactory.java:363)
    at org.locationtech.spatial4j.shape.jts.JtsShapeFactory$JtsMultiPointBuilder.build(JtsShapeFactory.java:351)
    at edu.usf.cutr.gtfsrtvalidator.background.GtfsMetadata.<init>(GtfsMetadata.java:135)
    at edu.usf.cutr.gtfsrtvalidator.batch.BatchProcessor.processFeeds(BatchProcessor.java:133)
    at edu.usf.cutr.transitfeedqualitycalculator.BulkFeedValidator.validateFeeds(BulkFeedValidator.java:60)
    at edu.usf.cutr.transitfeedqualitycalculator.TransitFeedQualityCalculator.calculate(TransitFeedQualityCalculator.java:74)
    at edu.usf.cutr.transitfeedqualitycalculator.Main.main(Main.java:32)

Interestingly, it does get through reading the GTFS data, but fails when building the metadata. Here's where it fails in the GTFS-rt validator code when building GTFS metadata:

if (shapePoints != null && shapePoints.size() > 3) {
    for (ShapePoint p : shapePoints) {
        String shapeId = p.getShapeId().getId();
        // If there isn't already a list for this shape_id, create one
        List<ShapePoint> shapePointList = mShapePoints.computeIfAbsent(shapeId, k -> new ArrayList<>());
        shapePointList.add(p);
        // Create GTFS shapes.txt bounding box
        shapeBuilder.pointXY(p.getLon(), p.getLat());
    }
    _log.debug("Loaded shapes.txt points for " + feedUrl);

    Shape shapePointShape = shapeBuilder.build();  // <--- This causes OutOfMemoryError

So it terminates when trying to build the geometry for the area bounding box in the JTS spatial operations library.

barbeau commented 6 years ago

One potential option to fix this is to simplify polylines in the GTFS-rt validator before turning them into a JTS Shape. This would result in fewer points.

Actually, the above is incorrect - this is for creating the agency bounding box, so right now it dumps all shape points into a big bin to produce the bounding box. The potential fix here would be to not use the shape bounding box for an extremely large number of shape points, and use the stops for the bounding box instead (this same logic is used if the agency doesn't have a shapes.txt in their GTFS).
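
The fallback described above could be sketched roughly as follows. This is an illustration under assumptions, not the validator's implementation: `BoundingBoxSketch`, `agencyBoundingBox`, and the `maxShapePoints` cap are hypothetical, points are plain `{lon, lat}` arrays rather than GTFS or JTS types, and the box is computed with a running min/max rather than by building a multi-point geometry.

```java
import java.util.List;

/** Sketch only: choose between shape points and stops when computing a bounding box. */
final class BoundingBoxSketch {

    /** Returns {minLon, minLat, maxLon, maxLat} for a non-empty list of {lon, lat} pairs. */
    static double[] boundingBox(List<double[]> lonLatPoints) {
        if (lonLatPoints == null || lonLatPoints.isEmpty()) {
            throw new IllegalArgumentException("At least one point is required");
        }
        double minLon = Double.POSITIVE_INFINITY;
        double minLat = Double.POSITIVE_INFINITY;
        double maxLon = Double.NEGATIVE_INFINITY;
        double maxLat = Double.NEGATIVE_INFINITY;
        // Running min/max: no intermediate geometry is allocated.
        for (double[] p : lonLatPoints) {
            minLon = Math.min(minLon, p[0]);
            maxLon = Math.max(maxLon, p[0]);
            minLat = Math.min(minLat, p[1]);
            maxLat = Math.max(maxLat, p[1]);
        }
        return new double[] {minLon, minLat, maxLon, maxLat};
    }

    /**
     * Uses the shape points when there are more than 3 and no more than maxShapePoints
     * (a hypothetical cap); otherwise falls back to the stops.
     */
    static double[] agencyBoundingBox(List<double[]> shapePoints, List<double[]> stops, int maxShapePoints) {
        boolean useShapes = shapePoints != null
                && shapePoints.size() > 3
                && shapePoints.size() <= maxShapePoints;
        return boundingBox(useShapes ? shapePoints : stops);
    }
}
```

An appropriate value for the cap would need to be determined by testing against real feeds; none is suggested here.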

barbeau commented 6 years ago

An issue to add an option that skips shapes.txt processing, which would work around this problem, is open on gtfs-realtime-validator at https://github.com/CUTR-at-USF/gtfs-realtime-validator/issues/284.

When the above issue is resolved, I'll add this option to this tool and close out this issue.

barbeau commented 6 years ago

Alright, https://github.com/CUTR-at-USF/gtfs-realtime-validator/issues/284 has been fixed, so now we can turn off shapes.txt processing for a feed using the following:

BatchProcessor.Builder builder = new BatchProcessor.Builder(gtfs, gtfsRealtime)
        .setIgnoreShapes(true);  // <--- This prevents processing of GTFS shapes.txt
BatchProcessor processor = builder.build();
processor.processFeeds();

barbeau commented 6 years ago

@Suryakandukoori Just a heads up - the master branch now contains a workaround to avoid running the shapes.txt metadata processing in the validator for the Netherlands feed (see https://github.com/CUTR-at-USF/transit-feed-quality-calculator/commit/21a6482dbab6f85e893704f98f2c54d20494e94d), so if you rebase on master you shouldn't need to worry about that feed causing the entire project to crash.
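
For illustration only, a caller might decide per feed whether to turn off shapes.txt processing based on the size of the GTFS zip. The sketch below is not the logic in the linked commit: `ShapeProcessingPolicy`, `shouldIgnoreShapes`, and the default threshold are all hypothetical.

```java
import java.io.File;

/** Sketch only: decide per feed whether shapes.txt processing should be skipped. */
final class ShapeProcessingPolicy {

    /** Hypothetical default threshold (200 MiB); not a measured limit of the validator. */
    static final long DEFAULT_MAX_GTFS_BYTES = 200L * 1024 * 1024;

    /** Returns true when the GTFS zip is larger than maxBytes. */
    static boolean shouldIgnoreShapes(File gtfsZip, long maxBytes) {
        // File.length() returns 0 for a missing file, so a missing file is never flagged.
        return gtfsZip.length() > maxBytes;
    }
}
```

The boolean result could then be passed to `setIgnoreShapes(...)` on the `BatchProcessor.Builder` shown earlier in this thread.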