**Closed** - barbeau closed this issue 6 years ago
Here's the final error when the process terminates:
```
[main] INFO edu.usf.cutr.gtfsrtvalidator.batch.BatchProcessor - gtfs.zip read in 216.409 seconds
[main] INFO edu.usf.cutr.gtfsrtvalidator.background.GtfsMetadata - Building GtfsMetadata for E:\Git Projects\transit-feed-quality-calculator\feeds\194-The Netherlands\gtfs.zip...
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at com.vividsolutions.jts.geom.impl.CoordinateArraySequence.<init>(CoordinateArraySequence.java:113)
	at com.vividsolutions.jts.geom.impl.CoordinateArraySequenceFactory.create(CoordinateArraySequenceFactory.java:91)
	at com.vividsolutions.jts.geom.GeometryFactory.createMultiPoint(GeometryFactory.java:382)
	at com.vividsolutions.jts.geom.GeometryFactory.createMultiPoint(GeometryFactory.java:363)
	at org.locationtech.spatial4j.shape.jts.JtsShapeFactory$JtsMultiPointBuilder.build(JtsShapeFactory.java:351)
	at edu.usf.cutr.gtfsrtvalidator.background.GtfsMetadata.<init>(GtfsMetadata.java:135)
	at edu.usf.cutr.gtfsrtvalidator.batch.BatchProcessor.processFeeds(BatchProcessor.java:133)
	at edu.usf.cutr.transitfeedqualitycalculator.BulkFeedValidator.validateFeeds(BulkFeedValidator.java:60)
	at edu.usf.cutr.transitfeedqualitycalculator.TransitFeedQualityCalculator.calculate(TransitFeedQualityCalculator.java:74)
	at edu.usf.cutr.transitfeedqualitycalculator.Main.main(Main.java:32)
```
Interestingly, it gets through reading the GTFS data but hangs while building the metadata. Here's the GTFS-rt validator code where it hangs when building GTFS metadata:
```java
if (shapePoints != null && shapePoints.size() > 3) {
    for (ShapePoint p : shapePoints) {
        String shapeId = p.getShapeId().getId();
        // If there isn't already a list for this shape_id, create one
        List<ShapePoint> shapePointList = mShapePoints.computeIfAbsent(shapeId, k -> new ArrayList<>());
        shapePointList.add(p);
        // Create GTFS shapes.txt bounding box
        shapeBuilder.pointXY(p.getLon(), p.getLat());
    }
    _log.debug("Loaded shapes.txt points for " + feedUrl);
    Shape shapePointShape = shapeBuilder.build(); // <-- This causes the OutOfMemoryError
}
```
So it terminates when trying to build the geometry for the area bounding box in the JTS spatial operations library.

One potential fix is to simplify the polylines in the GTFS-rt validator before turning them into a JTS Shape, which would result in fewer points.

Actually, the above is incorrect - this code creates the *agency* bounding box, so right now it dumps all shape points into one big bin to produce the bounding box. The potential fix here is to skip the shape-based bounding box when there is an extremely large number of shape points and use the stops for the bounding box instead (the same logic is already used when an agency doesn't have a shapes.txt in their GTFS).
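The stops-based fallback could be sketched roughly as follows. This is a self-contained illustration, not the validator's actual code: the class/method names and the `MAX_SHAPE_POINTS` threshold are hypothetical, and the bounding box is computed directly from min/max coordinates rather than through JTS, so no large intermediate geometry is allocated.

```java
import java.util.List;

public class BoundingBoxFallback {
    // Hypothetical cutoff; a real value would need tuning against large feeds.
    static final int MAX_SHAPE_POINTS = 1_000_000;

    // Minimal stand-in for a stop or shape point with a lat/lon pair.
    static class Point {
        final double lat, lon;
        Point(double lat, double lon) { this.lat = lat; this.lon = lon; }
    }

    // Returns {minLat, minLon, maxLat, maxLon} for the given points.
    static double[] boundingBox(List<Point> points) {
        double minLat = Double.POSITIVE_INFINITY, minLon = Double.POSITIVE_INFINITY;
        double maxLat = Double.NEGATIVE_INFINITY, maxLon = Double.NEGATIVE_INFINITY;
        for (Point p : points) {
            minLat = Math.min(minLat, p.lat);
            minLon = Math.min(minLon, p.lon);
            maxLat = Math.max(maxLat, p.lat);
            maxLon = Math.max(maxLon, p.lon);
        }
        return new double[]{minLat, minLon, maxLat, maxLon};
    }

    static double[] agencyBoundingBox(List<Point> shapePoints, List<Point> stops) {
        // Mirror the existing no-shapes.txt behavior: fall back to stops when
        // shapes.txt is missing or has too many points to process safely.
        if (shapePoints == null || shapePoints.size() > MAX_SHAPE_POINTS) {
            return boundingBox(stops);
        }
        return boundingBox(shapePoints);
    }

    public static void main(String[] args) {
        // Two example stops (Amsterdam and Rotterdam, approximately).
        List<Point> stops = List.of(new Point(52.37, 4.89), new Point(51.92, 4.48));
        double[] bb = agencyBoundingBox(null, stops);
        System.out.println(bb[0] + "," + bb[1] + "," + bb[2] + "," + bb[3]);
    }
}
```

Because stops.txt is typically orders of magnitude smaller than shapes.txt, this keeps memory usage bounded while still producing a usable agency bounding box.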
An issue to add an option that skips shapes.txt processing, which would get around this problem, is open on gtfs-realtime-validator at https://github.com/CUTR-at-USF/gtfs-realtime-validator/issues/284.

When that issue is resolved, I'll add the option to this tool and close out this issue.
Alright, https://github.com/CUTR-at-USF/gtfs-realtime-validator/issues/284 has been fixed, so we can now turn off shapes.txt processing for a feed using the following:

```java
BatchProcessor.Builder builder = new BatchProcessor.Builder(gtfs, gtfsRealtime)
        .setIgnoreShapes(true); // <-- This prevents processing of GTFS shapes.txt
BatchProcessor processor = builder.build();
processor.processFeeds();
```
@Suryakandukoori Just a heads up - the `master` branch now contains a workaround that avoids running the shapes.txt metadata processing in the validator for the Netherlands feed (see https://github.com/CUTR-at-USF/transit-feed-quality-calculator/commit/21a6482dbab6f85e893704f98f2c54d20494e94d), so if you rebase on `master` you shouldn't need to worry about that feed causing the entire project to crash.
Currently, the analyzer gets hung up when trying to validate the `194-The Netherlands` feed/folder. This GTFS file is approximately 261 MB, which consistently results in an out-of-memory error in the gtfs-realtime-validator (see https://github.com/CUTR-at-USF/gtfs-realtime-validator/issues/123). We need some method to skip any feeds that are problematic due to memory constraints and continue with the analysis. To my knowledge, the Netherlands feed is the only real-world GTFS file that the GTFS-realtime validator currently can't handle.