CUTR-at-USF / gtfs-realtime-validator

Java-based tool that validates General Transit Feed Specification (GTFS)-realtime feeds. See https://github.com/MobilityData/gtfs-realtime-validator for the latest!

Change static GTFS validator #193

Open barbeau opened 7 years ago

barbeau commented 7 years ago

Summary:

For static GTFS validation, the Conveyal team has moved away from gtfs-validator and is now focusing on a newer project, gtfs-lib.

We should look at integrating gtfs-lib into our project, in addition to, and perhaps eventually in place of, gtfs-validator.

My comments from https://github.com/conveyal/gtfs-validator/pull/40#issuecomment-300478926:

I'm definitely open to integrating gtfs-lib into our gtfs-rt-validator in addition to (or eventually in place of) gtfs-validator - I just opened CUTR-at-USF/gtfs-realtime-validator#193 for this. Since gtfs-lib doesn't have a web UI, for the initial integration (and given our current focus) we'd probably just write out the JSON file and build a UI for the output later. I believe Transitland is now showing gtfs-lib output in their web UI - for example, see https://transit.land/dispatcher/feed-versions/eb0cbe5ab41c9cfde0ebae42471ab5b3f712b008.

barbeau commented 7 years ago

As mentioned in https://github.com/CUTR-at-USF/gtfs-realtime-validator/issues/123#issuecomment-293679370:

If I try to run the Dutch feed (http://gtfs.openov.nl/gtfs/gtfs-openov-nl.zip) with the -Xmx8g parameter on my machine (dual Xeon @ 2.5 GHz w/ 16GB RAM), I get this exception after it runs for a very long time (I left and came back an hour later):

javax.servlet.ServletException: org.glassfish.jersey.server.ContainerException: java.lang.OutOfMemoryError: GC overhead limit exceeded
  at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:423)
  at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:386)
  at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:334)
  at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:221)
  at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:800)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
  at org.eclipse.jetty.server.Server.handle(Server.java:497)
  at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:313)
  at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
  at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:626)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:546)
  at java.lang.Thread.run(Thread.java:745)
Caused by: org.glassfish.jersey.server.ContainerException: java.lang.OutOfMemoryError: GC overhead limit exceeded
  at org.glassfish.jersey.servlet.internal.ResponseWriter.rethrow(ResponseWriter.java:256)
  at org.glassfish.jersey.servlet.internal.ResponseWriter.failure(ResponseWriter.java:238)
  at org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:486)
  at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:316)
  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
  at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
  at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
  at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
  at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
  at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:291)
  at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1140)
  at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:403)
  ... 17 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
  at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:68)
  at java.lang.StringBuilder.<init>(StringBuilder.java:89)
  at org.onebusaway.csv_entities.DelimitedTextParser.parse(DelimitedTextParser.java:65)
  at org.onebusaway.csv_entities.CSVLibrary.parse(CSVLibrary.java:131)
  at org.onebusaway.csv_entities.CsvTokenizerStrategy.parse(CsvTokenizerStrategy.java:34)
  at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:154)
  at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:120)
  at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:115)
  at org.onebusaway.gtfs.serialization.GtfsReader.run(GtfsReader.java:172)
  at org.onebusaway.gtfs.serialization.GtfsReader.run(GtfsReader.java:160)
  at com.conveyal.gtfs.validator.json.FeedProcessor.load(FeedProcessor.java:73)
  at com.conveyal.gtfs.validator.json.FeedProcessor.run(FeedProcessor.java:44)
  at edu.usf.cutr.gtfsrtvalidator.api.resource.GtfsFeed.postGtfsFeed(GtfsFeed.java:180)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:497)
  at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
  at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
  at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
  at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:160)
  at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
  at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
  at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
  at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
  at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:308)
  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
  at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
  at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
  at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
  at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)

So it looks like it's getting hung up in the static GTFS validation using the Conveyal gtfs-validator.
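One way to fail fast instead of grinding for an hour before the OutOfMemoryError is to compare the feed's uncompressed size against the JVM's maximum heap before handing it to the static validator. The following is only a rough sketch and not code from this project: the class name HeapGuard, its methods, and the expansion factor of 4.0 (a guess at how much larger parsed objects are than the raw CSV text) are all assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of gtfs-realtime-validator.
class HeapGuard {

    /** Sums the uncompressed sizes recorded in the zip; entries with unknown size (-1) are skipped. */
    static long uncompressedBytes(Path zip) throws IOException {
        long total = 0;
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                long size = entries.nextElement().getSize();
                if (size > 0) {
                    total += size;
                }
            }
        }
        return total;
    }

    /**
     * Returns true when the estimated in-memory footprint (raw bytes times an
     * assumed expansion factor) is larger than the available maximum heap.
     */
    static boolean likelyExceedsHeap(long uncompressedBytes, long maxHeapBytes, double expansionFactor) {
        return uncompressedBytes * expansionFactor > maxHeapBytes;
    }

    public static void main(String[] args) throws IOException {
        Path feed = Paths.get(args[0]);              // e.g. gtfs-openov-nl.zip
        long raw = uncompressedBytes(feed);
        long maxHeap = Runtime.getRuntime().maxMemory(); // reflects -Xmx
        double assumedFactor = 4.0;                  // assumption, tune per validator
        if (likelyExceedsHeap(raw, maxHeap, assumedFactor)) {
            System.out.println("Feed is probably too large for the current heap; skipping static validation.");
        } else {
            System.out.println("Feed should fit; proceeding with static validation.");
        }
    }
}
```

A check like this only estimates memory use, so it could at best turn a long stall into an early, readable warning; it would not make the large feed itself processable.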

I tried running the same http://gtfs.openov.nl/gtfs/gtfs-openov-nl.zip feed through gtfs-lib on the command line (i.e., not integrated into gtfs-rt-validator - just running the built jar with java -jar target/gtfs-lib-2.2.0-SNAPSHOT-shaded.jar -validate gtfs-openov-nl.zip result.json) and it took a while but was able to process it without crashing. So that's one benefit of using/moving to gtfs-lib.
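Because a run like this can take a long time on a large feed, it may help to launch the jar from a small wrapper that enforces a time limit. The sketch below uses only the JDK's ProcessBuilder; the jar path and arguments are the ones from the command above, while the class name TimedValidation, its methods, and the two-hour limit are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper, not part of gtfs-lib or gtfs-realtime-validator.
class TimedValidation {

    /** Builds the same command line used above: java -jar <jar> -validate <feed> <output>. */
    static List<String> command(String jar, String feedZip, String outputJson) {
        return Arrays.asList("java", "-jar", jar, "-validate", feedZip, outputJson);
    }

    /**
     * Runs the command and waits up to the given timeout.
     * Returns true only if the process exited with status 0 in time;
     * on timeout the process is killed and false is returned.
     */
    static boolean runWithTimeout(List<String> cmd, long timeout, TimeUnit unit)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd)
                .inheritIO() // show the validator's log output in this console
                .start();
        if (process.waitFor(timeout, unit)) {
            return process.exitValue() == 0;
        }
        process.destroyForcibly();
        return false;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = command(
                "target/gtfs-lib-2.2.0-SNAPSHOT-shaded.jar",
                "gtfs-openov-nl.zip",
                "result.json");
        boolean ok = runWithTimeout(cmd, 2, TimeUnit.HOURS); // limit is an arbitrary assumption
        System.out.println(ok ? "Validation finished." : "Validation failed or timed out.");
    }
}
```

Running the validator in a separate process also keeps its memory use out of the calling JVM, which is one possible way to isolate a web application from a runaway validation.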

EDIT - Looks like I spoke a little too soon - it did get through the validation phase, but after about an hour still hasn't generated any JSON output. It's hung at:

[main] INFO com.conveyal.gtfs.GTFSFeed - TripTimesValidator finished in 154209 milliseconds.
[main] INFO com.conveyal.gtfs.GTFSFeed - UnusedStopValidator finished in 156 milliseconds.
[main] INFO com.conveyal.gtfs.GTFSFeed - 8 validators completed in 204791 milliseconds.

barbeau commented 4 years ago

We should switch to using the MobilityData gtfs-validator as the static validator in this project: https://github.com/MobilityData/gtfs-validator

barbeau commented 2 years ago

Note that the Maven configuration added in PR https://github.com/CUTR-at-USF/gtfs-realtime-validator/pull/403 should be removed when the transition to the MobilityData gtfs-validator is complete.