conveyal / analysis-backend

Server component of Conveyal Analysis
http://conveyal.com/analysis
MIT License

Bizarre Trove map exception #172

Closed abyrd closed 5 years ago

abyrd commented 6 years ago

When starting a single-point analysis for which no workers are running, I'm seeing:

08:16:10.059 [qtp349762933-1602] ERROR com.conveyal.taui.AnalysisServer - No free or removed slots available. Key set full?!!
java.lang.IllegalStateException: No free or removed slots available. Key set full?!!
    at gnu.trove.impl.hash.TObjectHash.insertKeyRehash(TObjectHash.java:358)
    at gnu.trove.impl.hash.TObjectHash.insertKey(TObjectHash.java:294)
    at gnu.trove.map.hash.TObjectLongHashMap.put(TObjectLongHashMap.java:239)
    at com.conveyal.taui.analysis.broker.Broker.createWorkersInCategory(Broker.java:279)
    at com.conveyal.taui.controllers.WorkerController.singlePoint(WorkerController.java:136)
    at spark.RouteImpl$1.handle(RouteImpl.java:72)
    at spark.http.matching.Routes.execute(Routes.java:61)
    at spark.http.matching.MatcherFilter.doFilter(MatcherFilter.java:130)
    at spark.embeddedserver.jetty.JettyHandler.doHandle(JettyHandler.java:50)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1568)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:564)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:317)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:110)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
    at org.eclipse.jetty.util.thread.Invocable.invokePreferred(Invocable.java:128)
    at org.eclipse.jetty.util.thread.Invocable$InvocableExecutor.invoke(Invocable.java:222)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:294)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:199)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:673)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:591)
    at java.lang.Thread.run(Thread.java:748)

Other users are also seeing this around the same time.

The line that appears to be failing is: recentlyRequestedWorkers.put(category, System.currentTimeMillis());

How can the key set of an Object-long map be full? If this is a problem with the width of the index variable, how could we have anywhere near the 2 giga-entries that a signed int would allow?

I'm going to guess this is a bug in Trove and advocate upgrading to a newer version.

abyrd commented 6 years ago

As expected, restarting the backend cleared out the malfunctioning map, but why did this happen in the first place? We are currently using what appears to be the latest version of trove4j, 3.4.3, which hasn't been updated since 2013. Maybe we're somehow generating zillions of different network IDs and inserting them in a loop, but that seems unlikely.

abyrd commented 5 years ago

We think this has been solved by PR #201. This non-thread-safe map was being updated by multiple threads. The error message was probably meaningless: interleaved reads and writes from multiple threads were probably corrupting the map's internal state in some unpredictable way. We're now using synchronized data structures.
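
For anyone hitting the same symptom, here is a minimal sketch of the general idea, not the actual change in PR #201. It assumes a hypothetical class `RecentWorkerRequests` and uses the JDK's `Collections.synchronizedMap` in place of the Trove map so that it runs without extra dependencies. Only the field name `recentlyRequestedWorkers` and the `put(category, System.currentTimeMillis())` call come from the report above; every other name is illustrative.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the broker's record of recently requested worker categories. */
class RecentWorkerRequests {

    // Every call on the wrapper locks the same monitor, so concurrent
    // request threads cannot interleave inside the map's internals.
    private final Map<String, Long> recentlyRequestedWorkers =
            Collections.synchronizedMap(new HashMap<>());

    /** Mirrors the failing line from the stack trace, but on a synchronized map. */
    void recordRequest(String category) {
        recentlyRequestedWorkers.put(category, System.currentTimeMillis());
    }

    /** True if workers for this category were requested within the given window. */
    boolean requestedWithin(String category, long windowMillis) {
        Long last = recentlyRequestedWorkers.get(category);
        return last != null && System.currentTimeMillis() - last <= windowMillis;
    }

    int size() {
        return recentlyRequestedWorkers.size();
    }

    /** Writes from several threads at once and returns the number of distinct keys. */
    static int demo(int threads, int categories) {
        RecentWorkerRequests requests = new RecentWorkerRequests();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < 10_000; i++) {
                    requests.recordRequest("category-" + (i % categories));
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return requests.size();
    }

    public static void main(String[] args) {
        // Eight HTTP-handler-like threads writing the same 16 categories.
        System.out.println(demo(8, 16));
    }
}
```

With an unsynchronized hash map the same workload can leave the table in an inconsistent state, which would be consistent with a nonsensical "key set full" message on a map holding only a handful of keys. A `ConcurrentHashMap` would be another reasonable choice if lock contention on the request path became a concern.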