Two actors with same refresh rate cause deadlock

delving / culture-hub

The Delving Search and Administrative Interface

Apache License 2.0


Two actors with same refresh rate cause deadlock #917

Closed manuelbernhardt closed 11 years ago

manuelbernhardt commented 11 years ago

There are two actors that periodically (every 5 minutes) refresh the state of the hub's system: one fetches new schemas (from schemas.delving.org), the other refreshes the system configuration. It now appears that refreshing the schemas makes use of the configuration. In rare cases, when both actors fire at exactly the same time, a deadlock occurs because they call each other.

The straightforward solution is to decouple the two refresh schedules: at the moment the rates are identical and both actors start at the same time.
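A minimal sketch of what decoupling the schedules could look like, written against plain `java.util.concurrent` rather than the hub's actual actor code. The class `StaggeredRefresh`, its methods, and the half-period offset are assumptions made for illustration. `firstCollision` only models the moments at which the two refreshes start, not how long each one runs.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of culture-hub.
final class StaggeredRefresh {

    /**
     * Returns the first second in [0, horizon] at which both schedules fire
     * together, or -1 if they never do within the horizon.
     */
    static long firstCollision(long offsetA, long periodA,
                               long offsetB, long periodB, long horizon) {
        for (long t = offsetA; t <= horizon; t += periodA) {
            if (t >= offsetB && (t - offsetB) % periodB == 0) {
                return t;
            }
        }
        return -1;
    }

    /**
     * Starts both refreshers with the same period, but delays the second one
     * by half a period so their start times never coincide.
     */
    static ScheduledExecutorService start(Runnable refreshSchemas,
                                          Runnable refreshConfiguration,
                                          long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        scheduler.scheduleAtFixedRate(
                refreshSchemas, 0, periodSeconds, TimeUnit.SECONDS);
        scheduler.scheduleAtFixedRate(
                refreshConfiguration, periodSeconds / 2, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

With identical offsets and periods, `firstCollision(0, 300, 0, 300, 3600)` returns 0. With the half-period offset, `firstCollision(0, 300, 150, 300, 3600)` returns -1. Choosing different periods is not enough on its own: offsets 0 and 60 with periods 300 and 420 still line up at second 900.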

geralddejong commented 11 years ago

Would it not make sense to have an actor to trigger them both, one after the other is finished?

manuelbernhardt commented 11 years ago

The fact that actors are used here has less to do with them being actors in the traditional sense (with a hierarchy) than with them offering a concurrency model. I could have used concurrent data structures as well. The crux is really that the scheduling needs to happen correctly. In an ideal world, such repeating services would register with a central scheduler that makes sure refreshes don't collide. But I don't want to introduce this when there are only two things that refresh.
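For illustration, the central scheduler idea could be sketched as follows. This is an assumption-laden sketch using a single-threaded executor from `java.util.concurrent`, not the hub's actor-based code. `RefreshCoordinator`, `register`, `refreshAll` and `start` are hypothetical names. Because every registered refresher runs on the same thread, one after the other, two refreshes can never overlap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical central scheduler, not part of culture-hub.
final class RefreshCoordinator {

    private final List<Runnable> refreshers = new ArrayList<>();

    /** Registers a refresher; refreshers run in registration order. */
    void register(Runnable refresher) {
        refreshers.add(refresher);
    }

    /** Runs every refresher once, strictly one after the other. */
    void refreshAll() {
        for (Runnable refresher : refreshers) {
            try {
                refresher.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs of the
                // scheduled task, so log it and continue with the next one.
                System.err.println("Refresh failed: " + e);
            }
        }
    }

    /** Schedules refreshAll on a single thread; the next round starts
     *  periodSeconds after the previous round has finished. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(
                this::refreshAll, 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Registering the configuration refresher before the schema refresher would mean the schemas are always refreshed against an up-to-date configuration. The trade-off is that a slow or hanging refresher delays everything registered after it.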


geralddejong commented 11 years ago

Would your rule of thumb be to introduce this when there are three? It seems that you are suggesting that this is a significant step higher in complexity.

manuelbernhardt commented 11 years ago

It is. Just check the code and how actors get scheduled. You're welcome to introduce a scheduler yourself.


geralddejong commented 11 years ago

I will. Thanks. Just trying to start with your take on the actors thing and go from there.

manuelbernhardt commented 11 years ago

After further inspection of the code, this can't be the root cause of the timeout trace, because the configuration is only ever used when a single schema is retrieved, not when the whole schema repository is discarded in favor of a new one.

So the deadlock must have another underlying cause, deep down inside the configuration initialization mechanism. Given how little information the logs yield, I now suspect that the initialization of a certain kind of OrganizationConfigurationResourceHolder is hanging and hence causing everything to hang. I'll make another resource for this.