Ensure CacheFactory.ensureCluster() calls are only made when starting a pattern

brianoliver commented 11 years ago

We need to make sure that #48 does not occur in other patterns. From what I can tell, it looks like the Processing Pattern is the only other place where this possibly happens.

To be able to redeploy a cluster member correctly you need to detach the member from the cluster programmatically calling CacheFactory.shutdown() when application is undeployed. Then the method CacheFactory.shutdown() will call the stop() methods of the distributed services that runs on the leaving member.

Because the stop() method is run by a service thread, no reentrant service calls should be invoked inside the stop method to avoid deadlocks.

THE PROBLEM The CommandExecutor.stop() have a CacheFactory.ensureCluster() that is a service call within a service call (thus, a reentrant call)
public void stop() {
  if (Logger.isEnabled(Logger.DEBUG)) Logger.log(Logger.DEBUG, "Stopping CommandExecutor for %s", contextIdentifier); 

  //stop immediately  setState(State.Stopped);

  //this CommandExecutor must not be available any further to other threads   CommandExecutorManager.removeCommandExecutor(this.getContextIdentifier());

  //unregister JMX mbean for the CommandExecutor  Registry registry = CacheFactory.ensureCluster().getManagement(); // THIS IS THE SERVICE CALL   if (registry != null) {
      if (Logger.isEnabled(Logger.DEBUG)) Logger.log(Logger.DEBUG, "Unregistering JMX management extensions for CommandExecutor %s", contextIdentifier);  
      registry.unregister(getMBeanName());
  }

  if (Logger.isEnabled(Logger.DEBUG)) Logger.log(Logger.DEBUG, "Stopped CommandExecutor for %s", contextIdentifier);  
}
If the distributed service use to support the command pattern is configured to have a single thread (as it is by default). This call will produce a deadlock with a thread dump like this:
Thread[DistributedCache:DistributedCacheForCommandPattern|SERVICE_STOPPING,5,Cluster]
  com.tangosol.net.CacheFactory.ensureCluster(CacheFactory.java:424)
  com.oracle.coherence.patterns.command.internal.CommandExecutor.stop(CommandExecutor.java:671)
        ...
DIAGNOSTIC (AND POTENTIAL SOLUTION) I've changed the code of the CommandExecutor.stop() method to use a non blocking service call to obtain the Cluster
Registry registry = CacheFactory.getCluster() != null ? CacheFactory.getCluster().getManagement() : null;
Because CacheFactory.getCluster() is not a blocking service call the deadlock is avoided.

brianoliver commented 8 years ago

This issue was imported from JIRA COHINC-49

brianoliver commented 11 years ago

Reported by guillermo.garcia-ochoa

coherence-community / coherence-incubator

Ensure CacheFactory.ensureCluster() calls are only made when starting a pattern #49