Terracotta-OSS / terracotta-platform

http://terracotta.org
Apache License 2.0
32 stars 48 forks source link

EntityFactory.getOrCreatEntity might get stuck if all servers are down #309

Open henri-tremblay opened 7 years ago

henri-tremblay commented 7 years ago

These are tests made following the suggestion of using HealthChecker to monitor the connection. It partially works but the following problems were identified. They are particularly important for integration tests and batch applications.

Also, it seems impossible to have a sane application without HealthChecker. It should be documented as a mandatory system or be launched under the hood. And it should be an entity installed by default.

Here are the flaws encountered:

  1. There is no way to know a connection was already closed by the client itself. Calling close twice will raise an exception. We should have a isClosed method or at least close should be reentrant
  2. HealthChecker should import SLF4J. It needs it to run
  3. If the server is dead when retrieving the HealthChecker entity, it get stuck. This can be really frequent during testing
  4. HealthChecker entity is not by default in the kit. So it needs some Maven magic to work during testing
  5. There are now way to stop the HealthChecker

The test code (the map can be replaced by any other entity):

public class InterruptibleGetEntity {

  private Connection connection;

  @Rule
  public Cluster voltron = new BasicExternalCluster(new File("target/galvan"), 1,
    Arrays.asList(
      new File("../maps/server/target/maps-server-plugin.jar"),
      new File("target/entities/healthcheck-server.jar")),
    "", "", "");

  @Before
  public void setUp() throws Exception {
    voltron.getClusterControl().waitForActive();

    Properties properties = new Properties();
    properties.setProperty(ConnectionPropertyNames.CONNECTION_NAME, getClass().getSimpleName());
    properties.setProperty(ConnectionPropertyNames.CONNECTION_TIMEOUT, "10000");

    // Get a valid connection
    connection = ConnectionFactory.connect(voltron.getConnectionURI(), properties);
  }

  @After
  public void tearDown() throws Exception {
    try {
      connection.close();
    } catch(IllegalStateException e) {
      // Problem #1: Here, I have no way to know that the connection was already closed in the test
      // Since the connection was opened in setUp(), it makes sense to close it in tearDown(). And I
      // don't know if the test have closed it or not
      if(!e.getMessage().equals("Already shut down")) {
        throw e;
      }
    }
    voltron.getClusterControl().terminateAllServers();
  }

  @Test
  public void canRetrieveEntity() throws Exception {

    // Get a valid entity factory
    EntityFactory<ClusteredMapEntity, MapConfig> entityFactory = new EntityFactory<>(connection, ClusteredMapEntity.class);

    // Try to retrieve the entity
    ClusteredMapEntity entity = entityFactory.getOrCreateEntity("map", new MapConfig("map"));
    assertThat(entity).isNotNull();
  }

  @Test
  public void canCloseMultipleTimes() throws Exception {
    connection.close();
  }

  @Test
  public void dontGetStuckWhenAskingAnEntityWhenServerDown() throws Exception {
    // Get a valid entity factory
    EntityFactory<ClusteredMapEntity, MapConfig> entityFactory = new EntityFactory<>(connection, ClusteredMapEntity.class);

    // Close the connection
    connection.close();

    // Try to retrieve the entity
    entityFactory.getOrCreateEntity("map", new MapConfig("map"));
  }

  @Test
  public void gettingAnEntityWhenConnectionClosedShouldFail() throws Exception {
    // Get a valid entity factory
    EntityFactory<ClusteredMapEntity, MapConfig> entityFactory = new EntityFactory<>(connection, ClusteredMapEntity.class);

    // Close the connection
    connection.close();

    // Try to retrieve the entity. Should throw an exception
    assertThatExceptionOfType(IllegalStateException.class)
      .isThrownBy(() -> { entityFactory.getOrCreateEntity("map", new MapConfig("map")); })
      .withMessage("Already shut down");
  }

  @Test
  @Ignore
  public void healthCheckerDontGetStuckWhenRetrievingItsEntity() throws Exception {
    // Kill the server
    voltron.getClusterControl().terminateActive();

    // Problem #2: NoClassDefFoundError is SLF4J is not in the classpath. I guess it should be a dependency of healthchecker-api
    // Problem #3: This test fails because it gets stuck on the health entity retrieval
    HealthCheckerFactory.startHealthChecker(connection, 60, 2000);
  }

  @Test
  public void dontGetStuckWhenThereIsNoServer() throws Exception {
    // Get a valid entity factory
    EntityFactory<ClusteredMapEntity, MapConfig> entityFactory = new EntityFactory<>(connection, ClusteredMapEntity.class);

    // Problem #4: The entity is not by default in the kit so it needs to be added manually. I'm using the maven-dependency-plugin for that
    // Problem #5: No way to stop it
    HealthCheckerFactory.startHealthChecker(connection, 60, 2000);

    // Kill the server
    voltron.getClusterControl().terminateActive();

    // Try to retrieve the entity. It should stop and fail. And it does perfectly
    assertThatExceptionOfType(ConnectionClosedException.class)
      .isThrownBy(() -> { entityFactory.getOrCreateEntity("map", new MapConfig("map")); });
  }
}
henri-tremblay commented 7 years ago

No need to actually register a listener. The code was modified accordingly. And one of the problems (which was more a question) removed.

mathieucarbou commented 7 years ago

@henri-tremblay @myronkscott @ljacomet : is it the right place for this issue ?

henri-tremblay commented 7 years ago

What do you mean? It shouldn't be platform?

myronkscott commented 7 years ago

@mathieucarbou Yes. It is the HealthCheckEntity