Terracotta-OSS / tc-passthrough-testing

Enables simpler functional testing of Terracotta entities via pass-through communication.
Apache License 2.0

Passthrough does not provide the IMonitoringProducer when a passive entity is created #58

Closed mathieucarbou closed 7 years ago

mathieucarbou commented 7 years ago

As a workaround, I'll check the null return. But we said that IMonitoringProducer was a platform-provided service and that it was always there. So I expect it to be there ;-)

When doing passthrough testing, when creating a passive entity, the IMonitoringProducer provided by the platform does not exist and is null.

See Ehcache test:

ClusteredStateRepositoryReplicationTest.testClusteredStateRepositoryReplication

The IMonitoringProducer is null.

[Screenshot: 2016-12-03 at 13:04:56]

For the moment, I'll put in a workaround that disables management if IMonitoringProducer is not found, but this needs to be resolved ASAP because we will do some HA tests with passive entities and we will need the IMonitoringProducer to be there in passthrough when creating passive entities.

jd0-sag commented 7 years ago

Are you running on the latest release? I have a test that does pretty well the same thing (in order to test node interaction and best-efforts data from the passive) and it works fine. The service is looked up the same way, too.

mathieucarbou commented 7 years ago

@jd0-sag : Ehcache is using 1.0.13.beta.

Their test: https://github.com/ehcache/ehcache3/blob/5315a7cd8eaa4845268011506a565feafc76fad2/clustered/client/src/test/java/org/ehcache/clustered/client/internal/service/ClusteredStateRepositoryReplicationTest.java

jd0-sag commented 7 years ago

So, this problem is happening when the entity is created during replication and synchronization (after restarting the active) or just in one path?

Given that we have a test doing the same lookup, without issue, you will probably need to debug into passthrough to see if the ServiceProvider is present or why else it fails to be resolved. If anything, I would suspect that there might be something like a class loader mismatch but that would surprise me since we would probably see that in our tests.

mathieucarbou commented 7 years ago

I think you're right with your 1st hypothesis. If you look at the test, when the failure happened, the test thread was stopped at:

    clusterControl.terminateActive();
    clusterControl.waitForActive();

Do you also have a test like that, killing the active and waiting afterwards?

jd0-sag commented 7 years ago

I didn't have that in my test but, after adding it, everything still works.

Additionally, that shouldn't involve creating a passive entity (terminating the active will cause the previous passive to be promoted to active) and the waiting for active doesn't do anything.

Further, I don't see the thread being stopped there, in your screenshot, since you are looking at the server thread.

mathieucarbou commented 7 years ago

I don't know what to say ;-) This is an Ehcache test that I am not familiar with; I just saw that in this case, IMonitoringProducer was null. So unless you are telling me that what I observed is wrong, I don't know what else I could tell you except to try checking out the Ehcache code and running the test, and you will see ;-) I do not even know if the test is right or wrong. I just know that when this test is run, I do not get an IMonitoringProducer. And this is why I filed an issue. We need an IMonitoringProducer to make M&M work ;-)

You just have to add these lines in the creation of PassiveEhcacheEntity:

    IMonitoringProducer monitoringProducer = services.getService(new BasicServiceConfiguration<>(IMonitoringProducer.class));
    Objects.requireNonNull(monitoringProducer);

And you will see the test failing.

jd0-sag commented 7 years ago

Since you already have the test and can verify it, can you check to see what is in the ServiceProvider collection being consulted?

For this test, they can successfully request the IMonitoringProducer in the active case?

mathieucarbou commented 7 years ago

I don't know because I only request the 'IMonitoringProducer' from a passive entity. We do not need it from an active entity.

I'll ping you tomorrow so that we can look together in a hangout to be more efficient :-)

jd0-sag commented 7 years ago

Take a look to verify if this is a classloader issue (you should be able to see the instance there, resolving to a different instance of the same class). We suspect that, if you are in a different classloader when IMonitoringProducer is loaded, you are getting a mismatch. Passthrough doesn't have explicit support for CommonComponent.

mathieucarbou commented 7 years ago

Here is what I did. I added:

    IMonitoringProducer monitoringProducer = services.getService(new BasicServiceConfiguration<>(IMonitoringProducer.class));
    System.out.println(getClass().getSimpleName() + " monitoringProducer=" + monitoringProducer + " classloader=" + (monitoringProducer == null ? null : monitoringProducer.getClass().getClassLoader()));
    Objects.requireNonNull(monitoringProducer);

In EhcachePassiveEntity and EhcacheActiveEntity, as first lines in the ctor.

Then I ran ClusteredStateRepositoryReplicationTest.

Here is the output:

EhcachePassiveEntity monitoringProducer=null classloader=null
EhcacheActiveEntity monitoringProducer=null classloader=null

Here is how passthrough is initialized:

    this.clusterControl = PassthroughTestHelpers.createActivePassive(STRIPENAME,
        new PassthroughTestHelpers.ServerInitializer() {
          @Override
          public void registerServicesForServer(PassthroughServer server) {
            server.registerServerEntityService(new EhcacheServerEntityService());
            server.registerClientEntityService(new EhcacheClientEntityService());
            server.registerServerEntityService(new VoltronReadWriteLockServerEntityService());
            server.registerClientEntityService(new VoltronReadWriteLockEntityClientService());
            server.registerExtendedConfiguration(new OffHeapResourcesProvider(getOffheapResourcesType("test", 32, MemoryUnit.MB)));

            UnitTestConnectionService.addServerToStripe(STRIPENAME, server);
          }
        }
    );

    clusterControl.waitForActive();
    clusterControl.waitForRunningPassivesInStandby();

So... IMonitoringProducer is simply not there at all!

I tried to do the same thing in our test in tc-platform, and I am unable to reproduce. BUT: we do not have passives and passive entities yet! So perhaps because of that. Or... perhaps because Ehcache is using its own ConnectionService: UnitTestConnectionService

Please have a look at ClusteredStateRepositoryReplicationTest in Ehcache. This test can be run directly from within the IDE and is fast.

So for the moment, we can leave this open. Once I add some HA testing, I'll check if I have the same issue. If yes, it means there's an issue in passthrough with passives and it will be a blocker for us. If not, then it means these failures are only in Ehcache, and it could be the way Ehcache does their testing with passthrough. Perhaps it has an impact on the IMonitoringProducer. This will also be a blocker for us.

jd0-sag commented 7 years ago

IMonitoringProducer will never appear as a registered service provider since it is provided by the implementation. In the case of passthrough, this is PassthroughMonitoringProducer.

The test that you did doesn't provide any information since the problem is with the classloader of IMonitoringProducer being different between your requesting code and the code which actually implements it. Since services.getService is essentially looking something up by class instance match, this will fail if they are in different class loaders.

To determine if this is the case, you will need to take that example you had in the debugger and see why it failed to find the service. Is there anything implementing IMonitoringProducer in the service registry being consulted that just has a different class instance, or is it missing altogether?
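
The class-loader mismatch hypothesized above can be illustrated in isolation. The following is only a sketch under stated assumptions: `Service` is a hypothetical stand-in for IMonitoringProducer, the `Map` is a hypothetical stand-in for a registry keyed by class instance, and none of this is passthrough code. It assumes Java 9+ and that the compiled class file is reachable as a resource through the application class loader.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

final class ClassLoaderMismatchSketch {

    // Hypothetical service interface standing in for IMonitoringProducer.
    interface Service {}

    // Class loader that defines its own copy of Service instead of delegating to its parent.
    static final class IsolatingLoader extends ClassLoader {
        private final String isolatedName = Service.class.getName();

        IsolatingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(isolatedName)) {
                return super.loadClass(name, resolve); // normal parent-first delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> loaded = findLoadedClass(name);
                if (loaded != null) {
                    return loaded;
                }
                String resource = name.replace('.', '/') + ".class";
                try (InputStream in = getParent().getResourceAsStream(resource)) {
                    if (in == null) {
                        throw new ClassNotFoundException(name);
                    }
                    byte[] bytes = in.readAllBytes(); // Java 9+
                    return defineClass(name, bytes, 0, bytes.length);
                } catch (IOException e) {
                    throw new ClassNotFoundException(name, e);
                }
            }
        }
    }

    // Loads a second copy of Service through an isolating loader.
    static Class<?> loadIsolatedCopy() {
        try {
            ClassLoader parent = ClassLoaderMismatchSketch.class.getClassLoader();
            return new IsolatingLoader(parent).loadClass(Service.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Registers an object under the isolated copy, then looks it up with the caller's class.
    static Object lookupAcrossLoaders() {
        Map<Class<?>, Object> registry = new HashMap<>();
        registry.put(loadIsolatedCopy(), new Object());
        return registry.get(Service.class);
    }

    public static void main(String[] args) {
        Class<?> copy = loadIsolatedCopy();
        System.out.println("same name:  " + copy.getName().equals(Service.class.getName()));
        System.out.println("same class: " + (copy == Service.class));
        System.out.println("lookup:     " + lookupAcrossLoaders());
    }
}
```

The two `Class` objects share a name but are distinct instances, so a lookup keyed by class finds nothing. In the debugger this would show up as a registry entry whose class name matches the requested one while the class instance does not.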

jd0-sag commented 7 years ago

I looked into this (debugged why we weren't returning something in that test) and the problem is that there is no IStripeMonitoring implementation registered. Since this means that the data can't be sent anywhere, it doesn't return a service.

Passthrough treats this the same way, on both the active and the passive, and then allows the wrapper to switch modes if the server becomes active.

The real server uses the same logic.

jd0-sag commented 7 years ago

In our internal tests, we register an IStripeMonitoring service provider as a built-in but any mechanism to install this would work.

So, to summarize: Both passthrough and core will return null for IMonitoringProducer, on both active and passive, if there is no IStripeMonitoring implementation registered.

The rationale being that this is the only way to receive data from the service and returning non-null would imply, to the caller, that the data was going somewhere.
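
A minimal sketch of the rule summarized above, using hypothetical stand-in types rather than the real Terracotta interfaces. Only the names IMonitoringProducer and IStripeMonitoring come from this thread; `StripeMonitoring`, `MonitoringProducer`, `receive`, `push` and `lookupProducer` are assumptions made for illustration and do not reflect the actual passthrough or core implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class MonitoringLookupSketch {

    // Hypothetical stand-in for IStripeMonitoring: the consumer of monitoring data.
    interface StripeMonitoring {
        void receive(String data);
    }

    // Hypothetical stand-in for IMonitoringProducer: what an entity asks the platform for.
    interface MonitoringProducer {
        void push(String data);
    }

    // Returns a producer only if a consumer is registered. Otherwise returns null,
    // so that a caller cannot assume its data is going anywhere.
    static MonitoringProducer lookupProducer(StripeMonitoring registeredConsumer) {
        if (registeredConsumer == null) {
            return null;
        }
        return registeredConsumer::receive;
    }

    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        MonitoringProducer producer = lookupProducer(received::add);
        producer.push("entity created");
        System.out.println("with consumer:    " + received);
        System.out.println("without consumer: " + lookupProducer(null));
    }
}
```

The same check applies regardless of whether the server is active or passive, which is why the test above saw null in both entity constructors.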

mathieucarbou commented 7 years ago

There's an issue with that @jd0-sag. On passive servers, an entity could just fetch an IMonitoringProducer and use it, whether there is a monitoring service or not, because the IMonitoringProducer is just a "network bridge" in this case, and the monitoring service does nothing on passives. It is not related at all to any IStripeMonitoring. Isn't IStripeMonitoring only on actives?

jd0-sag commented 7 years ago

While the IStripeMonitoring is only consulted on the active, the stripe is consistently configured so we wouldn't expect only some servers in the stripe to have it.

mathieucarbou commented 7 years ago

ok! so in this case, I'll change our implementation to support null IMonitoringProducer in our configuration object, and I'll close this issue!
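
One way the null-tolerant configuration object mentioned above might look, as a sketch only. `ManagementConfigSketch`, its nested `Producer` interface and the method names are hypothetical and are not taken from the actual Ehcache or tc-platform code.

```java
// Hypothetical configuration object that tolerates a missing monitoring producer.
final class ManagementConfigSketch {

    // Hypothetical stand-in for IMonitoringProducer.
    interface Producer {
        void push(String data);
    }

    private final Producer producer; // null when the platform provides none

    ManagementConfigSketch(Producer producer) {
        this.producer = producer;
    }

    // Management is enabled only when the platform actually handed us a producer.
    boolean isManagementEnabled() {
        return producer != null;
    }

    // Pushes data if management is enabled; returns whether anything was sent.
    boolean publish(String data) {
        if (producer == null) {
            return false;
        }
        producer.push(data);
        return true;
    }

    public static void main(String[] args) {
        ManagementConfigSketch disabled = new ManagementConfigSketch(null);
        System.out.println("enabled: " + disabled.isManagementEnabled());
        System.out.println("sent:    " + disabled.publish("stats"));
    }
}
```

Keeping the null check in one place means the entity code can call `publish` unconditionally, whether or not an IStripeMonitoring implementation is registered in the stripe.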