Closed mathieucarbou closed 7 years ago
Are you running on the latest release? I have a test that does pretty much the same thing (in order to test node interaction and best-effort data from the passive) and it works fine. The service is looked up the same way, too.
@jd0-sag : Ehcache is using 1.0.13.beta.
So, is this problem happening when the entity is created during both replication and synchronization (after restarting the active), or just in one path?
Given that we have a test doing the same lookup without issue, you will probably need to debug into passthrough to see if the ServiceProvider is present or why else it fails to be resolved. If anything, I would suspect something like a class loader mismatch, but that would surprise me since we would probably see that in our tests.
I think you're right with your first hypothesis. If you look at the test, when the failure happened, the test thread was stopped at:
clusterControl.terminateActive();
clusterControl.waitForActive();
Do you also have a test like that, killing the active and then waiting for the new one?
I didn't have that in my test but, after adding it, everything still works.
Additionally, that shouldn't involve creating a passive entity (terminating the active will cause the previous passive to be promoted to active) and the waiting for active doesn't do anything.
Further, I don't see the thread being stopped there, in your screenshot, since you are looking at the server thread.
I don't know what to say ;-)
This is an Ehcache test that I am not familiar with. I just saw that, in this case, IMonitoringProducer was null.
So, short of being told that what I observed is wrong, I don't know what else I could tell you except to check out the Ehcache code and run the test, and you will see ;-) I don't even know if the test is right or wrong. I just know that when this test is run, I do not get an IMonitoringProducer. And this is why I filed an issue. We need an IMonitoringProducer to make M&M work ;-)
You just have to add these lines in the creation of PassiveEhcacheEntity:
IMonitoringProducer monitoringProducer = services.getService(new BasicServiceConfiguration<>(IMonitoringProducer.class));
Objects.requireNonNull(monitoringProducer);
And you will see the test failing.
Since you already have the test and can verify it, can you check to see what is in the ServiceProvider collection being consulted?
For this test, can they successfully request the IMonitoringProducer in the active case?
I don't know because I only request the 'IMonitoringProducer' from a passive entity. We do not need it from an active entity.
I'll ping you tomorrow so that we can look together in a hangout to be more efficient :-)
Take a look to verify if this is a classloader issue (you should be able to see the instance there, resolving to a different instance of the same class). We suspect that, if you are in a different classloader when IMonitoringProducer is loaded, you are getting a mismatch. Passthrough doesn't have explicit support for CommonComponent.
Here is what I did. I added:
IMonitoringProducer monitoringProducer = services.getService(new BasicServiceConfiguration<>(IMonitoringProducer.class));
System.out.println(getClass().getSimpleName() + " monitoringProducer=" + monitoringProducer + " classloader=" + (monitoringProducer == null ? null : monitoringProducer.getClass().getClassLoader()));
Objects.requireNonNull(monitoringProducer);
In EhcachePassiveEntity and EhcacheActiveEntity, as the first lines of the constructor.
Then I ran ClusteredStateRepositoryReplicationTest.
Here is the output:
EhcachePassiveEntity monitoringProducer=null classloader=null
EhcacheActiveEntity monitoringProducer=null classloader=null
Here is how passthrough is initialized:
this.clusterControl = PassthroughTestHelpers.createActivePassive(STRIPENAME,
    new PassthroughTestHelpers.ServerInitializer() {
      @Override
      public void registerServicesForServer(PassthroughServer server) {
        server.registerServerEntityService(new EhcacheServerEntityService());
        server.registerClientEntityService(new EhcacheClientEntityService());
        server.registerServerEntityService(new VoltronReadWriteLockServerEntityService());
        server.registerClientEntityService(new VoltronReadWriteLockEntityClientService());
        server.registerExtendedConfiguration(new OffHeapResourcesProvider(getOffheapResourcesType("test", 32, MemoryUnit.MB)));
        UnitTestConnectionService.addServerToStripe(STRIPENAME, server);
      }
    }
);
clusterControl.waitForActive();
clusterControl.waitForRunningPassivesInStandby();
So... IMonitoringProducer is simply not there at all!
I tried to do the same thing in our test in tc-platform, and I am unable to reproduce it. BUT: we do not have passives and passive entities yet! So perhaps that is why. Or perhaps it is because Ehcache is using its own ConnectionService: UnitTestConnectionService.
Please have a look at ClusteredStateRepositoryReplicationTest in Ehcache. This test can be run directly from within the IDE and is fast.
So for the moment, we can leave this open. Once I add some HA testing, I'll check whether I have the same issue.
If yes, it means there's an issue in passthrough with passives and it will be a blocker for us.
If not, then it means these failures are only in Ehcache, and it could be the way Ehcache does its testing with passthrough. Perhaps it has an impact on the IMonitoringProducer. This will also be a blocker for us.
IMonitoringProducer will never appear as a registered service provider since it is provided by the implementation. In the case of passthrough, this is PassthroughMonitoringProducer.
The test that you did doesn't provide any information, since the suspected problem is the classloader of IMonitoringProducer being different between your requesting code and the code which actually implements it. Since services.getService is essentially looking something up by class instance match, this will fail if they are in different class loaders.
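To make the class loader hypothesis concrete, here is a minimal sketch of why a lookup keyed on a Class instance misses when the same class has been defined by two loaders. Everything in it is hypothetical: Producer stands in for a service interface such as IMonitoringProducer, and Registry is a toy map, not the passthrough service registry. It also assumes the compiled class bytes are reachable as a classpath resource.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

class ClassLoaderMismatchDemo {
    /** Hypothetical stand-in for a service interface such as IMonitoringProducer. */
    interface Producer {}

    /** Hypothetical toy registry that resolves services by exact Class key. */
    static final class Registry {
        private final Map<Class<?>, Object> services = new HashMap<>();
        void register(Class<?> type, Object impl) { services.put(type, impl); }
        Object getService(Class<?> type) { return services.get(type); }
    }

    /** A loader without an application parent, so it defines its own copy of a class. */
    static final class IsolatedLoader extends ClassLoader {
        IsolatedLoader() { super(null); }
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Defines a second copy of Producer from the same bytes in an isolated loader. */
    static Class<?> loadIsolatedCopy() {
        String name = Producer.class.getName();
        String resource = name.replace('.', '/') + ".class";
        try (InputStream in = Producer.class.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalStateException("class bytes not found: " + resource);
            }
            return new IsolatedLoader().define(name, in.readAllBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.register(Producer.class, new Producer() {});
        Class<?> copy = loadIsolatedCopy();
        System.out.println("same name:  " + copy.getName().equals(Producer.class.getName()));
        System.out.println("same class: " + (copy == Producer.class));
        System.out.println("lookup with original key: " + registry.getService(Producer.class));
        System.out.println("lookup with isolated key: " + registry.getService(copy));
    }
}
```

The two Class objects share a name but are distinct keys, so the lookup with the isolated copy returns null. A println of the looked-up service alone cannot distinguish this case from a service that is missing entirely.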
To determine if this is the case, you will need to take that example you had in the debugger and see why it failed to find the service. Is there anything implementing IMonitoringProducer in the service registry being consulted that just has a different class instance, or is it missing altogether?
I looked into this (debugged why we weren't returning something in that test) and the problem is that there is no IStripeMonitoring implementation registered. Since this means that the data can't be sent anywhere, it doesn't return a service.
Passthrough treats this the same way, on both the active and the passive, and then allows the wrapper to switch modes if the server becomes active.
The real server uses the same logic.
In our internal tests, we register an IStripeMonitoring service provider as a built-in, but any mechanism to install this would work.
So, to summarize: both passthrough and core will return null for IMonitoringProducer, on both active and passive, if there is no IStripeMonitoring implementation registered.
The rationale being that this is the only way to receive data from the service and returning non-null would imply, to the caller, that the data was going somewhere.
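The rule summarized above can be sketched as follows. This is only an illustration under assumptions: StripeMonitoring, MonitoringProducer and Platform are hypothetical stand-ins with invented method names, not the real IStripeMonitoring and IMonitoringProducer interfaces or the passthrough implementation.

```java
import java.util.ArrayList;
import java.util.List;

class MonitoringLookupSketch {
    /** Hypothetical stand-in for IStripeMonitoring: the receiver of monitoring data. */
    interface StripeMonitoring { void accept(String data); }

    /** Hypothetical stand-in for IMonitoringProducer: the handle given to entities. */
    interface MonitoringProducer { void push(String data); }

    /** Hypothetical platform that only hands out a producer when a receiver exists. */
    static final class Platform {
        private StripeMonitoring receiver; // stays null until one is registered

        void registerStripeMonitoring(StripeMonitoring monitoring) {
            this.receiver = monitoring;
        }

        MonitoringProducer getMonitoringProducer() {
            if (receiver == null) {
                // No receiver: returning a producer would imply the data goes somewhere.
                return null;
            }
            StripeMonitoring target = receiver;
            return target::accept;
        }
    }

    public static void main(String[] args) {
        Platform platform = new Platform();
        System.out.println("before registration: " + platform.getMonitoringProducer());

        List<String> received = new ArrayList<>();
        platform.registerStripeMonitoring(received::add);
        platform.getMonitoringProducer().push("entity created");
        System.out.println("after registration, received: " + received);
    }
}
```

In this sketch the same check applies regardless of whether the server is active or passive, which matches the behavior described for both passthrough and core.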
There's an issue with that, @jd0-sag. On passive servers, an entity could just fetch an IMonitoringProducer and use it, whether there is a monitoring service or not, because the IMonitoringProducer is just a "network bridge" in this case, and the monitoring service does nothing on passives. It is not related at all to any IStripeMonitoring. Isn't IStripeMonitoring only used on actives?
While the IStripeMonitoring is only consulted on the active, the stripe is consistently configured, so we wouldn't expect only some servers in the stripe to have it.
OK! So in this case, I'll change our implementation to support a null IMonitoringProducer in our configuration object, and I'll close this issue!
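A null-tolerant consumer along those lines might look like the sketch below. The names are assumptions made for illustration: MonitoringProducer is a hypothetical stand-in for IMonitoringProducer, and Management is an invented wrapper, not the actual Ehcache configuration object.

```java
import java.util.ArrayList;
import java.util.List;

class NullTolerantManagement {
    /** Hypothetical stand-in for IMonitoringProducer. */
    interface MonitoringProducer { void push(String data); }

    /** Hypothetical wrapper that disables management when no producer is available. */
    static final class Management {
        private final MonitoringProducer producer; // may be null

        Management(MonitoringProducer producerOrNull) {
            this.producer = producerOrNull;
        }

        boolean isEnabled() {
            return producer != null;
        }

        /** Publishes data if management is enabled; returns whether it was sent. */
        boolean publish(String data) {
            if (producer == null) {
                return false; // silently skip: nothing is listening
            }
            producer.push(data);
            return true;
        }
    }

    public static void main(String[] args) {
        Management disabled = new Management(null);
        System.out.println("enabled without producer: " + disabled.isEnabled());

        List<String> sink = new ArrayList<>();
        Management enabled = new Management(sink::add);
        enabled.publish("cache created");
        System.out.println("enabled with producer: " + enabled.isEnabled() + ", sent: " + sink);
    }
}
```

Keeping the null check in one wrapper means entity code can call publish unconditionally instead of repeating the check at every call site.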
As a workaround, I'll check the null return. But we said that IMonitoringProducer was a platform-provided service and that it was always there. So I expect it to be there ;-)
When doing passthrough testing, when creating a passive entity, the IMonitoringProducer provided by the platform does not exist and is null.
See the Ehcache test:
ClusteredStateRepositoryReplicationTest.testClusteredStateRepositoryReplication
The IMonitoringProducer is null.
I'll put in a workaround for the moment that disables management if IMonitoringProducer is not found, but this needs to be resolved ASAP because we will do some HA tests with passive entities and we will need the IMonitoringProducer to be there in passthrough when creating passive entities.