coherence-community / coherence-incubator

Coherence Incubator

Using PushReplication with tangosol.coherence.site with clusters spreading across sites may not create the right channels #148

Closed brianoliver closed 9 years ago

brianoliver commented 9 years ago

The architecture is Active-Active Push Replication 11.3.0 with Coherence 3.7.1.13. This environment differs from the usual Push Replication architecture because each of the two clusters extends across two different sites. The sites are physically close, so network latency is very low and we can afford to have Cluster A extend across site1 and site2 and Cluster B also extend across site1 and site2. In the test performed we are replicating 5 different caches between Cluster A and Cluster B.

The setup works fine when an event channel is created for each cache / site pair, so that there are event channels from Cluster A - site1 to Cluster B and event channels from Cluster A - site2 to Cluster B. After several restarts (stopping and starting cache server nodes on site1 and site2) we end up with event channels only from Cluster A - site1 to Cluster B, and none from Cluster A - site2 to Cluster B. As a result, only data whose primary copy is on a Cluster A - site1 node is replicated to Cluster B, because nothing is listening to events in Cluster A - site2. (Our statusHA is SITE-SAFE, so the primary data is always distributed across sites.)

We have tested unsetting tangosol.coherence.site and setting tangosol.coherence.rack instead. With that configuration it works fine, i.e. the channels are always created correctly, so all the data is replicated and no data is lost.

Checking the channel name with JConsole, the difference appears to be that when tangosol.coherence.site is not set the channel name is defined as mycluster-mycache instead of mysite-mycluster-mycache.

brianoliver commented 9 years ago

@brianoliver said: The name of an event channel or distributor is not hard coded, but set through configuration.

For example, in the Push Replication Tests we use the following:

<event:distributor-external-name>{site-name}-{cluster-name}-{cache-name}</event:distributor-external-name>

Notice here that we're using the "site-name" system parameter. If this is not set it will be resolved to .

Consequently this issue is due to the "site-name" parameter being used in a configuration file, but not being set as a system parameter. If the system parameter is not desired, the "site-name" parameter can be removed.
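
One way to catch this kind of mistake before starting a node is to scan the name template for placeholders that have no value. The following is only a minimal sketch: the class `ChannelNameCheck`, the method `missingParameters`, and the plain `Map` of parameters are hypothetical names invented for illustration and are not part of the Incubator API. It also assumes placeholders always use the `{name}` form shown in the example above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Coherence or the Incubator.
public class ChannelNameCheck {
    // Matches placeholders of the form {name}, as used in the template above.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([^{}]+)\\}");

    // Returns the placeholder names in the template that have no
    // non-empty value in the supplied parameters, in template order.
    static List<String> missingParameters(String template, Map<String, String> params) {
        List<String> missing = new ArrayList<>();
        Matcher m = PLACEHOLDER.matcher(template);
        while (m.find()) {
            String name = m.group(1);
            String value = params.get(name);
            if (value == null || value.isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String template = "{site-name}-{cluster-name}-{cache-name}";

        // Assumed values for illustration; "site-name" is deliberately left unset.
        Map<String, String> params = new HashMap<>();
        params.put("cluster-name", "mycluster");
        params.put("cache-name", "mycache");

        System.out.println(missingParameters(template, params)); // prints [site-name]
    }
}
```

A non-empty result means the template refers to a parameter that was never supplied, which is the situation described in this issue: either set the parameter on every node or remove it from the template.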

brianoliver commented 9 years ago

@brianoliver said: This is due to a configuration mistake.

brianoliver commented 8 years ago

This issue was imported from JIRA COHINC-148

brianoliver commented 9 years ago

Reported by agirona

brianoliver commented 9 years ago

Marked as works as designed by @brianoliver on Tuesday, October 13th 2015, 11:16:54 am