eclipse-ee4j / glassfish-shoal

Shoal
Other
5 stars 9 forks source link

notifying cluster view event is not thread safe #82

Open glassfishrobot opened 15 years ago

glassfishrobot commented 15 years ago

ClusterViewManager.notifyListeners() can be executed on multi-threads when many members join the same group concurrently.

Though there are no member's failures, you can see the following log.


2008. 11. 12 오후 5:44:00 com.sun.enterprise.shoal.jointest.SimpleJoinTest initializeGMS 정보: Initializing Shoal for member: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f group:TestGroup 2008. 11. 12 오후 5:44:00 com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample 정보: Registering for group event notifications 2008. 11. 12 오후 5:44:00 com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample 정보: Joining Group TestGroup 2008. 11. 12 오후 5:44:07 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens 정보: GMS View Change Received for group TestGroup (5d3280a2-a0c5-4ae2-8d41- d59b57400b8f) : Members in view for (before change analysis) are : 1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03

2008. 11. 12 오후 5:44:07 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved 정보: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT 2008. 11. 12 오후 5:44:08 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens 정보: GMS View Change Received for group TestGroup (aeea918f-571b-463b-bfa6- 55c536df0d11) : Members in view for (before change analysis) are : (a) 1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03 2: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303 3: MemberId: aeea918f-571b-463b-bfa6-55c536df0d11, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033DBAE9788614944F8A40ED352C8E7A03B03 4: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203

2008. 11. 12 오후 5:44:08 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved 정보: Analyzing new membership snapshot received as part of event : ADD_EVENT 2008. 11. 12 오후 5:44:17 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens 정보: GMS View Change Received for group TestGroup (addb1dbe-06cf-43b8-8903- 78605f29091f) : Members in view for (before change analysis) are : (b) 1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03 2: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303

2008. 11. 12 오후 5:44:17 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved 정보: Analyzing new membership snapshot received as part of event : ADD_EVENT 2008. 11. 12 오후 5:44:17 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens 정보: GMS View Change Received for group TestGroup (fae1414d-702a-42fd-8c7d- 6ffabe8b2e69) : Members in view for (before change analysis) are : (c) 1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03 2: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303 3: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203

2008. 11. 12 오후 5:44:17 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved 정보: Analyzing new membership snapshot received as part of event : ADD_EVENT 2008. 11. 12 오후 5:44:20 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens 정보: GMS View Change Received for group TestGroup (42b22147-7683-481f-a9f4- 85ba5a2b847f) : Members in view for (before change analysis) are : 1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03 2: MemberId: 42b22147-7683-481f-a9f4-85ba5a2b847f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250334501FF701A644877A4B4C65068965F3403 3: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303 4: MemberId: aeea918f-571b-463b-bfa6-55c536df0d11, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033DBAE9788614944F8A40ED352C8E7A03B03 5: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203

...

This log means that five members join "TestGroup"

1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03 2: MemberId: 42b22147-7683-481f-a9f4-85ba5a2b847f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250334501FF701A644877A4B4C65068965F3403 3: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303 4: MemberId: aeea918f-571b-463b-bfa6-55c536df0d11, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033DBAE9788614944F8A40ED352C8E7A03B03 5: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203

And this log is printed in ViewWindow based on the viewQueue when new view is observed.

But above log message, you can see that (a), (b) and (c)'s order are strange.

Because there are no failures, I think that member's number should be increased gradually(or (a)'num <= (b)'s num <= (c)'s num).

The following code is ClusterViewManager's notifyListeners() method.


void notifyListeners(final ClusterViewEvent event) { Log.log(...); for (ClusterViewEventListener elem : cvListeners)

{ elem.clusterViewEvent(event, getLocalView()); }

}


getLocalView() is thread safe with viewLock but ClusterViewEventListener's clusterViewEvent() is not thread safe.

The following code is GroupCommunicationProviderImpl's clusterViewEvent() method which implements ClusterViewEventListener interface.


public void clusterViewEvent(final ClusterViewEvent clusterViewEvent, final ClusterView clusterView) { ... final EventPacket ePacket = new EventPakcet(clusterViewEvent.getEvent(), clusterViewEvent.getAdvertisement(), clusterView); final ArrayBlockingQueue viewQueue = getGMSContext ().getViewQueue(); try

{ viewQueue.put(ePacket); } catch(InterruptedExcetion e) { ... }

}

I think that local view's snapshot(getLocalView()'s return value) and viewQueue.put() should be atomic like this.

void notifyListeners(final ClusterViewEvent event) { Log.log(...); for (ClusterViewEventListener elem : cvListeners) { synchronized( elem ) { elem.clusterViewEvent(event, getLocalView()); } } }

or

public synchronized void clusterViewEvent(final ClusterViewEvent clusterViewEvent, final ClusterView clusterView) { ... final EventPacket ePacket = new EventPakcet(clusterViewEvent.getEvent(), clusterViewEvent.getAdvertisement(), clusterView); final ArrayBlockingQueue viewQueue = getGMSContext ().getViewQueue(); try { viewQueue.put(ePacket); }

catch(InterruptedExcetion e)

{ ... }

}

(In my opinion, I think that the former is better because clusterViewEvent() can be implemented variously)


In other words,

getLocalView() --> local view's snapshot --> (hole) --> insert view queue

As you can see above, before EventPacket is inserted into view queue, there is some hole. So we can remove the hole with synchronized block or individual lock object. If the hole is removed, I think that ViewWindow can receive local view capture from queue correctly.

Environment

Operating System: All Platform: Windows

Affected Versions

[current]

glassfishrobot commented 6 years ago
glassfishrobot commented 15 years ago

@glassfishrobot Commented Reported by carryel

glassfishrobot commented 15 years ago

@glassfishrobot Commented shreedhar_ganapathy said: ..

glassfishrobot commented 7 years ago

@glassfishrobot Commented This issue was imported from java.net JIRA SHOAL-82