eclipse-ee4j / glassfish-shoal


When the group leader fails, no member receives the FailureRecovery notification #83

Closed glassfishrobot closed 14 years ago

glassfishrobot commented 15 years ago

When the group leader fails, no member receives the FailureRecovery notification, even though the members have registered FailureRecoveryActionFactoryImpl and their callbacks with GMS. If the failed member is not the group leader, the other members receive the FailureRecovery notification as expected.

Here are two logs.

Case 1) The failed member is the group leader.

2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are : 1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT

2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are : 1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03 2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT

2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are : 1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03 2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT

2008. 11. 12 PM 9:43:53 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are : 1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03 2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 PM 9:43:53 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved INFO: Analyzing new membership snapshot received as part of event : IN_DOUBT_EVENT

2008. 11. 12 PM 9:43:53 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addInDoubtMemberSignals INFO: gms.failureSuspectedEventReceived

2008. 11. 12 PM 9:43:53 com.sun.enterprise.ee.cms.impl.common.Router notifyFailureSuspectedAction INFO: Sending FailureSuspectedSignals to registered Actions. Member:b6663a51-9b79-43e2-92dd-41899c907383...

2008. 11. 12 PM 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are : 1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 PM 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT

2008. 11. 12 PM 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are : 1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 PM 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved INFO: Analyzing new membership snapshot received as part of event : FAILURE_EVENT

2008. 11. 12 PM 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addFailureSignals INFO: The following member has failed: b6663a51-9b79-43e2-92dd-41899c907383

Case 2) The failed member is not the group leader.

2008. 11. 12 PM 9:40:03 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are : 1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03

2008. 11. 12 PM 9:40:03 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT

2008. 11. 12 PM 9:40:14 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are : 1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03 2: MemberId: b77af0d3-581c-4392-89cf-6a06d736c90f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033EBEBAC9321A742D0B319D3F89446E0B103

2008. 11. 12 PM 9:40:14 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT

2008. 11. 12 PM 9:40:43 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are : 1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03 2: MemberId: b77af0d3-581c-4392-89cf-6a06d736c90f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033EBEBAC9321A742D0B319D3F89446E0B103

2008. 11. 12 PM 9:40:49 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved INFO: Analyzing new membership snapshot received as part of event : IN_DOUBT_EVENT

2008. 11. 12 PM 9:41:07 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addInDoubtMemberSignals INFO: gms.failureSuspectedEventReceived

2008. 11. 12 PM 9:41:12 com.sun.enterprise.ee.cms.impl.common.Router notifyFailureSuspectedAction INFO: Sending FailureSuspectedSignals to registered Actions. Member:b77af0d3-581c-4392-89cf-6a06d736c90f...

2008. 11. 12 PM 9:41:29 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are : 1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03

2008. 11. 12 PM 9:41:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved INFO: Analyzing new membership snapshot received as part of event : FAILURE_EVENT

2008. 11. 12 PM 9:41:42 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addFailureSignals INFO: The following member has failed: b77af0d3-581c-4392-89cf-6a06d736c90f

2008. 11. 12 PM 9:42:19 com.sun.enterprise.ee.cms.impl.common.RecoveryTargetSelector setRecoverySelectionState INFO: Appointed Recovery Server:96438e75-740c-4613-af8d-6b2ab8ea4727:for failed member:b77af0d3-581c-4392-89cf-6a06d736c90f:for group:DemoGroup

2008. 11. 12 PM 9:42:19 com.sun.enterprise.ee.cms.impl.common.Router notifyFailureRecoveryAction INFO: Sending FailureRecoveryNotification to component service

In case 1 (the abnormal case): group leader failed -> IN_DOUBT_EVENT -> MASTER_CHANGE_EVENT (because a new master was selected) -> FAILURE_EVENT

In case 2 (the normal case): member failed -> IN_DOUBT_EVENT -> FAILURE_EVENT

Before a FailureRecovery notification can be sent, a recovery target must be resolved. The selection algorithm for the recovery target works from the previous view of the members.

Assume that "A" and "B" are members of the same group and that "A" is the group leader.

[case 1: "B"'s view history] ... --> (A, B) --> A failed -> B became the new master with a master change event -> (B)[previous view] -> failure event -> (B)[current view]

[case 2: "A"'s view history] ... --> (A, B)[previous view] --> B failed -> failure event -> (A)[current view]
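The two view histories can be replayed with a small sketch. ViewHistorySketch and its methods are hypothetical names, not Shoal classes, and the sketch assumes that every view-changing event, including a master change, shifts the current view into the previous-view slot.

```java
import java.util.List;

/** Hypothetical stand-in for a view window that remembers only the previous and current views. */
final class ViewHistorySketch {
    private List<String> previousView = List.of();
    private List<String> currentView = List.of();

    /** ASSUMPTION: any view event (ADD, MASTER_CHANGE, FAILURE, ...) replaces the previous view. */
    void onViewEvent(List<String> newView) {
        previousView = currentView;
        currentView = List.copyOf(newView);
    }

    List<String> previousView() {
        return previousView;
    }

    List<String> currentView() {
        return currentView;
    }

    public static void main(String[] args) {
        // Case 1, seen from B: the leader A fails.
        ViewHistorySketch seenByB = new ViewHistorySketch();
        seenByB.onViewEvent(List.of("A", "B")); // steady state
        seenByB.onViewEvent(List.of("B"));      // MASTER_CHANGE_EVENT: B becomes master
        seenByB.onViewEvent(List.of("B"));      // FAILURE_EVENT for A
        System.out.println("case 1 previous view: " + seenByB.previousView());

        // Case 2, seen from A: the non-leader B fails.
        ViewHistorySketch seenByA = new ViewHistorySketch();
        seenByA.onViewEvent(List.of("A", "B")); // steady state
        seenByA.onViewEvent(List.of("A"));      // FAILURE_EVENT for B
        System.out.println("case 2 previous view: " + seenByA.previousView());
    }
}
```

In this sketch the extra master change event in case 1 is what pushes the failed member out of the previous view before the failure event arrives.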

In other words, in case 1 the previous view does not contain "A" (the failed member), so the default algorithm (SimpleSelectionAlgorithm) cannot find a proper recovery target. In case 2 the previous view contains "B" (the failed member), so the default algorithm can select "A" as the recovery target. (I assume that you already know SimpleSelectionAlgorithm.)
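A minimal sketch of why the lookup fails, assuming the default algorithm locates the failed member in the previous view and appoints the member that follows it. That rule, the class name and the method name are illustrative assumptions, not Shoal's actual code.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical illustration of a selection rule that depends on the previous view. */
final class PreviousViewSelectionSketch {

    /**
     * ASSUMPTION: the recovery target is the member after the failed one in the
     * previous view, wrapping around. Returns empty when the failed member is
     * not in that view or when no other member exists.
     */
    static Optional<String> selectRecoveryTarget(List<String> previousView, String failedMember) {
        int index = previousView.indexOf(failedMember);
        if (index < 0 || previousView.size() < 2) {
            return Optional.empty();
        }
        return Optional.of(previousView.get((index + 1) % previousView.size()));
    }

    public static void main(String[] args) {
        // Case 1: previous view is (B) and A failed, so no target is found.
        System.out.println(selectRecoveryTarget(List.of("B"), "A"));
        // Case 2: previous view is (A, B) and B failed, so A is selected.
        System.out.println(selectRecoveryTarget(List.of("A", "B"), "B"));
    }
}
```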

So I think the cause of this issue lies in the selection algorithm for the recovery target.

One way to resolve it would be to devise another simple algorithm, e.g. always selecting the first core member in the live cache.
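The suggested alternative could look like the sketch below. The names are hypothetical and the live cache is modeled as a plain map from member token to member type. Sorting the tokens is an added assumption so that every surviving member computes the same "first" member; the real live cache may define its order differently.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical illustration of selecting the first live CORE member as the recovery target. */
final class FirstLiveCoreMemberSketch {

    /** Stand-in for the member types shown in the logs; not the Shoal enum. */
    enum MemberType { CORE, SPECTATOR }

    /**
     * ASSUMPTION: liveCache holds only members that are currently alive, so the
     * failed member is never a candidate and no previous view is needed.
     */
    static Optional<String> selectRecoveryTarget(Map<String, MemberType> liveCache) {
        // TreeMap orders tokens lexicographically, giving all members the same answer.
        return new TreeMap<String, MemberType>(liveCache).entrySet().stream()
                .filter(entry -> entry.getValue() == MemberType.CORE)
                .map(Map.Entry::getKey)
                .findFirst();
    }

    public static void main(String[] args) {
        // Case 1 after the leader A has failed: only B is left in the live cache.
        System.out.println(selectRecoveryTarget(Map.of("B", MemberType.CORE)));
    }
}
```

Because the selection no longer looks for the failed member in any view, it gives the same result whether or not a master change event arrived before the failure event.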

Environment

Operating System: All
Platform: Windows

Affected Versions

[current]

glassfishrobot commented 15 years ago

Reported by carryel

glassfishrobot commented 15 years ago

shreedhar_ganapathy said: ..

glassfishrobot commented 15 years ago

@jfialli said: Shoal test scenario 14 verifies that the fix for this has been integrated.

glassfishrobot commented 7 years ago

This issue was imported from java.net JIRA SHOAL-83

glassfishrobot commented 14 years ago

Marked as fixed on Wednesday, June 23rd 2010, 4:11:06 am