byzhang / terrastore

Automatically exported from code.google.com/p/terrastore

Abrupt shutdown of Terrastore instance 'A' causes instance 'B' to throw several exceptions #186

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Today, I abruptly shut down an instance of Terrastore that was part of a 
two-instance cluster:

* Terrastore A
* Terrastore B
* Terracotta A (active)
* Terracotta B (standby)

Normally, I terminate instances with control-c, which leads to predictable 
behavior.

This termination was the result of a system reboot.  As part of the reboot, both 
Terrastore A and Terracotta A were shut down abruptly.  So the remaining 
Terrastore instance ("Terrastore B") was left with the following scenario:

* The active Terracotta instance ("Terracotta A") just went offline.
* The peer Terrastore instance ("Terrastore A") just went offline.
* Standby Terracotta instance ("Terracotta B") became active a few seconds 
later.

During the reboot, I made a few requests to Terrastore B and none of them 
returned a response.  I think it was either slow to connect to the newly active 
Terracotta B, or there was some problem caused by the sudden disappearance of 
its peer.

I'm attaching the exceptions that appeared on the console during this episode.  
The "null" messages (lines ~31, ~63) were the most worrisome.

Original issue reported on code.google.com by teonanac...@gmail.com on 26 Nov 2011 at 1:20

GoogleCodeExporter commented 9 years ago
There were probably some problems with the standby master not correctly taking 
over from the dead one; as a result, the surviving server failed to notice the 
membership change and tried to communicate with the dead one.

In order to debug it, I'd need the master and server Terracotta logs; you can 
find them in the tc-data directory.

Original comment by sergio.b...@gmail.com on 28 Nov 2011 at 10:08