Closed mchecca closed 9 years ago
We were able to run your program and restart the only node in the cluster. The client recovered within 4 seconds after the node was restarted.
The client could recover even earlier if "Policy.sleepBetweenRetries" was set to zero (instead of the default 500ms).
That's really odd. I just updated to the latest version of master and re-ran the test. I've found that sometimes it does recover, but sometimes it doesn't. When it hangs, I'm noticing that it gets stuck in the "PutEventArgs" function in AsyncCluster. Even before restarting the Aerospike server, I'm getting quite a few timeouts with the exception "An asynchronous socket operation is already in progress using this SocketAsyncEventArgs instance."
PutEventArgs() adds the SocketAsyncEventArgs instance back to a bounded queue. The bounded queue is only initialized once at cluster creation. The only way PutEventArgs() could block is if the same SocketAsyncEventArgs was added back twice. This is consistent with the "An asynchronous socket operation is already in progress using this SocketAsyncEventArgs instance." message.
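To illustrate the blocking scenario described above, here is a minimal sketch. It assumes the pool behaves like a bounded queue whose capacity equals the number of pooled instances. The `BlockingCollection<object>` and the variable names are hypothetical stand-ins for illustration, not the client's actual implementation.

```csharp
using System;
using System.Collections.Concurrent;

class DoubleReturnDemo
{
    static void Main()
    {
        // Hypothetical stand-in for the event-args pool: the capacity
        // equals the number of pooled instances and is filled once.
        var pool = new BlockingCollection<object>(boundedCapacity: 2);
        pool.Add(new object());
        pool.Add(new object());

        // Check out one instance, which leaves one free slot.
        object taken = pool.Take();

        // The first return fills the pool back to capacity.
        pool.Add(taken);

        // A second return of the same instance finds no free slot.
        // A plain Add() would block indefinitely here, so TryAdd with
        // a timeout is used to make the condition visible instead.
        bool accepted = pool.TryAdd(taken, TimeSpan.FromMilliseconds(100));
        Console.WriteLine("Second return accepted: " + accepted);
    }
}
```

With a plain `Add()` in place of `TryAdd()`, the last call would never return, which matches the hang observed in PutEventArgs().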
I recommend disabling retries to see if that fixes the problem.
WritePolicy policy = new WritePolicy();
policy.maxRetries = 0;           // disable retries
policy.sleepBetweenRetries = 0;  // no delay between attempts
AsyncClient ac = new AsyncClient("127.0.0.1", 3000);
for (int i = 0; true; i++)
{
    try
    {
        // Asynchronous write followed by an asynchronous read of the same key.
        Key key = new Key("test", "test", i);
        ac.Put(policy, new AWriter(), key, new Bin[] { new Bin("value", "value-" + i) });
        ac.Get(policy, new ARecordListener(), key);
        System.Threading.Thread.Sleep(100);
    }
    catch (Exception ex)
    {
        Console.WriteLine("Exception: " + ex.Message);
    }
}
In which environment are you running this program? Are you using a 32-bit or 64-bit OS? Are you using Mono?
Thanks for the suggestions, I'll try them out.
My environment is a 64-bit Windows client and a 64-bit CentOS server running in VirtualBox (the Vagrant image).
I've also reproduced it using Mono for the client and with an Ubuntu server instead of CentOS, but neither change seemed to have a significant impact.
On Mar 26, 2015 8:28 PM, "Brian Nichols" notifications@github.com wrote:
Some questions on your environment.
Which environment are you running this program? Are you using 32-bit or 64-bit OS? Are you using mono?
Thanks. I'd concentrate on the 64-bit Windows client and 64-bit CentOS server since that is what I'm using.
I can't get Aerospike to reproduce the behavior reliably. It seems like every time I run it, I get something different. I'm closing this for now; I'll keep trying and see if I can better understand what is going on.
If you are using the AsyncClient and only using async calls, the client does not recover when the server goes down and comes back up. If you use a mix of async and sync calls and restart the server, then the client does recover.
I've attached a sample program (https://gist.github.com/mchecca/a9665f0f16796a8ef1e6) which reproduces the behavior as follows:
If you repeat the following steps but with the listener commented out in either the Put or Get method, the client does recover.