aerospike / aerospike-client-csharp

Aerospike C# Client Library

Restarting the server causes AsyncClient to not recover #5

Closed: mchecca closed this issue 9 years ago

mchecca commented 9 years ago

If you are using the AsyncClient and only using async calls, the client does not recover when the server goes down and comes back up. If you use a mix of async and sync calls and restart the server, then the client does recover.

I've attached a sample program (https://gist.github.com/mchecca/a9665f0f16796a8ef1e6) which reproduces the behavior as follows:

  1. Ensure the Aerospike server is running
  2. Start the program
  3. Restart the Aerospike service
  4. Observe the client start failing reads and writes (as expected), but then hang

If you repeat these steps with the listener commented out in either the Put or the Get call, the client does recover.

BrianNichols commented 9 years ago

We were able to run your program and restart the only node in the cluster. The client recovered within 4 seconds after the node was restarted.

The client could recover even earlier if "Policy.sleepBetweenRetries" was set to zero (instead of the default 500ms).

mchecca commented 9 years ago

That's really odd. I just updated to the latest version of master and re-ran the test. Sometimes it recovers, but sometimes it doesn't. When it hangs, it gets stuck in the PutEventArgs() method in AsyncCluster. Even before restarting the Aerospike server, I'm getting quite a few timeouts with the exception "An asynchronous socket operation is already in progress using this SocketAsyncEventArgs instance."

BrianNichols commented 9 years ago

PutEventArgs() adds the SocketAsyncEventArgs instance back to a bounded queue. The bounded queue is initialized only once, at cluster creation. The only way PutEventArgs() could block is if the same SocketAsyncEventArgs instance were added back twice, leaving the queue full before every instance had been returned. This is consistent with the "An asynchronous socket operation is already in progress using this SocketAsyncEventArgs instance." message.
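To make that failure mode concrete, here is a minimal sketch, assuming the pool behaves like a .NET `BlockingCollection<T>` with a fixed bounded capacity. The class and method names (`BoundedPoolSketch`, `ReturnWouldBlock`) are hypothetical and this is not the actual AsyncCluster code; plain `object` items stand in for `SocketAsyncEventArgs` instances. Returning one instance twice fills the pool before every borrowed instance is back, so the next legitimate return would block in `Add()`. The sketch uses `TryAdd` with a zero timeout to detect that state instead of hanging.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical model of a bounded pool that is filled once at startup.
// Plain objects stand in for SocketAsyncEventArgs; this is not Aerospike's AsyncCluster code.
public static class BoundedPoolSketch
{
    // Borrows two items, returns the first one either once or twice, then checks
    // whether returning the second item would block. Requires capacity >= 2.
    public static bool ReturnWouldBlock(int capacity, bool returnTwice)
    {
        if (capacity < 2)
        {
            throw new ArgumentOutOfRangeException(nameof(capacity), "capacity must be at least 2");
        }

        using (var pool = new BlockingCollection<object>(new ConcurrentQueue<object>(), capacity))
        {
            // Fill the pool once, as at cluster creation.
            for (int i = 0; i < capacity; i++)
            {
                pool.Add(new object());
            }

            // Two commands each borrow an instance.
            object first = pool.Take();
            object second = pool.Take();

            // Normal completion returns 'first' once.
            pool.Add(first);

            if (returnTwice)
            {
                // Bug being modeled: the same instance is returned a second time.
                // The pool now holds two references to 'first', so two later commands
                // could borrow it concurrently, the situation .NET reports as
                // "An asynchronous socket operation is already in progress ...".
                pool.Add(first);
            }

            // A blocking Add(second) would hang here once the pool is full.
            // TryAdd with a zero timeout reports that instead of waiting.
            bool added = pool.TryAdd(second, TimeSpan.Zero);
            return !added;
        }
    }
}
```

For example, `BoundedPoolSketch.ReturnWouldBlock(4, returnTwice: true)` returns `true`, while the same call with `returnTwice: false` returns `false`: only the double return leaves no room for the second instance.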

I recommend disabling retries to see if that fixes the problem:

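    using System;
    using Aerospike.Client;

    // Disable retries so a failed command is reported once instead of being retried.
    WritePolicy policy = new WritePolicy();
    policy.maxRetries = 0;
    policy.sleepBetweenRetries = 0;

    AsyncClient ac = new AsyncClient("127.0.0.1", 3000);
    for (int i = 0; true; i++)
    {
        try
        {
            Key key = new Key("test", "test", i);
            // AWriter and ARecordListener are the listener classes from the linked sample program.
            ac.Put(policy, new AWriter(), key, new Bin[] { new Bin("value", "value-" + i) });
            ac.Get(policy, new ARecordListener(), key);
            System.Threading.Thread.Sleep(100);
        }
        catch (Exception ex)
        {
            // Exceptions thrown while issuing a command are caught here;
            // asynchronous failures are delivered to the listener callbacks.
            Console.WriteLine("Exception: " + ex.Message);
        }
    }
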
BrianNichols commented 9 years ago

In which environment are you running this program? Are you using a 32-bit or 64-bit OS? Are you using Mono?

mchecca commented 9 years ago

Thanks for the suggestions, I'll try them out.

My environment is a 64-bit Windows client and a 64-bit CentOS server running in VirtualBox (the Vagrant image).

But I've also reproduced it using Mono for the client and with an Ubuntu server instead of CentOS; neither seemed to make a significant difference.

BrianNichols commented 9 years ago

Thanks. I'd concentrate on the 64-bit Windows client and 64-bit CentOS server, since that is what I'm using.

mchecca commented 9 years ago

I can't reproduce the behavior reliably; it seems like every time I run it, I get something different. I'm closing this for now. I'll keep trying and see if I can better understand what is going on.