Netflix / astyanax

Cassandra Java Client
Apache License 2.0

Connection attempts although host is marked as down #552

Open tsteinmaurer opened 9 years ago

tsteinmaurer commented 9 years ago

Hello,

we are using 1.56.48 with our own connection pool monitor implementation plugged in.

When a Cassandra node goes down, Astyanax is able to detect that, and our onHostDown implementation of the ConnectionPoolMonitor interface is executed. We log a message and increment a counter there. But after the host has been marked as down, we still see connection attempts to that host, which doesn't make sense as long as the host is down.

Any ideas?

Thanks, Thomas

opuneet commented 9 years ago

Hey Thomas,

That just sounds like a bug. Also, you aren't on the latest version of Astyanax, which is 2.0.2. Can you please upgrade and verify that you still see this issue?

Furthermore, I'd like to see how you mark a host as down, and I want to see your query. This will help me reproduce the issue.

Thanks.

tsteinmaurer commented 9 years ago

Hi!

Thanks for your reply. Although I initially thought it was a bug, perhaps it is as designed and our connection pool monitor is simply too noisy from a logging POV. I will try with 2.0.2 later.

It is rather simple to reproduce with 1.56.48 here. I also haven't found anything in the Git history for master in that area, so I believe this will happen with 2.0.2 as well.

I'm testing against a 4-node Cassandra 1.2.15 cluster, with ROUND_ROBIN as the connection pool type and RING_DESCRIBE as the discovery type. Astyanax marks a host as down automatically when I simply stop the Cassandra process on a node or when I disable Thrift for that server. This is logged by the onHostDown event/implementation of the ConnectionPoolMonitor interface.

Once the host is marked as down, I see that no further Astyanax operations/requests are targeted at the down node, so this looks fine. However, there are still connection attempts to the down node from a cyclic background task. These obviously fail as long as the node is down, and each failure internally calls the incConnectionCreateFailed(Host host, Exception reason) method of the ConnectionPoolMonitor interface. Perhaps this is business as usual and I simply should be more careful about what we log.

In a simple example, you could try to use the Slf4jConnectionPoolMonitorImpl class for the connection pool monitor, then take a node down and there should be a regular log output by the incConnectionCreateFailed method as long as the node is down.
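To illustrate what a less noisy monitor could look like, here is a minimal sketch of a per-host log throttle. It is plain Java and not part of Astyanax: the class ThrottledFailureLogger and its methods shouldLog and reset are hypothetical names, and hosts are identified by a plain String instead of the Astyanax Host type. The idea is that incConnectionCreateFailed would only write a log line when shouldLog returns true, so the background reconnect attempts against a down node produce at most one line per interval.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical helper (not an Astyanax class): limits how often a
 * connection-create failure is logged for the same host.
 */
final class ThrottledFailureLogger {
    private final long minIntervalMillis;
    // Timestamp (in ms) of the last failure that was allowed through, per host.
    private final ConcurrentMap<String, Long> lastLogged = new ConcurrentHashMap<>();

    ThrottledFailureLogger(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Returns true if the caller should log a failure for this host now. */
    boolean shouldLog(String host, long nowMillis) {
        Long previous = lastLogged.get(host);
        if (previous == null) {
            // First failure seen for this host; only one concurrent caller wins.
            return lastLogged.putIfAbsent(host, nowMillis) == null;
        }
        if (nowMillis - previous < minIntervalMillis) {
            return false; // still inside the quiet period
        }
        // Quiet period is over; only one concurrent caller wins the update.
        return lastLogged.replace(host, previous, nowMillis);
    }

    /** Forgets the host, e.g. once it is reachable again. */
    void reset(String host) {
        lastLogged.remove(host);
    }
}
```

In a custom monitor, incConnectionCreateFailed could call shouldLog(host.toString(), System.currentTimeMillis()) before logging, and whichever callback signals that the host is back could call reset. The counter can still be incremented on every failure, so metrics stay accurate while the log stays quiet.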

tsteinmaurer commented 9 years ago

Btw, does Astyanax at some point in time give up trying to re-connect to a particular host, e.g. when it is marked as down for a specific time-frame or due to other ideally configurable conditions?

Thanks again. Much appreciated.