aws / neptune-gremlin-client

A Java Gremlin client for Amazon Neptune that allows you to change the endpoints used by the client as it is running.
Apache License 2.0

EndpointSelector provides instances in modifying or upgrading status, leading to connection failures #2

Closed birbalekka15 closed 9 months ago

birbalekka15 commented 9 months ago

Hi Team,

When used with read replicas, the Endpoint Selector returns instances that are in the modifying or upgrading state. Connections to Neptune instances in those states fail, so the application's requests fail.

For the isAvailable filter, please return only instances that are truly available and to which connections can be established. The NeptuneInstanceMetadata class needs to change to support different AVAILABLE_STATES options.

iansrobinson commented 9 months ago

Hi @birbalekka15

Thanks for the message.

This is correct behaviour. As noted in the documentation:

We say that isAvailable() indicates that an endpoint is likely available. There is no guarantee that the endpoint is actually available. For example, while an instance is upgrading, there can be short periods when the endpoint is not available. During the upgrade process, instances are sometimes restarted, and while this is happening the instance endpoint will not be available, even though the state is upgrading.

If you simply want an instance whose status is available, you can create a custom endpoint selector to filter instances where getStatus().equals("available"). The downside of this approach is that the client will not see instances whose status is not available, but whose endpoints are available. There are long periods when an instance is upgrading or being modified, where the endpoint is actually available: if you filter only on the available status, your client will miss those instances.
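A minimal sketch of the status-only filter described above. The `InstanceInfo` class, its `isReader()` method, and the example endpoint names are hypothetical stand-ins so the example runs on its own; they are not the library's types. Only the `getStatus().equals("available")` check comes from this thread, and how such a filter is wired into the client's selector depends on the library version in use.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class AvailableOnlySelector {

    // Hypothetical stand-in for the library's instance metadata (assumption).
    static final class InstanceInfo {
        private final String endpoint;
        private final String role;
        private final String status;

        InstanceInfo(String endpoint, String role, String status) {
            this.endpoint = endpoint;
            this.role = role;
            this.status = status;
        }

        String getEndpoint() { return endpoint; }
        boolean isReader() { return "reader".equals(role); }
        String getStatus() { return status; }
    }

    // Keep only read replicas whose status is exactly "available".
    // Instances that are upgrading or modifying are dropped, even if
    // their endpoints would currently accept connections.
    static List<String> selectAvailableReaders(List<InstanceInfo> instances) {
        return instances.stream()
                .filter(InstanceInfo::isReader)
                .filter(i -> "available".equals(i.getStatus()))
                .map(InstanceInfo::getEndpoint)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<InstanceInfo> instances = Arrays.asList(
                new InstanceInfo("replica-1.example", "reader", "available"),
                new InstanceInfo("replica-2.example", "reader", "upgrading"),
                new InstanceInfo("replica-3.example", "reader", "modifying"),
                new InstanceInfo("primary.example", "writer", "available"));

        // prints [replica-1.example]
        System.out.println(selectAvailableReaders(instances));
    }
}
```

Note that if every replica happens to be upgrading or modifying at the same time, this filter returns an empty list, so the caller may want a fallback for that case.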

So you have a choice: filter only on the available status, accepting that you'll sometimes miss instances whose endpoints are actually available; or use isAvailable() with a backoff-and-retry strategy.
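The backoff-and-retry option could be sketched as a generic helper like the one below. This is an illustration only: `callWithBackoff`, its parameters, and the doubling delay are assumptions, not part of the client library, and the failing action in `main` merely simulates an endpoint that becomes reachable on the third attempt.

```java
import java.util.concurrent.Callable;

public class RetryWithBackoff {

    // Hypothetical helper: run the action, and on failure wait and try again,
    // doubling the delay each time, up to maxAttempts attempts in total.
    static <T> T callWithBackoff(Callable<T> action, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // back off before the next attempt
                    delay *= 2;          // exponential growth of the delay
                }
            }
        }
        throw last; // every attempt failed; surface the last error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated query: fails twice, then succeeds.
        String result = callWithBackoff(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("endpoint not ready");
            }
            return "ok";
        }, 5, 10);

        // prints ok after 3 attempts
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Each retry adds its delay to the request's latency, so the attempt count and initial delay have to be chosen against the application's response-time budget.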

birbalekka15 commented 9 months ago

Hi @iansrobinson. Thank you for providing the details. We will go with the custom endpoint selector, as our application needs to connect only to instances that are ready to take traffic. Our client is a very high-throughput application, serving 5k+ TPS with an expected response time of less than 500 ms and zero failures. We cannot afford to connect to instances that are not ready to take requests, because failed connections lead to failures upstream. With a backoff-and-retry strategy, the response time would degrade.

In conclusion, it would be good to have a feature for this, for example (EndpointsType.allReadReplicas, EndpointsType.AvailableReadReplicas, EndpointsType.modifyingReadReplicas, ...).

Thank you for the guidance and prompt help on this.