fabric8io / kubernetes-client

Java client for Kubernetes & OpenShift
http://fabric8.io
Apache License 2.0
3.38k stars 1.45k forks source link

Interceptor not invoked when connection refused #6143

Closed SamBarker closed 2 weeks ago

SamBarker commented 1 month ago

Describe the bug

I'm trying to migrate the Apache Flink kubernetes operator from using an OkHttp interceptor to using the new fabric8 http interceptor API. Which seems to have gone reasonably smoothly apart from one final test: https://github.com/apache/flink-kubernetes-operator/blob/aaf26aec8bdfedff0116c5897bd89b17b4454606/flink-kubernetes-operator/src/test/java/org/apache/flink/kubernetes/operator/metrics/KubernetesClientMetricsTest.java#L376

Which is failing because afterFailure is not invoked when there is no http response (I think). I've added a failing unit test in kubernetes-client which demonstrates what I expected to work.

So I'm wondering if the test I've written should work? Or have I missed something in how the API is expected to be used

Fabric8 Kubernetes Client version

SNAPSHOT

Steps to reproduce

  1. checkout https://github.com/SamBarker/kubernetes-client/tree/afterFailureInvocation
  2. run io.fabric8.kubernetes.client.http.AbstractInterceptorTest#afterHttpFailureRemoteOffline (I've been testing with the OkHttp client but I don't think it matters)
  3. see test fail

Expected behavior

The test passes

Runtime

Kubernetes (vanilla)

Kubernetes API Server version

1.25.3@latest

Environment

macOS

Fabric8 Kubernetes Client Logs

No response

Additional context

No response

manusa commented 1 month ago

/cc @shawkins

manusa commented 1 month ago

AFAIR the interceptors (Fabric8) are applicable once the connection is performed. For the other cases I kind of remember we had other callbacks, but I need to check.

shawkins commented 1 month ago

AFAIR the interceptors (Fabric8) are applicable once the connection is performed. For the other cases I kind of remember we had other callbacks, but I need to check.

That is correct, the fabric8 interceptors make this callback based upon the response code. The retry logic will look for IOExceptions, but does not invoke the interceptors.

I don't think we had a usecase where the handling of this situation needed delegated to an interceptor - it was assumed that the retry logic was all that was needed. If I remember correctly some of the initial attempt at this logic was to have a retry interceptor, but in the end it was easier to separate the concepts.

So at the very least this would require some kind of an enhancement, or could your operator rely on just the built-in retry behavior?

SamBarker commented 1 month ago

I don't think we had a usecase where the handling of this situation needed delegated to an interceptor - it was assumed that the retry logic was all that was needed. If I remember correctly some of the initial attempt at this logic was to have a retry interceptor, but in the end it was easier to separate the concepts.

So at the very least this would require some kind of an enhancement, or could your operator rely on just the built-in retry behavior?

As far as I'm aware the retry behaviour is fine, the current OkHttp interceptor is purely for tracking metrics of failed and successful requests. The Fabric8 interceptor gives everything I think it needs for metrics when there is a HTTP response.

It feels a bit awkward that before is invoked for every retry but after|afterFailure is only invoked on HTTP response.

What is the lifecycle of request.id()? Is it regenerated on each retry or is it stable? If its stable I can possibly catch the missing failures...

SamBarker commented 1 month ago

I've confirmed that request.id is updated for each retry so I can't fudge it by detecting the repeated ID and incrementing the failed counter.

shawkins commented 1 month ago

I've confirmed that request.id is updated for each retry so I can't fudge it by detecting the repeated ID and incrementing the failed counter.

Sounds like you'll want an enhancement then for a new callback - it would be simplest if it were just a notification method. It seems like a bigger effort to allow the interceptor to handle retry logic as well.

SamBarker commented 1 month ago

Yeah I don't have a use case for doing anything to the behaviour of the retry.

I think it looks relatively straightforward would you be happy if I open a PR? Or is there a design type process to go through?

manusa commented 1 month ago

I think it looks relatively straightforward would you be happy if I open a PR?

Sure, I was going to provide a fix myself, but it's always better to have community contributions. A PR would be much appreciated :raised_hands:

Or is there a design type process to go through?

Maybe we can just discuss the implementation details, but it all seems quite straightforward.

As I see it:

As far as the Interceptor interface goes I think we just need to add the new callback method.

I'd suggest something like:

// Interceptor.java

  /**
   * Called in case the initial HTTP connection request fails after the request has been sent.
   * 
   * @param request the HTTP request.
   * @param failure the Java exception that caused the failure.
   */
  default void afterConnectionFailure(HttpRequest request, Throwable failure) {
  }

Then, implement the handling in the StandardHttpClient class within the consumeBytesOnce method:

// StandardHttpClient.java
    cf = cf.exceptionally(e -> {
      builder.getInterceptors().values().forEach(i -> i.afterConnectionFailure(effectiveRequest, e));
      if (e instanceof CompletionException) {
        throw (CompletionException) e;
      }
      throw new CompletionException(e);
    });

Thoughts?

SamBarker commented 1 month ago

Thanks @manusa that very much fits with what I was thinking.

Should this call back accept RequestTags as an argument? Or is that only relevant when using the builder interfaces? Similarly we don't need to try and distinguish between websocket and non-websocket connections, do we?

Should this interface be invoked when a websocket connection is broken? If not maybe the name should be afterInitialConnectionFailure.

manusa commented 1 month ago

Should this call back accept RequestTags as an argument? Or is that only relevant when using the builder interfaces?

The idea of the method I proposed is to be invoked as a consumer callback whenever there's a connection failure without providing any output or outcome.

I think RequestTags are only used in case you need to modify the request to create a new connection. IMO, I don't think it's necessary, but I don't really have a use-case for this.

In any case, it's always easy to add extra-arguments to an interface method, but hard to remove them once the API is public, so we can always add it in the future.

Similarly we don't need to try and distinguish between websocket and non-websocket connections, do we?

The current Interceptor definition allows to distinguish failures for non-websocket failures, but doesn't provide distinction for web-socket failures.

This method is supposed to be called before any connection negotiation happens, so I'm not sure why would the user need a distinction between one or the other.

Should this interface be invoked when a websocket connection is broken?

Yes.

I initially suggested to bind the call to the new interface method in the StandardHttpClient.consumeBytesOnce method, but this will only invoke it in case of connection failures for regular HTTP connections.

So it's probably better to bind it in the StandardHttpClient.retryWithExponentialBackoff method or in both the consumeBytesOnce and buildWebSocket methods.

If not maybe the name should be afterInitialConnectionFailure.

Websocket connections are negotiated through HTTP and then upgraded.

SamBarker commented 1 month ago

I've got the tests passing on https://github.com/fabric8io/kubernetes-client/pull/6144/files as discussed above. It's mostly done modulo a changelog entry (still in draft as I was working without agreement here :) )

I'll moving the failure handling to StandardHttpClient.retryWithExponentialBackoff and see how I get on.

I think RequestTags are only used in case you need to modify the request to create a new connection. IMO, I don't think it's necessary, but I don't really have a use-case for this.

Yeah that fits my understanding of how they are used but did want to clarify.

Websocket connections are negotiated through HTTP and then upgraded.

Yes, what I was getting at though is if its not invoked when an established connection fails then its only an initial connection handler and not handling all connection failures. That said the intention is to move it so this is moot.