Open eaglerainbow opened 2 years ago
Double maintaining additional observations from another post:
cf-client-java:5.6.0
cf-client-java:4.16.0
. There, the refresh token requests are being sent to the UAA server, the second one is simulated to be failed. Logs are available here.Apologizes for the slow response & thank you much for the repro sample/steps. I haven't been able to the time deserved to dig into this yet, but it's on my TODO list.
Thanks for letting us know! Be sure that I totally understand that.
@eaglerainbow I have been looking at this. It's quite involved to reproduce, a lot of code to dig through, and I haven't been able to do that and reproduce it yet.
I will say I'm not totally surprised there's an issue though. There is a single entry in the concurrent hashmap that stores the access and refresh token per ConnectionContext. If you're sending a large volume of requests and the token expires, it's entirely possible that multiple requests will get a 401 at the same time and all need to refresh the token. I suppose this could possibly cause issues, either multiple updates to the hashmap (i.e. first response is valid and updates hashmap entry, second response is not valid for some reason and overwrites the valid entry, not it's failed) or possibly something related to how that response is cached. There have been cases where a bad response is cached, so that could be happening here.
A few questions...
In your test logs, I see this request responding OK
22:25:18.796 [reactor-http-nio-6] DEBUG cloudfoundry-client.response - 200 /login/oauth/token (6 ms)
but it logs this message which looks like an anomalous situation.
22:25:18.801 [pool-2-thread-3] WARN org.cloudfoundry.promregator.cfaccessor.AccessKeyRefreshMassTest$LoginHttpHandler - Request with invalid refresh token provided; responding with error
What is the rationale behind this part of your test code? What is it doing with the regex there and returning a 401 in that case. Also, is there any chance you could enable wire tracing for your demo and attach updated logs? Note there was a bug with enabling wire trace if 5.x, you need to pull down 5.8.0 to properly enable wire tracing.
Thanks
What is the rationale behind this part of your test code? What is it doing with the regex there and returning a 401 in that case.
Not having looked at the code in detail for a while and hence need to remember this by heart (sorry, yet too much other things to do right now - will look at it in detail as soon as possible), IIRC, UAA has "exactly one" valid refresh token at a time. That is to say, if you are sending a refresh token request to UAA and provide a deprecated refresh token (e.g. the same one as the old request), then you will get back a 401. That is to say:
If you have stored the current refresh token "x", send two requests at the same time to UAA containing that token "x", then UAA will declare one of the requests to be "the first one", issue a new JWT and provide a new refresh token "y". From that point on, only refresh token "y" is considered valid. The second request, which still has token "x" in the request body then will get a 401, as refresh token "x" is outdated.
IIRC I simulate that behavior with this harsh regular expression above in method respondToGrantTypeRefreshToken
.
Does this make sense?
BTW: IIRC that's also the reason why the method in our test code is synchronized
...
Gotcha, yes. That makes perfect sense.
Given that, I suspect what's happening is related to caching. If multiple requests trigger a refresh and that in turn triggers multiple refresh requests, only one of those refresh results is going to be valid. If the wrong one is cached, you're going to be stuck with that invalid cached result. Hence, the "no further requests possible", at least until the cache expires. Caching would also explain why there are no logs, since if the bad request is cached it won't actually send a new request and you won't get a corresponding log entry.
I suspect if you lower the caching duration you should see a difference in behavior. You can't really set it super low in production, but in your test scenario, you could try setting it to like 5ms and see if you still see that problem. I suspect that you'll start to see more requests go out in that case.
If we can confirm the behavior is related to caching, that's at least a step in the right direction.
If we can confirm the behavior is related to caching, that's at least a step in the right direction.
I finally found an hour to have a look into this again. And your guessing seems to be totally right:
I have recorded two sessions of my coding above (adjusted variant below for your reference): one with having set cacheDuration
of the connectionContext to 5ms, and a second one without that parameter. For both executions I made stuff as chatty as I could find it (or please tell me, if there is anything more that I can do to make it even more verbose).
Here's the log file of it: 20220619-5mscachetest.txt
In essence: With 5ms cacheDuration, the problem is no longer reproducible with my test coding. With the default cacheDuration, I can reproduce it in 100% of cases. => We are chasing a caching issue.
Sidenote: This would be at least some mitigation strategy for productive users, e.g. reducing the cache duration blocks our application (promregator) for a reduced timeframe. The flipside of the coin is that this will also pester the UAA server: The lower we put the value, the more aggressive the client will be and hence the higher the load at the server. So, we shouldn't overdo it...
Code has only changed in minor.
package org.cloudfoundry.promregator.cfaccessor;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.lang.ProcessBuilder.Redirect;
import java.net.InetSocketAddress;
import java.nio.charset.Charset;
import java.security.KeyStore;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.concurrent.Executors;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;
import org.cloudfoundry.client.v2.domains.Domains;
import org.cloudfoundry.client.v2.domains.ListDomainsRequest;
import org.cloudfoundry.client.v2.domains.ListDomainsResponse;
import org.cloudfoundry.reactor.DefaultConnectionContext;
import org.cloudfoundry.reactor.client.ReactorCloudFoundryClient;
import org.cloudfoundry.reactor.tokenprovider.PasswordGrantTokenProvider;
import org.junit.jupiter.api.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpsConfigurator;
import com.sun.net.httpserver.HttpsServer;
import com.sun.org.apache.xerces.internal.impl.dv.util.Base64;
import ch.qos.logback.classic.Level;
import reactor.core.publisher.Mono;
import sun.net.httpserver.HttpsServerImpl;
public class AccessKeyRefreshMassTest {
private static final File KEYSTORE_FILE = new File(System.getProperty("java.io.tmpdir"), "test.jks");
private static final String KEYSTORE_PASSWORD = "pass_store";
private static final String KEY_PASSWORD = "pass_key";
/**
* Generates a new self-signed certificate in /tmp/test.jks, if it does not
* already exist. see also
* http://rememberjava.com/http/2017/04/29/simple_https_server.html
*/
static void generateCertificate() throws Exception {
File keytool = new File(System.getProperty("java.home"), "bin/keytool");
String[] genkeyCmd = new String[] { keytool.toString(), "-genkey", "-keyalg", "RSA", "-alias", "some_alias",
"-validity", "365", "-keysize", "2048", "-dname", "cn=John_Doe,ou=TestOrgUnit,o=TestOrg,c=US",
"-keystore", KEYSTORE_FILE.getAbsolutePath(), "-storepass", KEYSTORE_PASSWORD, "-keypass",
KEY_PASSWORD, "-storetype", "JKS" };
System.out.println(String.join(" ", genkeyCmd));
ProcessBuilder processBuilder = new ProcessBuilder(genkeyCmd);
processBuilder.redirectErrorStream(true);
processBuilder.redirectOutput(Redirect.INHERIT);
processBuilder.redirectError(Redirect.INHERIT);
Process exec = processBuilder.start();
exec.waitFor();
System.out.println("Exit value: " + exec.exitValue());
}
static SSLContext getSslContext() throws Exception {
KeyStore ks = KeyStore.getInstance("JKS");
ks.load(new FileInputStream(KEYSTORE_FILE), KEYSTORE_PASSWORD.toCharArray());
KeyManagerFactory kmf = KeyManagerFactory.getInstance("SunX509");
kmf.init(ks, KEY_PASSWORD.toCharArray());
TrustManagerFactory tmf = TrustManagerFactory.getInstance("SunX509");
tmf.init(ks);
SSLContext sslContext = SSLContext.getInstance("TLS");
sslContext.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);
return sslContext;
}
private class InfoHttpHandler implements HttpHandler {
private final Logger log = LoggerFactory.getLogger(InfoHttpHandler.class);
@Override
public void handle(HttpExchange he) throws IOException {
log.info("Handling info request");
String response = "{\n" + " \"api_version\": \"2.172.0\",\n"
+ " \"app_ssh_endpoint\": \"localhost:2222\",\n"
+ " \"app_ssh_host_key_fingerprint\": \"c7:1f:89:2a:62:3b:78:a9:08:c9:33:81:fb:39:26:da\",\n"
+ " \"app_ssh_oauth_client\": \"ssh-proxy\",\n"
+ " \"authorization_endpoint\": \"https://localhost:9003/login\",\n" + " \"build\": \"v1.2.3\",\n"
+ " \"description\": \"Dummy Foundry\",\n"
+ " \"doppler_logging_endpoint\": \"wss://localhost:1234\",\n" + " \"min_cli_version\": null,\n"
+ " \"min_recommended_cli_version\": null,\n" + " \"name\": \"Dummy\",\n"
+ " \"osbapi_version\": \"2.15\",\n"
+ " \"support\": \"http://localhost:1234/supportinfo\",\n"
+ " \"token_endpoint\": \"https://localhost:9004\",\n" + " \"version\": 0\n" + "}";
ByteArrayOutputStream baos = new ByteArrayOutputStream(1024);
PrintWriter pw = new PrintWriter(baos);
pw.print(response);
pw.flush();
he.sendResponseHeaders(200, baos.size());
OutputStream responseBody = he.getResponseBody();
responseBody.write(baos.toByteArray());
responseBody.flush();
}
}
private class RootHttpHandler implements HttpHandler {
private final Logger log = LoggerFactory.getLogger(RootHttpHandler.class);
@Override
public void handle(HttpExchange he) throws IOException {
log.info("Handling root request against path {}", he.getRequestURI());
String response = "{\n" +
" \"links\": {\n" +
" \"self\": {\n" +
" \"href\": \"https://localhost:9002\"\n" +
" },\n" +
" \"bits_service\": null,\n" +
" \"cloud_controller_v2\": {\n" +
" \"href\": \"https://localhost:9002/v2\",\n" +
" \"meta\": {\n" +
" \"version\": \"2.172.0\"\n" +
" }\n" +
" },\n" +
" \"cloud_controller_v3\": {\n" +
" \"href\": \"https://localhost:9002/v3\",\n" +
" \"meta\": {\n" +
" \"version\": \"3.107.0\"\n" +
" }\n" +
" },\n" +
" \"network_policy_v0\": {\n" +
" \"href\": \"https://localhost:9002/networking/v0/external\"\n" +
" },\n" +
" \"network_policy_v1\": {\n" +
" \"href\": \"https://localhost:9002/networking/v1/external\"\n" +
" },\n" +
" \"login\": {\n" +
" \"href\": \"https://localhost:9003/login\"\n" +
" },\n" +
" \"uaa\": {\n" +
" \"href\": \"https://localhost:9004\"\n" +
" },\n" +
" \"credhub\": null,\n" +
" \"routing\": null,\n" +
" \"logging\": {\n" +
" \"href\": \"wss://localhost:1234\"\n" +
" },\n" +
" \"log_cache\": {\n" +
" \"href\": \"https://localhost:9005\"\n" +
" },\n" +
" \"log_stream\": {\n" +
" \"href\": \"https://localhost:9006\"\n" +
" },\n" +
" \"app_ssh\": {\n" +
" \"href\": \"localhost:2222\",\n" +
" \"meta\": {\n" +
" \"host_key_fingerprint\": \"c7:1f:89:2a:62:3b:78:a9:08:c9:33:81:fb:39:26:da\",\n" +
" \"oauth_client\": \"ssh-proxy\"\n" +
" }\n" +
" }\n" +
" }\n" +
"}\n" +
"";
ByteArrayOutputStream baos = new ByteArrayOutputStream(1024);
PrintWriter pw = new PrintWriter(baos);
pw.print(response);
pw.flush();
he.sendResponseHeaders(200, baos.size());
OutputStream responseBody = he.getResponseBody();
responseBody.write(baos.toByteArray());
responseBody.flush();
}
}
public class DomainsHttpHandler implements HttpHandler {
private final Logger log = LoggerFactory.getLogger(DomainsHttpHandler.class);
private final Pattern extractJwtPayload = Pattern.compile(".+\\.(.+)\\..+");
@Override
public void handle(HttpExchange he) throws IOException {
log.info("Handling domains request");
List<String> authorizationList = he.getRequestHeaders().get("Authorization");
if (authorizationList.isEmpty()) {
he.sendResponseHeaders(401, 0);
he.getResponseBody().flush();
return;
}
if (authorizationList.size() > 1) {
log.error("Multiple authentication headers received");
he.sendResponseHeaders(400, 0);
he.getResponseBody().flush();
return;
}
String authorizationValue = authorizationList.get(0);
log.info("Authorization Request Header Value: "+authorizationValue);
if (!authorizationValue.startsWith("Bearer ")) {
he.sendResponseHeaders(401, 0);
he.getResponseBody().flush();
return;
}
String jwt = authorizationValue.substring(8);
log.info("Extracted Jwt: "+jwt);
Matcher m = extractJwtPayload.matcher(jwt);
if (!m.find()) {
he.sendResponseHeaders(401, 0);
he.getResponseBody().flush();
return;
}
String jwtPayload = m.group(1);
Map<String, Object> claims = new ObjectMapper().readValue(Base64.decode(jwtPayload), Map.class);
Integer exp = (Integer) claims.get("exp");
long now = Instant.now().toEpochMilli() / 1000;
log.info("JWT Expires at {}, now is {}", exp, now);
if (exp.longValue() < now) {
log.warn("JWT has expired: {} vs. {}(now)", exp, now);
he.sendResponseHeaders(401, 0);
he.getResponseBody().flush();
return;
}
log.info("JWT was still valid");
String response = "{\n" +
" \"next_url\": null,\n" +
" \"prev_url\": null,\n" +
" \"resources\": [\n" +
" {\n" +
" \"entity\": {\n" +
" \"internal\": false,\n" +
" \"name\": \"some.domain.example\",\n" +
" \"router_group_guid\": null,\n" +
" \"router_group_type\": null\n" +
" },\n" +
" \"metadata\": {\n" +
" \"created_at\": \"2021-10-20T19:21:39Z\",\n" +
" \"guid\": \"42049093-13e9-4520-80a6-2d6fea6542bc\",\n" +
" \"updated_at\": \"2021-09-16T20:40:47Z\",\n" +
" \"url\": \"/v2/domains/42049093-13e9-4520-80a6-2d6fea6542bc\"\n" +
" }\n" +
" }\n" +
" ],\n" +
" \"total_pages\": 1,\n" +
" \"total_results\": 1\n" +
"}";
ByteArrayOutputStream baos = new ByteArrayOutputStream(1024);
PrintWriter pw = new PrintWriter(baos);
pw.print(response);
pw.flush();
he.sendResponseHeaders(200, baos.size());
OutputStream responseBody = he.getResponseBody();
responseBody.write(baos.toByteArray());
responseBody.flush();
}
}
public class APIMockServer {
private HttpsServer server;
protected int port = 9002;
private InfoHttpHandler infoHandler;
private RootHttpHandler rootHttpHandler;
private DomainsHttpHandler domainsHttpHandler;
public APIMockServer() {
}
public void start(SSLContext sslContext) throws IOException {
InetSocketAddress bindAddress = new InetSocketAddress("127.0.0.1", this.port);
this.server = HttpsServerImpl.create(bindAddress, 0);
this.server.createContext("/", this.getRootHttpHandler());
this.server.createContext("/v2/info", this.getInfoHttpHandler());
this.server.createContext("/v2/domains", this.getDomainsHttpHandler());
this.server.setHttpsConfigurator(new HttpsConfigurator(sslContext));
this.server.setExecutor(Executors.newFixedThreadPool(15));
this.server.start();
}
private HttpHandler getRootHttpHandler() {
if (this.rootHttpHandler == null) {
this.rootHttpHandler = new RootHttpHandler();
}
return this.rootHttpHandler;
}
private InfoHttpHandler getInfoHttpHandler() {
if (this.infoHandler != null) {
return this.infoHandler;
}
this.infoHandler = new InfoHttpHandler();
return this.infoHandler;
}
private HttpHandler getDomainsHttpHandler() {
if (this.domainsHttpHandler == null) {
this.domainsHttpHandler = new DomainsHttpHandler();
}
return this.domainsHttpHandler;
}
public void stop() {
this.server.stop(1);
}
}
public class LoginHttpHandler implements HttpHandler {
private final Logger log = LoggerFactory.getLogger(LoginHttpHandler.class);
private final Pattern patternGrantTypePassword = Pattern.compile("&grant_type=password");
private final Pattern patternGrantTypeRefreshToken = Pattern.compile("&grant_type=refresh_token");
private final Pattern patternRefreshToken = Pattern.compile("&refresh_token=([0-9]+)");
private String currentValidRefreshToken;
@Override
public void handle(HttpExchange he) throws IOException {
log.info("Login request received");
log.info("RequestHeaders:");
for (Entry<String, List<String>> entry : he.getRequestHeaders().entrySet()) {
log.info(String.format("%s: %s", entry.getKey(), entry.getValue().toString()));
}
log.info("RequestBody:");
String requestBody = new String(he.getRequestBody().readAllBytes());
log.info(requestBody);
if (patternGrantTypePassword.matcher(requestBody).find()) {
provideNewAccessAndRefreshToken(he);
} else if (patternGrantTypeRefreshToken.matcher(requestBody).find()) {
respondToGrantTypeRefreshToken(he, requestBody);
} else {
he.sendResponseHeaders(401, 0);
he.getResponseBody().flush();
}
}
private synchronized void provideNewAccessAndRefreshToken(HttpExchange he) throws IOException {
Instant now = Instant.now();
String headerString = "{}";
String payloadString = String.format("{\"iat\":%d, \"exp\":%d}", now.toEpochMilli()/1000, now.plus(Duration.ofSeconds(5)).toEpochMilli()/1000);
String jwt = String.format("%s.%s.invalidsignature", Base64.encode(headerString.getBytes(Charset.defaultCharset())), Base64.encode(payloadString.getBytes(Charset.defaultCharset())));
log.info("Sending jwt: "+jwt);
String refreshToken = Instant.now().toEpochMilli()+"";
String response = String.format("{\"access_token\":\"%s\",\"expires_in\": 5, \"refresh_token\":\"%s\", \"scope\": \"openid network.write uaa.user cloud_controller.read password.write cloud_controller.write network.admin\", \"token_type\": \"Bearer\"}",
jwt, refreshToken);
log.info("Sending response body: {}", response);
this.currentValidRefreshToken = refreshToken;
ByteArrayOutputStream baos = new ByteArrayOutputStream(1024);
PrintWriter pw = new PrintWriter(baos);
pw.print(response);
pw.flush();
he.sendResponseHeaders(200, baos.size());
OutputStream responseBody = he.getResponseBody();
responseBody.write(baos.toByteArray());
responseBody.flush();
}
private synchronized void respondToGrantTypeRefreshToken(HttpExchange he, String requestBody) throws IOException {
Matcher m = patternRefreshToken.matcher(requestBody);
if (!m.find()) {
he.sendResponseHeaders(401, 0);
he.getResponseBody().flush();
return;
}
String requestRefreshToken = m.group(1);
if (currentValidRefreshToken.equals(requestRefreshToken)) {
log.info("Request with current refresh token; issuing new JWT");
provideNewAccessAndRefreshToken(he);
return;
}
log.warn("Request with invalid refresh token provided; responding with error");
he.sendResponseHeaders(401, 0);
he.getResponseBody().flush();
}
}
public class LoginMockServer {
private HttpsServer server;
protected int port = 9003;
private LoginHttpHandler loginHttpHandler;
public LoginMockServer() {
}
public void start(SSLContext sslContext) throws IOException {
InetSocketAddress bindAddress = new InetSocketAddress("127.0.0.1", this.port);
this.server = HttpsServerImpl.create(bindAddress, 0);
this.server.createContext("/login/oauth/token", this.getLoginHttpHandler());
this.server.setHttpsConfigurator(new HttpsConfigurator(sslContext));
this.server.setExecutor(Executors.newFixedThreadPool(3));
this.server.start();
}
private HttpHandler getLoginHttpHandler() {
if (this.loginHttpHandler == null) {
this.loginHttpHandler = new LoginHttpHandler();
}
return this.loginHttpHandler;
}
public void stop() {
this.server.stop(1);
}
}
@Test
public void runTest() throws Exception {
//System.setProperty("javax.net.debug", "all");
ch.qos.logback.classic.Logger requestLogger = (ch.qos.logback.classic.Logger) LoggerFactory.getLogger("cloudfoundry-client.request");
requestLogger.setLevel(Level.DEBUG);
ch.qos.logback.classic.Logger responseLogger = (ch.qos.logback.classic.Logger) LoggerFactory.getLogger("cloudfoundry-client.response");
responseLogger.setLevel(Level.DEBUG);
ch.qos.logback.classic.Logger wireLogger = (ch.qos.logback.classic.Logger) LoggerFactory.getLogger("cloudfoundry-client.wire");
wireLogger.setLevel(Level.DEBUG);
ch.qos.logback.classic.Logger tokenLogger = (ch.qos.logback.classic.Logger) LoggerFactory.getLogger("cloudfoundry-client.token");
tokenLogger.setLevel(Level.DEBUG);
ch.qos.logback.classic.Logger rootLogger = (ch.qos.logback.classic.Logger) LoggerFactory.getLogger("cloudfoundry-client");
rootLogger.setLevel(Level.DEBUG);
generateCertificate();
SSLContext sslContext = getSslContext();
APIMockServer apiMockServer = new APIMockServer();
apiMockServer.start(sslContext);
LoginMockServer loginMockServer = new LoginMockServer();
loginMockServer.start(sslContext);
final Logger log = LoggerFactory.getLogger(this.getClass());
try {
DefaultConnectionContext connectionContext = DefaultConnectionContext.builder().apiHost("localhost")
.port(9002).skipSslValidation(true).build();
// cacheDuration(Duration.ofMillis(5))
PasswordGrantTokenProvider tokenProvider = PasswordGrantTokenProvider.builder().password("pw")
.username("username").build();
ReactorCloudFoundryClient cloudFoundryClient = ReactorCloudFoundryClient.builder()
.connectionContext(connectionContext).tokenProvider(tokenProvider).build();
Domains domains = cloudFoundryClient.domains();
final ListDomainsRequest request = ListDomainsRequest.builder().build();
Mono<ListDomainsResponse> listDomainsInitialMono = domains.list(request);
log.info("Stage 1: Sending initial domain list request");
listDomainsInitialMono.block(Duration.ofSeconds(50));
log.info("Waiting 7 seconds to ensure that the JWT has expired");
Thread.sleep(7000);
log.info("Stage 2: Sending serial requests");
Mono<ListDomainsResponse> listDomains1Mono = domains.list(request);
Mono<ListDomainsResponse> listDomains2Mono = domains.list(request);
log.info("Sending first request");
listDomains1Mono.block(Duration.ofSeconds(5));
log.info("Sending second request");
listDomains2Mono.block(Duration.ofSeconds(5));
log.info("Waiting 7 seconds to ensure that the JWT has expired");
Thread.sleep(7000);
log.info("Stage 3: Sending parallel requests");
Mono<ListDomainsResponse> listDomains3Mono = domains.list(request);
Mono<ListDomainsResponse> listDomains4Mono = domains.list(request);
listDomains3Mono.doOnError(e -> log.error("Received Exception for mono 1", e)).subscribe();
listDomains4Mono.doOnError(e -> log.error("Received Exception for mono 2", e)).subscribe();
log.info("Waiting 3 seconds to prevent any bad overlap");
Thread.sleep(3000);
log.info("Stage 4: Sending another isolated request");
Mono<ListDomainsResponse> listDomains5Mono = domains.list(request);
listDomains5Mono.block(Duration.ofSeconds(5));
} finally {
loginMockServer.stop();
apiMockServer.stop();
}
}
}
Thank you for confirming this! We will take a look at the best way to handle this now that we've narrowed it down to caching
@pivotal-david-osullivan any news for me? can I still support you in any way?
or is a workaround known?
Is it correct, that the cache duration has to be a little less than the access token validity to prevent the client from starting the refresh flow ?
That means a cache duration of 15 mins will work fine when the access token is valid for 20 mins.
Is it correct, that the cache duration has to be a little less than the access token validity to prevent the client from starting the refresh flow ?
That means a cache duration of 15 mins will work fine when the access token is valid for 20 mins.
I am a little unsure whether this is a general question (without any association to the original inquiry) or whether you propose this to be a possible resolution of it.
If it is the first, then yes: You should fetch a new JWT a little earlier before the validity expiry effectively. I would expect that a couple of minutes/seconds would already be sufficient for that. As far as I understand things, cf-java-client
does not use this approach, but waits for the first UNAUTHORIZED
HTTP status code to come back before refreshing the access token. This is also an acceptable approach, but you risk to run into troubles, for example if an IDS thinks that the caller might be an attacker.
If it is the latter, then no: The problem discussed here in this thread is based on the fact that other requests may also be sent in parallel while refreshing of the access token still is pending. The other request will then also issue a corresponding HTTP request to the UAA server, using the same refresh token. The second request then is rejected by UAA server (because the refresh token got already replaced by a new one - and the old one has become invalid). Whilst the first (successful) request was stored in the cache, the second (failing) request overwrites the cache again - and cf-java-client
thinks (from this time on) that neither the access token nor the refresh token was valid anymore. All subsequent "business" requests therefore fail as well.
With triggering the refresh token flow earlier, you don't fix anything here: Still, if there is second request coming while a refresh token flow is pending, the same issue will happen (just a little earlier).
To fix this issue conceptually, I think, we somehow need to "lock" the Operator/Operations from sending subsequent requests until the refresh token flow is completed (either successful or with a failure). (just my $0.02 on a potential solution).
At https://github.com/promregator/promregator/issues/222 we are again tracking down a very ugly issue with JWT handling at the UAA using
cf-java-client
. We are currently on 5.7.0. I believe to have it tracked down around the area of https://github.com/cloudfoundry/cf-java-client/blob/00faecff356e082b7ec508fba4bc35510f162121/cloudfoundry-client-reactor/src/main/java/org/cloudfoundry/reactor/tokenprovider/AbstractUaaTokenProvider.java#L268 In a nutshell:It appears that if multiple cf api requests are executed in parallel at the moment after the current
access_token
has expired, there is a race condition for updating theaccess_token
/refresh_token
. The issue is not logged/loggable.The creator of the initial issue has the situation that its access token validity is for only 20 minutes. However, several hundreds of requests are being sent with a short cadence - and at some point in time (by chance), there are two requests in the pipeline in parallel when the
access_token
has expired. Both of them need to be refreshed through thegrant_type=refresh_token
flow. In production, it takes 6h+ to reproduce the issue.I believe to have been able to reproduce the issue on my local machine within a couple of seconds using the following (very ugly and insecure - don't ever do this in production!) "unit test":
The coding above shows the following behavior:
/
endpoint and the/v2/info
endpoint of the api-server but never try to trigger any refresh_token flow. After 2,1s they appear to fail with aresponse_incomplete
from netty, generating an artificial (?) 401 response.reactor.netty.http.client.PrematureCloseException: Connection prematurely closed DURING response
exception onorg.cloudfoundry.reactor.tokenprovider.AbstractUaaTokenProvider.token(AbstractUaaTokenProvider.java:261)
.For your convinience and analysis I have taken a snapshot of logs on one of my tests having set various "interesting" log levels to DEBUG. All appearing-to-be-sensitive-data there is fake/mocked/generated. No redaction was necessary. There was no running CF installation required for executing the test (mocked test in full isolation).
My expectation would have been that in the parallel case (3), both requests were put to a halt until a valid JWT is available at the client. In particular the behavior after the failure has ocurred (4) is very ugly for consuming application: Appearingly all requests that the consumer wants to send fail with an UaaException of which the consumer has no control. It appears that the
ConnectionContext
is in "some broken state".Could you have a look? How can we assist you with your analysis?