Open rishi2808-ds opened 1 month ago
@rishi2808-ds Thanks for posting this in detail. I am seeing a similar issue in my production environment as well. The error gets cached, causing all subsequent requests from the container to fail.
Hi @rishi2808-ds - thanks for reaching out and providing the detailed explanation.
To better understand and investigate this issue, it would help if you could provide a minimal reproducible code snippet. A concise sample that demonstrates the problem will let me reproduce and analyze the issue more effectively on my end.
While the information you've provided so far is valuable, a minimal reproducible example will allow me to isolate the problem and uncover any nuances or edge cases in the credential fetching and caching behavior you've described.
Please share a simplified version of your code that captures the essence of the issue without unnecessary complexity. This will streamline the investigation and help us collaborate on identifying the root cause and potential solutions.
Best, John
To reproduce this issue locally, remove the credentials from the ~/.aws/credentials file, forcing the SDK to fall back to the fromInstanceMetadata method for obtaining credentials, mirroring the behaviour of the remote environment.
Explicitly throw a TimeoutError in the httpRequest function located in node_modules/@aws-sdk/credential-provider-imds/dist/cjs/remoteProvider/httpRequest.js. When this function is called for the first time, it throws a TimeoutError, which then gets cached by memoize. On subsequent calls, the function is not invoked again; instead, the cached TimeoutError is returned.
Below are the changes we made to the httpRequest.js file.
```js
Object.defineProperty(exports, "__esModule", { value: true });
exports.httpRequest = void 0;
const property_provider_1 = require("@aws-sdk/property-provider");
const buffer_1 = require("buffer");
const http_1 = require("http");
var flag1 = false; // injected: ensures the failure fires only on the first call
/**
 * @internal
 */
function httpRequest(options) {
    return new Promise((resolve, reject) => {
        if (!flag1) {
            // injected failure: reject the very first request
            console.log("http----->0");
            flag1 = true;
            reject(new Error("TimeoutError1"));
        }
        const req = http_1.request({ method: "GET", ...options });
        console.log("http----->1");
        req.on("error", (err) => {
            reject(Object.assign(new property_provider_1.ProviderError("Unable to connect to instance metadata service"), err));
        });
        req.on("timeout", () => {
            reject(new Error("TimeoutError"));
        });
        req.on("response", (res) => {
            const { statusCode = 400 } = res;
            if (statusCode < 200 || 300 <= statusCode) {
                reject(Object.assign(new property_provider_1.ProviderError("Error response received from instance metadata service"), { statusCode }));
            }
            const chunks = [];
            res.on("data", (chunk) => {
                chunks.push(chunk);
            });
            res.on("end", () => {
                resolve(buffer_1.Buffer.concat(chunks));
            });
        });
        req.end();
    });
}
exports.httpRequest = httpRequest;
```
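One subtlety in the patch: execution continues past the early reject(), yet the repro still works because a promise can settle only once, so any later resolve() or reject() is ignored. A minimal sketch:

```js
// A promise settles only once: the first reject() wins, and the
// executor's later resolve() call is silently ignored.
const p = new Promise((resolve, reject) => {
  reject(new Error("TimeoutError1")); // settles the promise (rejected)
  resolve("credentials");             // no effect: already settled
});

p.then(
  (value) => console.log("resolved:", value),
  (err) => console.log("rejected:", err.message)
);
// → rejected: TimeoutError1
```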
Describe the bug
The issue arises because when the AWS credentials expire, the AWS SDK makes a call to fetch new credentials and cache them using the memoize method. If this fetch operation fails and results in a TimeoutError, the AWS SDK’s memoize method caches this error. Consequently, subsequent calls retrieve the TimeoutError from the cache instead of attempting to fetch new credentials from AWS.
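The caching behaviour can be illustrated independently of the SDK. The sketch below is a simplified stand-in for the memoize pattern described above (an illustration, not the SDK's actual source): the provider's promise is stored before it settles, so a rejection is cached exactly like a success.

```js
// Simplified memoize sketch: the promise is cached as soon as the
// provider is invoked, so a later rejection stays in the cache.
function memoize(provider) {
  let result;
  let hasResult = false;
  return () => {
    if (!hasResult) {
      result = provider(); // cached before the promise settles
      hasResult = true;    // set regardless of success or failure
    }
    return result;
  };
}

let calls = 0;
const failingProvider = () => {
  calls += 1;
  return Promise.reject(new Error("TimeoutError"));
};

const getCredentials = memoize(failingProvider);

getCredentials().catch((err) => console.log("first call:", err.message));
getCredentials().catch((err) => console.log("second call:", err.message));
// The provider runs only once; the second call replays the cached rejection.
setTimeout(() => console.log("provider invocations:", calls), 0);
```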
To reproduce this issue locally, we removed the credentials from the ~/.aws/credentials file, forcing the SDK to fall back to the fromInstanceMetadata method for obtaining credentials, mirroring the behaviour of the remote environment.
We then explicitly threw an error within the AWS SDK and observed that while the first attempt to fetch credentials triggered an API call to the Instance Metadata Service, subsequent attempts retrieved the error from the cache instead of making fresh API calls to the Instance Metadata Service.
Below is a screenshot of the values of the hasResult and result variables in the memoize method, verifying that the TimeoutError is indeed being cached.
Additional logs added:
The first time the function is called, we can see the added logs.
Subsequent calls do not show the added logs in the AWS SDK, indicating that no new API calls are being made. Instead, we continue to see TimeoutError logs, which means the error is being retrieved from the cache.
SDK version number
@aws-sdk/client-s3@3.11.0
Which JavaScript Runtime is this issue in?
Node.js
Details of the browser/Node.js/ReactNative version
v20.10.0
Reproduction Steps
To reproduce the issue locally:
Remove the credentials from the ~/.aws/credentials file, forcing the SDK to fall back to the fromInstanceMetadata method, then explicitly throw a TimeoutError in the httpRequest function located in node_modules/@aws-sdk/credential-provider-imds/dist/cjs/remoteProvider/httpRequest.js. When this function is called for the first time, it throws a TimeoutError, which then gets cached by memoize. On subsequent calls, the function is not invoked again; instead, the cached TimeoutError is returned.
Observed Behavior
We then explicitly threw an error within the AWS SDK and observed that while the first attempt to fetch credentials triggered an API call to the Instance Metadata Service, subsequent attempts retrieved the error from the cache instead of making fresh API calls to the Instance Metadata Service.
Expected Behavior
When a TimeoutError has been stored in the cache, subsequent calls should still go to the Instance Metadata Service to fetch fresh credentials.
Possible Solution
The memoize method should not cache a rejected result: when the credential fetch fails with a TimeoutError, the cached error should be discarded so that subsequent calls retry the Instance Metadata Service instead of replaying the cached failure.
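As a sketch of this idea (an illustrative variant, not the SDK's actual fix), the memoized wrapper can discard its cached entry whenever the underlying promise rejects, so the error is still surfaced to the current caller but the next call retries the provider:

```js
// Memoize variant that does not cache rejections: on failure the cache
// is cleared, so the next caller triggers a fresh provider invocation.
function memoizeWithRetry(provider) {
  let result;
  let hasResult = false;
  return () => {
    if (!hasResult) {
      hasResult = true;
      result = provider().catch((err) => {
        hasResult = false; // forget the failed attempt
        throw err;         // still report the error to the current caller
      });
    }
    return result;
  };
}

let calls = 0;
const flakyProvider = () => {
  calls += 1;
  return calls === 1
    ? Promise.reject(new Error("TimeoutError"))        // first attempt fails
    : Promise.resolve({ accessKeyId: "EXAMPLEKEY" });  // retry succeeds
};

const getCredentials = memoizeWithRetry(flakyProvider);

getCredentials()
  .catch((err) => console.log("first call:", err.message))
  .then(() => getCredentials())
  .then((creds) => console.log("second call succeeds:", creds.accessKeyId));
```

Note that in-flight calls still coalesce onto the same pending promise; only after the rejection settles is the cache cleared, which preserves the deduplication behaviour memoization is meant to provide.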
Additional Information/Context
No response