alesk20 closed this issue 4 months ago.
Hi @alesk20,
Thanks for reaching out. The behavior is indeed odd. Since the awaited call to .send() never returns, it might be that the server did not close the connection and the SDK is still waiting for a response.
Without more detailed logs it is very difficult to diagnose. This could be caused by different HTTP handler defaults for connection management, which you might need to change.
For example, in the v2 SDK the default timeout was 60 seconds, while in v3 we use the defaults provided by Node's http client, which is 0:
requestTimeout: The number of milliseconds a request can take before automatically being terminated. Defaults to 0, which disables the timeout.
My guess is that the server also hangs on v2, but the older version's default behavior masks it. You might want to make the timeout more aggressive, perhaps 60 seconds to align with v2's behavior, and see if this solves your issue.
Thanks, Ran~
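To make the difference between a requestTimeout of 0 and a finite one concrete, here is a minimal sketch using plain node:http rather than the SDK; it only illustrates the mechanism and is not NodeHttpHandler's actual implementation. `getWithTimeout` and `demo` are hypothetical names, and the local server that never responds stands in for the hung SNS call. In the SDK itself, the corresponding setting would be passed along the lines of `new SNSClient({ requestHandler: new NodeHttpHandler({ requestTimeout: 60_000 }) })`, with `NodeHttpHandler` imported from `@smithy/node-http-handler` (`@aws-sdk/node-http-handler` in older releases).

```javascript
const http = require("node:http");

// Hypothetical helper: GET `url`, giving up if the socket stays idle for
// `timeoutMs` milliseconds. A value of 0 disables the timeout entirely,
// so the call could wait forever, like a requestTimeout of 0.
function getWithTimeout(url, timeoutMs) {
  return new Promise((resolve, reject) => {
    const req = http.get(url, { timeout: timeoutMs }, (res) => {
      res.resume(); // discard the body so the socket can be released
      res.on("end", () => resolve(res.statusCode));
    });
    // 'timeout' only reports inactivity; the request must be destroyed explicitly.
    req.on("timeout", () => {
      req.destroy(new Error(`request timed out after ${timeoutMs} ms`));
    });
    req.on("error", reject);
  });
}

// Stand-in for a hung endpoint: accepts requests but never responds.
async function demo() {
  const server = http.createServer(() => {});
  await new Promise((resolve) => server.listen(0, "127.0.0.1", resolve));
  const url = `http://127.0.0.1:${server.address().port}/`;
  try {
    await getWithTimeout(url, 200);
    return "resolved";
  } catch (err) {
    return err.message;
  } finally {
    server.close();
  }
}
```

Node's `timeout` option is an idle-socket timeout, so this approximates rather than reproduces a total request deadline; with `timeoutMs` set to 0 the same call would simply never settle, which matches the reported hang.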
Hi @RanVaknin,
Thank you for the response. I'll try setting the timeout explicitly to 60 seconds, but it's still strange that all the messages get published with the v2 SDK, while with the v3 SDK they don't get published when the SNS client hangs. Shouldn't the messages handled by the v2 SDK also fail to be published if they hit the default 60-second timeout? What I observe is that I don't lose any messages with the v2 SDK, but with the v3 SDK I lose them when the SNS client hangs and I forcibly throw a timeout.
Thanks
Hi @RanVaknin,
I want to add another question after reading your response: in the v2 SDK, what happens when the default requestTimeout is reached? Is an error thrown, or is the promise just resolved?
The 180-second timeout I mentioned in my first message was not set on client-sns; it is an external timeout that drops the process and retries. So in my actual implementation, given what you said, I think the connection to the SNS topic still hangs even after I drop the process.
It still doesn't explain why the v3 SDK has these slowdowns publishing messages to the SNS topic, while the v2 SDK delivers them immediately, even under heavy load, without missing any delivery.
Thanks
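An external timeout like the 180-second one described above only stops the caller from waiting; the request and its socket can stay open, which fits the suspicion that the connection still hangs. Below is a minimal sketch, using plain node:http and hypothetical helper names (`withAbortableTimeout`, `get`, `demo`), of a timeout that also cancels the in-flight request through an `AbortSignal`.

```javascript
const http = require("node:http");

// Hypothetical helper: run `task(signal)` and abort it after `ms` milliseconds.
// Unlike a bare Promise.race(), which only stops *waiting*, aborting the
// signal lets the task tear down its underlying socket.
function withAbortableTimeout(task, ms) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  return task(controller.signal).finally(() => clearTimeout(timer));
}

// Example task: a GET that honours the abort signal.
function get(url, signal) {
  return new Promise((resolve, reject) => {
    const req = http.get(url, { signal }, (res) => {
      res.resume();
      res.on("end", () => resolve(res.statusCode));
    });
    req.on("error", reject); // an abort surfaces here as an AbortError
  });
}

async function demo() {
  let serverSawClose = false;
  // Stand-in for a hung endpoint that records when the client disconnects.
  const server = http.createServer((req) => {
    req.socket.on("close", () => {
      serverSawClose = true;
    });
  });
  await new Promise((resolve) => server.listen(0, "127.0.0.1", resolve));
  const url = `http://127.0.0.1:${server.address().port}/`;
  let errorName = null;
  try {
    await withAbortableTimeout((signal) => get(url, signal), 200);
  } catch (err) {
    errorName = err.name;
  }
  await new Promise((resolve) => setTimeout(resolve, 100)); // let the close propagate
  server.close();
  return { errorName, serverSawClose };
}
```

With the v3 SDK, `send()` accepts a second options argument with an `abortSignal`, for example `sns.send(publishCommand, { abortSignal: controller.signal })`, which should let an external timeout cancel the request instead of abandoning it; treat this as an assumption to check against the SDK version in use.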
Hi @alesk20, requestTimeout means that the connection will be terminated from the client side. It does not mean a retry.
> Shouldn't also the messages handled with V2 sdk not being published if they reach the default 60 secs timeout?
Not necessarily. The server might receive and process your request without responding with a status that tells the client whether the message was processed.
It's hard to say why you are only experiencing this with v3. It might be due to differences in connection management, or to something done differently in your code. Without an end-to-end example it will be very difficult to find the root cause.
Can you set up a minimal GitHub repository that reproduces this behavior reliably (even intermittently is fine)? Ideally the reproduction would include both the working v2 code and the non-working v3 code so we can compare them.
Thanks, Ran~
Hi @RanVaknin,
Unfortunately it's very difficult to replicate this case: it only happens after 1-2 hours, and only in the production environment, where there is a lot of traffic on the SQS queue. I also tried to replicate it in a test environment, but couldn't manage to.
As I said in my first message, I didn't change anything in the code. I just migrated from the v2 SDK to the v3 SDK and upgraded from Node.js 16 to Node.js 18; these are the only two changes. I don't think the problem is the Node.js 18 version.
Can you tell me what happens in the v2 SDK when the default requestTimeout is reached? Is the promise resolved, or is an error thrown?
Thanks.
Hi @alesk20,
> Can you tell me what happens on V2 sdk when default requestTimeout is reached? The promise gets resolved or an error is thrown?
When the v2 equivalent of requestTimeout (named timeout in v2) is reached, the client kills the connection and an error is thrown, as shown here: https://github.com/aws/aws-sdk-js/blob/36e3f6d5c27adf522b7517f095f060f4581d9b03/lib/http/node.js#L86. Could it be that you handle that error in your v2 code but not in v3?
> As I said in the first message, I didn't change anything on the code, I just migrate V2 sdk to V3 sdk and upgraded Node.js 16 to Node.js 18, these two are the only things I changed. I don't think the problem is Node.js 18 version.
I understand your concern; however, I cannot point to a single place in the SDK and say "this is why your code is not working like it did in v2". There are about 8 years of development between the introduction of v2 and the release of v3, and the architecture of the two is very different, having evolved along with the JS language itself and the ecosystem's best practices.
I went through all of the HTTP configurations used by the v2 SDK and found that the only HTTP option we explicitly override is indeed timeout. However, I was wrong initially: we actually set it to 120000 ms (2 minutes) by default:
console.log(sns.config.httpOptions)
// prints: { timeout: 120000 }
I don't think it will be helpful to keep comparing the two; instead, we should focus on how to help with your current setup.
Are you running your application in something like a Docker container? I'm asking because Docker has decent support for tcpdump, which lets you inspect TCP-level networking events. You could use that, or any other network diagnostic tool, to find out what closes those connections.
I understand that your current repro code does not trigger the reported behavior, but can you please share it anyway? Right now we are doing a lot of theorizing, which is not helpful. If you share your code, we can better visualize the architecture and do a quick visual check for anything you might be missing to get this working correctly (this is not to suggest that your code is wrong). If you have the v2 code handy, feel free to share that too.
Thanks again for your cooperation.
All the best, Ran~
This issue has not received a response in 1 week. If you still think there is a problem, please leave a comment to prevent the issue from closing automatically.
Hi @RanVaknin, after further investigation it seems that the problem lies in Node.js 18, which shows hanging HTTP requests elsewhere too, not only in the AWS SDK. I will investigate further and try to release my project with Node.js 20, which does not seem to have these hanging problems.
Hi @RanVaknin, I think I found the problem, and it's not the Node.js version. The problem is with the S3 client from "@aws-sdk/client-s3": I managed to replicate the issue, and I see that the SDK never closes the sockets opened for S3 requests, which eventually creates a bottleneck in the server's socket pool. I think I solved the problem by forcing "requestTimeout" on the S3 client:
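import { S3 } from "@aws-sdk/client-s3";
import { NodeHttpHandler } from "@smithy/node-http-handler"; // older SDK releases: "@aws-sdk/node-http-handler"
import { Agent } from "node:https";

const s3 = new S3({
  ...options.s3,
  requestHandler: new NodeHttpHandler({
    // S3 endpoints use HTTPS, so the custom agent goes in httpsAgent;
    // httpAgent is only used for plain http:// endpoints.
    httpsAgent: new Agent({ keepAlive: true, keepAliveMsecs: 1000 }),
    // Give up on any request that has not completed after 5 seconds.
    requestTimeout: 5000,
  }),
});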
By doing this, I see that the S3 sockets are closed after 5 seconds and no connection is left hanging. Isn't this an SDK bug? With aws-sdk v2, the connections to S3 were closed automatically after the response.
Kind regards.
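One way to confirm a socket-pool bottleneck like the one described here, without packet captures, is to watch the pool of the agent handed to NodeHttpHandler. The sketch assumes you pass your own `https.Agent` as `httpsAgent` so you keep a reference to it; `poolStats` is a hypothetical helper built on the documented `sockets`, `freeSockets` and `requests` properties of Node's `http.Agent`, and `maxSockets: 50` is an arbitrary example value.

```javascript
const https = require("node:https");

// Hypothetical diagnostic helper: summarise a Node.js Agent's connection pool.
// Each property is an object keyed by origin whose values are arrays:
//   sockets     -> sockets currently assigned to a request
//   freeSockets -> idle keep-alive sockets waiting to be reused
//   requests    -> requests queued because no socket is available
function poolStats(agent) {
  const count = (byOrigin) =>
    Object.values(byOrigin).reduce((total, list) => total + list.length, 0);
  return {
    active: count(agent.sockets),
    idle: count(agent.freeSockets),
    queued: count(agent.requests),
  };
}

// Keep a reference to the agent you hand to NodeHttpHandler (as httpsAgent)
// so that its pool can be inspected while the service runs.
const agent = new https.Agent({ keepAlive: true, maxSockets: 50 });

// Log the pool every 10 seconds; unref() lets the process exit normally.
setInterval(() => console.log(poolStats(agent)), 10_000).unref();
```

If `queued` keeps growing while `active` stays pinned at `maxSockets`, requests are waiting for sockets that are never released, which is the symptom an unread response body produces.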
Hi @alesk20,
I don't know which S3 operation you are using, since it was not mentioned in the original issue description, but if I had to guess it's getObject. In v3 its response Body is a stream, and in Node.js, if you don't consume a stream, the underlying connection might stay open.
This is covered here:
> Because keepAlive is defaulted to true, if you acquire a streaming response, such as S3::getObject's Body field, you must read the stream to completion in order for the socket to close naturally.
Thanks, Ran~
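The following sketch reproduces that mechanism with plain node:http, assuming the S3 `Body` behaves like Node's `http.IncomingMessage` (in Node.js, the v3 `Body` is, as far as I know, that response stream with helpers such as `transformToByteArray()` mixed in). A keep-alive agent limited to one socket makes the exhaustion visible immediately: while the first response body is unread, the second request cannot get a socket, and draining the first body releases it. `get` and `demo` are hypothetical names.

```javascript
const http = require("node:http");

// Resolve with the response as soon as headers arrive, without reading the body.
function get(agent, port) {
  return new Promise((resolve, reject) => {
    http.get({ host: "127.0.0.1", port, path: "/", agent }, resolve).on("error", reject);
  });
}

async function demo() {
  // Local stand-in for S3: every response carries a body.
  const server = http.createServer((req, res) => res.end("object bytes"));
  await new Promise((resolve) => server.listen(0, "127.0.0.1", resolve));
  const { port } = server.address();
  // Keep-alive pool with a single socket so exhaustion shows up at once.
  const agent = new http.Agent({ keepAlive: true, maxSockets: 1 });
  try {
    const first = await get(agent, port); // body left unread
    const second = get(agent, port); // queued: the only socket is still busy
    const stalled = await Promise.race([
      second.then(() => false),
      new Promise((resolve) => setTimeout(() => resolve(true), 300)),
    ]);
    first.resume(); // drain the body: 'end' fires and the socket returns to the pool
    const res2 = await second;
    await new Promise((resolve) => res2.on("end", resolve).resume()); // drain it too
    return { stalled, secondStatus: res2.statusCode };
  } finally {
    agent.destroy();
    server.close();
  }
}
```

The SDK's pool is larger than one socket, but every unread body pins one socket, so under enough traffic the pool runs out in the same way and new calls queue indefinitely.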
Hi @RanVaknin, yes, I publish and retrieve different objects to/from S3. When I use the getObject operation I always consume the body like this:
const s3ObjectBody = await s3Object.Body.transformToByteArray();
Am I missing something?
Thanks.
EDIT: There was actually one place in the code where I was not consuming the Body stream. I fixed that and will let you know if the problem remains, but from my tests it seems to fix the issue, even after removing the "requestTimeout" I had added as a workaround.
Thank you again.
Kind regards.
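Because a single code path that skips the body is enough to leak a socket, one defensive pattern is to wrap every use of a response body in a helper that drains whatever was not read. This is a minimal sketch with a hypothetical `useBody` helper and a `Readable` stand-in instead of a real S3 response; with the v3 S3 client the `body` argument would be `s3Object.Body`, assuming a Node.js runtime where it is a `Readable`.

```javascript
const { Readable } = require("node:stream");
const { once } = require("node:events");

// Hypothetical helper: run `fn(body)` and make sure the stream is drained
// afterwards, even if `fn` returns early or throws.
async function useBody(body, fn) {
  try {
    return await fn(body);
  } finally {
    // If fn did not read the stream to completion, drain the rest so the
    // stream reaches 'end' (for an HTTP response, this releases the socket).
    if (!body.readableEnded && !body.destroyed) body.resume();
  }
}

async function demo() {
  const body = Readable.from(["chunk-1", "chunk-2"]);
  const ended = once(body, "end"); // subscribe before draining starts
  // A handler that bails out early without touching the body.
  const result = await useBody(body, async () => "skipped");
  await ended;
  return { result, ended: body.readableEnded };
}
```

For large objects you do not need, calling `body.destroy()` instead of `body.resume()` may be preferable: it closes the socket rather than downloading the rest of the object, at the cost of losing that keep-alive connection.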
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.
Describe the bug
Hello, I have a problem with the SNS client that I can't solve. I have a server that receives a large number of messages from an SQS queue (using the SQS client), performs some internal operations, and then sends a notification with a JSON message body to an SNS topic, using the SDK method "sns.send" with an instance of the PublishCommand class as its argument.
After the server has been running for a few hours, depending on the amount of data flowing through the SQS consumer, the "sns.send" method begins to hang indefinitely and never responds, and the notification is not published. I implemented a 180-second timeout to stop the current execution and retry publishing to the SNS topic; sometimes it works on the 2nd retry, sometimes on the 3rd, and so on.
The problem is that as more messages keep coming through the SQS queue, more and more of them run into the same problem, until my server is completely blocked and needs to be restarted. After the restart, the messages are successfully processed and the notifications are correctly published to the topic.
I have this problem only with aws-sdk v3; running aws-sdk v2 I never had this problem, and the operations and logic of my server have remained the same. I tried different versions of @aws-sdk/client-sns, including the latest one, and the problem always occurs.
SDK version number
@aws-sdk/client-sns, @aws-sdk/sqs-consumer
Which JavaScript Runtime is this issue in?
Node.js
Details of the browser/Node.js/ReactNative version
Node.js 18
Reproduction Steps
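import { SNS, PublishCommand } from "@aws-sdk/client-sns";

const sns = new SNS({ apiVersion: "2010-03-31", endpoint: options.endpointUrl });
const publishCommand = new PublishCommand({ ...MessageData, TopicArn: topic });
await sns.send(publishCommand); // this call never settles once the problem starts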
Observed Behavior
The command "await sns.send(publishCommand)" hangs indefinitely.
Expected Behavior
The "sns.send" command should respond immediately, or at least within a reasonable time.
Possible Solution
No response
Additional Information/Context
No response