Closed nicolas-laduguie closed 6 months ago
I don't think your expectation is achievable. There comes a point in shutdown where it is not possible for native code to call back into the JVM and we do not know of anything more accurate (timepoint-wise) than the JVM shutdown hook. If you have a different understanding, then there might be something we could do.
One lesser possibility that comes to mind is potentially a registry of incomplete CRT futures that could at least be failed-en-masse by the CRT's JVM shutdown hook, which would at least unblock waiters.
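To make the registry idea concrete, here is a minimal sketch assuming the futures are plain `CompletableFuture`s. `PendingFutureRegistry` and its methods are hypothetical names invented for illustration, not part of aws-crt-java, and nothing here has been checked against how the CRT actually tracks its futures.

```java
import java.util.Set;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry of incomplete futures; NOT part of aws-crt-java. */
final class PendingFutureRegistry {
    private final Set<CompletableFuture<?>> pending = ConcurrentHashMap.newKeySet();

    /** Tracks a future until it completes, normally or exceptionally. */
    <T> CompletableFuture<T> track(CompletableFuture<T> future) {
        pending.add(future);
        // Drop the entry as soon as the future completes, however it completes.
        future.whenComplete((result, error) -> pending.remove(future));
        return future;
    }

    /** Fails every future that is still incomplete and returns how many were failed. */
    int failAll(Throwable cause) {
        int failed = 0;
        // The key set iterator is weakly consistent, so removals triggered by
        // completeExceptionally() during iteration are safe.
        for (CompletableFuture<?> future : pending) {
            if (future.completeExceptionally(cause)) {
                failed++;
            }
        }
        return failed;
    }

    /** Registers a JVM shutdown hook that fails whatever is still pending. */
    void installShutdownHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(
                () -> failAll(new CancellationException("native runtime is shutting down")),
                "pending-future-registry-shutdown"));
    }
}
```

This only unblocks waiters; it does not let an in-flight upload finish, which is why it is the lesser option.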
IMHO, a JVM shutdown hook fires too early in an application's shutdown to be the trigger, because modern applications can still be doing work while they shut down. Taking a concrete example:
It can even be the case that a request is received (upload not initiated yet) -> shutdown received -> upload initiated -> CRT resources no longer available -> failure
A solution would be to give developers the choice either to use the CRT-defined shutdown hook or to disable the hook via a property, and to expose a stop method that the developer would then need to call during the shutdown. To me, that would be the easiest way.
That might be a reasonable compromise.
One lesser possibility that comes to mind is potentially a registry of incomplete CRT futures that could at least be failed-en-masse by the CRT's JVM shutdown hook, which would at least unblock waiters.
Isn't it possible for them to be failed explicitly by closing the HTTP client?
Revisiting this, we've thrown around a couple of ideas that might help here; I may take a stab at them shortly:
Thank you for the feedback. Solution 2 is enough for us because, in general, with Java and the Spring Boot ecosystem it is easy to control the whole shutdown flow, and it would be easy to call the CRT shutdown manually at the right time. Would you have a timeline for this fix?
I don't have a timeline, but I can probably squeeze it in over the next week. A cursory Google search just showed that proposal (1) is a bad idea: there is no real FIFO order between shutdown hook registration and execution, and the hooks run concurrently too.
I'm wondering if ref-counting may make sense here. Every call to disable_crt_shutdown() would require a corresponding call to shutdown_crt(), and the call that brings the ref count to zero would actually shut down the CRT. That is almost always the first one, but this also supports complex cases where multiple dependencies might be trying to control the shutdown time of the CRT. In that case, the last user shutdown hook would "win" and the CRT would shut down after its invocation, which would be the desired behavior.
Here's a quick sketch (i.e., no testing yet):
https://github.com/awslabs/aws-crt-java/pull/672
The basic contract is that anyone who wants to delay/control the CRT shutdown calls enableManualShutdown, and when it's safe (from their perspective) to shut down the CRT, they call manualShutdown. When the last matching call to manualShutdown is invoked, the CRT actually gets shut down.
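As a rough illustration of that contract, the following self-contained sketch models the ref-counting in plain Java. `ManualShutdownGate` and `onJvmShutdownHook` are hypothetical names, and `enableManualShutdown`/`manualShutdown` are borrowed from the description above only; the actual signatures in the linked PR may well differ.

```java
/** Hypothetical model of the ref-counted contract; NOT the aws-crt-java API. */
final class ManualShutdownGate {
    private final Runnable shutdownAction;
    private int holders;        // callers that asked to control shutdown timing
    private boolean shutDown;   // guards against running the action twice

    ManualShutdownGate(Runnable shutdownAction) {
        this.shutdownAction = shutdownAction;
    }

    /** Declares that the caller wants to decide when shutdown may happen. */
    synchronized void enableManualShutdown() {
        holders++;
    }

    /** Releases one hold; returns true if this call performed the shutdown. */
    synchronized boolean manualShutdown() {
        if (holders == 0) {
            throw new IllegalStateException("manualShutdown without matching enableManualShutdown");
        }
        holders--;
        return holders == 0 && runOnce();
    }

    /** What the library's own JVM shutdown hook would call: a no-op while holds remain. */
    synchronized boolean onJvmShutdownHook() {
        return holders == 0 && runOnce();
    }

    private boolean runOnce() {
        if (shutDown) {
            return false;
        }
        shutDown = true;
        shutdownAction.run();
        return true;
    }
}
```

In a Spring Boot application, the matching `manualShutdown` call would presumably go in whatever lifecycle callback runs after in-flight uploads have drained.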
@bretambrose Do you have an update on this issue? We are in a very similar situation as @nicolas-laduguie
I might be able to squeeze this into next oncall (next week).
Would the proposed solution PR work for you?
Yes
This was addressed a while back by https://github.com/awslabs/aws-crt-java/pull/672 and released in v0.29.7
Describe the bug
We are using the AWS Java CRT with AWS SDK v2 and the transfer manager for file uploads. When our Spring Boot application shuts down gracefully, a multipart upload started through TransferManager (developer preview) during the graceful shutdown is initiated but gets stuck.
Expected Behavior
A multipart upload through TransferManager should finish successfully even though it's started during a graceful shutdown.
Current Behavior
The multipart upload starts, as we can see from logs like:
But then nothing happens, so it seems the multipart upload is stuck.
On the CRT side, we can see error logs like:
These show that the CRT has detected the start of JVM destruction but is not handling the graceful shutdown.
Reproduction Steps
Spring Boot 2.5.12
AWS SDK 2.18.3
s3-transfer-manager 2.18.3-PREVIEW
Possible Solution
No response
Additional Information/Context
No response
AWS Java SDK version used
2.18.3
JDK version used
17
Operating System and version
Docker Linux Ubuntu Focal 20