pabloiarriola opened this issue 1 year ago (status: open)
Hi @pabloiarriola, can you confirm which version of SDK5 you are using? Depending on your host operating system's package manager, you can use:
Yum:
    yum search -v cloudhsm
Apt:
    apt search cloudhsm
This will give you the major and minor version.
We are using it as a layer for our Lambda. The version we package is cloudhsm-jce-5.7.0.jar, together with log4j-api-2.17.1.jar and log4j-core-2.17.1.jar.
Thanks @pabloiarriola. Can you upgrade to the latest version of the JCE, 5.8.0? It includes updates that address the warm-start issue you are experiencing.
Hi @rday, we updated to 5.9 and we are still getting: "message": "The underlying Provider connection was lost: Communication with the device was lost during the execution of the function.", "name": "com.amazonaws.cloudhsm.jce.jni.exception.ProviderException",
@pabloiarriola Sorry to hear that. In this case we would need to collect more information to investigate. We recommend working with your Account Manager to open a support case; they can collect the necessary information, and Support can investigate your situation.
@rday Thank you. Just to verify: this issue was addressed in version 5.8.0, correct?
We are getting the same error, and we are using 5.8.0.
Hi @guillomep, you can try upgrading to the latest release at this time, 5.10, or you can reach out to your TAM to collect more information about your specific environment. We would need to see logs around operations, keep-alives, and the times when connections were dropped.
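Inside a Lambda layer there is no yum or apt package database to query, so one hedged alternative is to ask the JCA at runtime which providers are registered and which version each one reports. The sketch below uses only standard java.security calls and requires Java 9 or later for getVersionStr(). It assumes the CloudHSM provider has already been registered with Security.addProvider, and it assumes (not confirmed in this thread) that the version the provider reports tracks the JAR version, so logging the JAR file name (for example cloudhsm-jce-5.7.0.jar) as well is still worthwhile. The class name ProviderVersions is hypothetical.

```java
import java.security.Provider;
import java.security.Security;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists every JCA provider registered in this JVM.
class ProviderVersions {
    // Returns one "name version" entry per registered provider, in priority order.
    static List<String> describe() {
        List<String> entries = new ArrayList<>();
        for (Provider provider : Security.getProviders()) {
            // getVersionStr() is available from Java 9 onwards.
            entries.add(provider.getName() + " " + provider.getVersionStr());
        }
        return entries;
    }
}
```

Calling ProviderVersions.describe().forEach(System.out::println) once during initialization writes the list to CloudWatch Logs, where it can be quoted when reporting the version in use.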
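Lining up the application's own calls with the provider's keep-alive and disconnect lines is easier if each HSM operation is logged with a timestamp and a duration. A minimal sketch, assuming the operation can be wrapped in a java.util.concurrent.Callable; the class and method names are hypothetical, and nothing here is part of the CloudHSM JCE API. Instant.toString() prints ISO-8601 UTC timestamps, the same style the provider uses in its log lines.

```java
import java.time.Instant;
import java.util.concurrent.Callable;

// Hypothetical helper: logs start time, duration, and outcome of each operation.
class TimedOperation {
    static <T> T run(String name, Callable<T> op) throws Exception {
        Instant start = Instant.now();        // wall-clock start, ISO-8601 UTC
        long startNanos = System.nanoTime();  // monotonic clock for the duration
        try {
            T result = op.call();
            log(name, start, startNanos, "ok");
            return result;
        } catch (Exception e) {
            // Record the failure, then rethrow it unchanged for the caller.
            log(name, start, startNanos, "failed: " + e);
            throw e;
        }
    }

    private static void log(String name, Instant start, long startNanos, String outcome) {
        long millis = (System.nanoTime() - startNanos) / 1_000_000;
        // Lambda forwards stderr to CloudWatch Logs.
        System.err.printf("%s op=%s durationMs=%d %s%n", start, name, millis, outcome);
    }
}
```

Wrapping, say, a sign or login call as TimedOperation.run("sign", () -> ...) yields one line per call that can be compared against the keep-alive timestamps.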
Hi all,
We are also using the CloudHSM provider (client 5.11.0) in a Java-based AWS Lambda. For our initial tests we configured implicit login (using environment variables).
We can interact with the HSM successfully as long as we invoke the Lambda continuously. However, when we stop invoking it for about 30 seconds, we get log messages like these:
    2024-01-18T16:19:53.186Z WARN [8] ThreadId(2) [cloudhsm_provider_common::keep_alive] CC000: Maximum keep-alive attempts have been reached for 10.4.1.27. Stopping keep-alive task.
    2024-01-18T16:19:53.186Z INFO [8] ThreadId(2) [cloudhsm_provider_common::dispatcher] Exiting all active dispatcher operations
    2024-01-18T16:19:53.186Z INFO [8] ThreadId(2) [cloudhsm_provider_common::dispatcher] Exiting all active dispatcher operations
    2024-01-18T16:19:53.187Z ERROR [8] ThreadId(1) [cloudhsm_provider::hsm1::hsm_connection::error] Disconnected with server. Message: Tls disconnected. Reason: Send Failed. Dispatcher is now disconnected.
    2024-01-18T16:19:53.187Z ERROR [8] ThreadId(1) [cloudhsm_provider_common::keep_alive] Keep-alive failed for 10.4.1.27. Internal Error: Internal error occurred. Error: HSM is disconnected
    2024-01-18T16:19:53.346Z WARN [8] ThreadId(3) [cloudhsm_utils::retry] Receive error: Connection retry attempts on HSM failed. For Operation get_hsm_connection. Going to retry. Attempts 0/3
    2024-01-18T16:19:53.865Z WARN [8] ThreadId(3) [cloudhsm_utils::retry] Receive error: Connection retry attempts on HSM failed. For Operation get_hsm_connection. Going to retry. Attempts 1/3
The client retries the connection a few times, then re-establishes it and works properly. Unfortunately, when this happens, processing takes roughly ten times longer than a normal execution (about 3500 ms instead of 300 ms). The same code works normally in a Spring Boot application deployed on an EC2 instance.
Is there any way to get the Lambda to work properly on every invocation?
Thank you!
Hey @Sabo-kun, did you find a solution?
@rday We now have a problem where the connection does not seem to be started at all. We are using a layer, and it seems like the connection never starts: we don't get any error messages or anything, and the Lambda just times out. We are using version 5.9.
We still have no solution on 5.10.0; the problem persists.
We also notice that we sometimes get the following message:
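When an invocation simply runs into the Lambda timeout, nothing useful gets logged. One hedged way to turn a silent hang into a logged error is to bound the connection or login step with its own, shorter timeout. The sketch below uses only java.util.concurrent; the class name and the idea of wrapping the provider's setup call this way are assumptions. Because the provider does much of its work in native code, cancelling the Java thread may not stop a stuck native call; the sketch only lets the handler report the hang before Lambda ends the invocation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: waits at most timeoutMs for a setup step such as login.
class BoundedCall {
    static <T> T callWithTimeout(Callable<T> op, long timeoutMs) throws Exception {
        // Daemon thread, so a stuck call cannot keep the JVM from finishing.
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "bounded-call");
            thread.setDaemon(true);
            return thread;
        });
        try {
            Future<T> future = executor.submit(op);
            try {
                return future.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupts the worker; native code may ignore this
                throw e;             // caller can log this before Lambda's own timeout
            } catch (ExecutionException e) {
                // Surface the operation's own exception rather than the wrapper.
                Throwable cause = e.getCause();
                if (cause instanceof Exception) {
                    throw (Exception) cause;
                }
                throw e;
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The timeout should be set comfortably below the function's configured Lambda timeout so the error has time to be logged.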
Unexpected error with the Provider: E2e failed to process the HSM response. Failed to decrypt using e2e.
@pabloiarriola and @Sabo-kun, it sounds like the workflows are somewhat sporadic. Lambda freezes your execution environment after processing has stopped. Depending on how long it takes for the next invocation to arrive, Lambda will either "warm start" or "cold start" your code. How long an environment stays available for a warm start before a cold start happens instead is not defined.
For CloudHSM, this means that our dataplane cannot communicate with your client once the execution environment has been frozen. If your Lambda is warm started, it is possible that our dataplane has timed out while the client still thinks the connection is alive. This is a probable cause of the tenfold processing time. During a cold start, everything is built from scratch: the client establishes all of its connections, and things work much faster.
We are aware of this warm-start delay, but we are still working on the right way to address it. Any data you can collect would be very helpful. We can also work with you on your specific situation, but that would have to be done through Customer Support.
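Because a static field survives between warm invocations, the handler can at least detect that the environment sat idle for a while before it trusts an existing connection. A minimal sketch, assuming a hypothetical threshold: the roughly 30 seconds reported earlier in the thread is only one user's observation, and the real timeouts are not defined. What to do when the tracker reports a stale state (for example logging out and logging in again) is left to the caller, since this thread does not confirm a supported reconnect API. The class name IdleTracker is hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Hypothetical helper: remembers when the HSM connection was last used.
// Keep one instance in a static field so it persists across warm starts.
class IdleTracker {
    private final Supplier<Instant> clock;
    private final Duration maxIdle;
    private Instant lastUse; // null until the first successful use

    IdleTracker(Duration maxIdle) {
        this(Instant::now, maxIdle);
    }

    // The clock is injectable so the idle logic can be tested deterministically.
    IdleTracker(Supplier<Instant> clock, Duration maxIdle) {
        this.clock = clock;
        this.maxIdle = maxIdle;
    }

    // True before the first use, or when more than maxIdle has passed since the last use.
    synchronized boolean isStale() {
        return lastUse == null
                || Duration.between(lastUse, clock.get()).compareTo(maxIdle) > 0;
    }

    // Call after each successful HSM operation.
    synchronized void markUsed() {
        lastUse = clock.get();
    }
}
```

In a handler, one might check isStale() before the first HSM call of an invocation, run whatever reset step the application uses when it returns true, and call markUsed() after the call succeeds.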
I'll update this issue as we make progress. Thanks for continuing to report!
We are using the latest CloudHSM SDK 5 for Java. We create the connection from AWS Lambda functions, and sign-in and sign-out use the same code as the code examples in this repo. We have a problem in a warm environment: when we try to sign in, we get the following error:
"message": "The underlying Provider connection was lost: Underlying connection to provider was lost", "name": "com.amazonaws.cloudhsm.jce.jni.exception.ProviderException",
In a cold environment we don't have this problem, and sign-in and sign-out both work, but when the environment is reused we get this exception. We are using loginWithExplicitCredentials.
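If a reused environment fails on sign-in with the connection-lost error, one option to experiment with is a single retry after a reset step. A minimal sketch, assuming the reset (for example a logout before logging in again) is something the caller supplies; this thread does not confirm that a second attempt succeeds, and the class name RetryOnce is hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical helper: retries an operation exactly once after a reset step.
class RetryOnce {
    static <T> T call(Callable<T> op, Predicate<Exception> isConnectionLost, Runnable reset)
            throws Exception {
        try {
            return op.call();
        } catch (Exception e) {
            // Only retry for errors the caller classifies as a lost connection.
            if (!isConnectionLost.test(e)) {
                throw e;
            }
            reset.run();      // caller-supplied cleanup, e.g. logout
            return op.call(); // second and final attempt; any failure propagates
        }
    }
}
```

For example, the predicate could be e -> e.getClass().getName().equals("com.amazonaws.cloudhsm.jce.jni.exception.ProviderException"), the class name shown in the error output above. Matching on the name string avoids a compile-time dependency, but it assumes the exception is thrown directly rather than wrapped in another exception.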