kubalunacek closed this issue 10 months ago
Does this happen during a cold start, after some number of invocations? One way to check whether it happens during cold starts is to see if it is the first invoke in the CloudWatch Log Stream.
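For what it's worth, cold starts can also be flagged from inside the function with a static field, since static state survives warm invokes but is reinitialized in a fresh execution environment. A minimal sketch; the handler shape and log text are illustrative, not from this issue:

```csharp
using System;

public class Function
{
    // Static state persists across warm invocations of the same execution
    // environment and is reinitialized on a cold start.
    private static bool _coldStart = true;

    public string Handler(string input)
    {
        if (_coldStart)
        {
            _coldStart = false;
            Console.WriteLine("cold_start=true");
        }
        return input;
    }
}
```

Searching the log group for `cold_start=true` then correlates crashes with cold starts without inspecting log streams manually.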
Hey, it is definitely not a cold start; see the images. What is interesting is that we have two Lambdas with the same code base: the first one has 256MB of memory, the second one 512MB. The first always crashes with Size: 256 MB Max Memory Used: 256 MB, the second one with Size: 512 MB Max Memory Used: 368 MB.
Lambda log:
I ran into the same issue as well. Also ARM64, but using a 128MB configuration. It's pretty rare; I've been benchmarking various configurations and deployment options and have only seen it once.
One thing I noticed is that all invocations were short until the one that failed. And it's about the same duration as in the original report.
Hi @ashishdhingra, what do you need for reproduction? We have our production code deployed on AWS Lambda, and I am unable to provide the production code. But we get this error one to four times a day. And from our observations, it's not so much the code as it is the number of calls at the moment, because the other 10+ Lambdas with the same code base and less frequent calls don't suffer from anything like this.
We are experiencing the exact same issue. Our try/catch in the handler doesn't catch anything, and it crashes with the same Runtime error.
It happens randomly, on average once a day I would say.
.NET 6, ARM64 and 1024MB of configured memory.
And we observe the same thing as @kubalunacek 👇🏻
But we get this error one to four times a day. And from our observations, it's not so much the code as it is the number of calls at the moment, because the other 10+ Lambdas with the same code base and less frequent calls don't suffer from anything like this.
Hey, we are having exactly the same issue. A few Lambda calls per day fail with the error above. It is always the endpoints with the higher invocation count. We tried to debug and analyze this, but without any success.
We have the exact same issue on arm/.net6 lambdas
We switched the same solution (code base, .NET 6) to the x86_64 architecture and all these errors disappeared. So the problem is in the .NET 6 ARM64 Lambda runtime. Do you have any progress, @normj?
This seems to be definitely related to https://github.com/dotnet/runtime/issues/72365.
@bjorg @kubalunacek @LarsAndreasEk or others, any chance you can produce & share a dump?
Actually, this could be caused by ValueTask misuse in user code; see https://github.com/dotnet/runtime/issues/70486#issuecomment-1151175694.
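For context, the ValueTask misuse referred to there is consuming the same ValueTask more than once. A pooled ValueTask's backing IValueTaskSource may be recycled after the first consumption, so a second await can observe another operation's state. A hedged illustration; the ReadAsync stub below is hypothetical and only stands in for a pooled async API such as socket reads:

```csharp
using System.Threading.Tasks;

class ValueTaskMisuse
{
    // Hypothetical async API; real offenders return pooled ValueTasks
    // backed by a reusable IValueTaskSource (e.g. Socket.ReceiveAsync).
    static ValueTask<int> ReadAsync() => new ValueTask<int>(42);

    static async Task Correct()
    {
        // A ValueTask must be consumed exactly once: await it once,
        // or convert it once with .AsTask(), never both.
        int n = await ReadAsync();
    }

    static async Task Broken()
    {
        ValueTask<int> vt = ReadAsync();
        int first = await vt;
        // BUG: consuming the same ValueTask a second time. With this
        // result-backed stub it happens to work, which is exactly why the
        // bug is easy to miss; with a pooled IValueTaskSource the second
        // await can read a recycled token and corrupt an unrelated
        // operation, producing hard crashes like the one in this thread.
        int second = await vt;
    }
}
```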
hi @antonfirsov, I'm working on the same team as @LarsAndreasEk. I do not think we captured any dump related to this issue, but it did go away for us when we changed over to the AWS-supported dotnet6 runtime and x86_64. Our code did not use any ValueTasks directly, as far as I could find, but under the hood they are of course in use.
Reporting the same issue, running an AWS Lambda function using ARM64 and .NET6. The crash has only occurred a single time, and appears to be (though it's difficult to say for sure) in the code that requests information from DynamoDb.
/// <summary>
/// Get the token from the cache. If it's expired, generate a new one first
/// </summary>
/// <param name="env">The targeted environment, e.g. prod or preprod</param>
/// <returns></returns>
public async Task<DbCacheItem> GetTokenAsync(string env)
{
    DbCacheItem dbCacheItem;
    DateTime expiryDateTime;
    try
    {
        dbCacheItem = await _dynamoDbContext.LoadAsync<DbCacheItem>(env, _dBOperationConfig);
        if (dbCacheItem == null)
        {
            expiryDateTime = DateTime.UtcNow.AddMinutes(-5);
        }
        else
        {
            expiryDateTime = DateTime.Parse(dbCacheItem.DateTimeOfExpiry);
        }
        if (expiryDateTime < DateTime.UtcNow)
        {
            Logger.LogInformation($"Existing token is null or has expired. Generating a new one");
            var sapCloudService = new SapCloudPlatformIntegrationService().WithEnvironment(env);
            var refreshedToken = await sapCloudService.GenerateTokenAsync();
            dbCacheItem = ConvertToCacheItem(refreshedToken, env);
            await SaveTokenAsync(dbCacheItem);
        }
        return dbCacheItem;
    }
    catch (Exception ex)
    {
        throw;
    }
}
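As an aside, unrelated to the runtime crash: `DateTime.Parse(dbCacheItem.DateTimeOfExpiry)` in the snippet above is culture-sensitive. A hedged sketch of culture-invariant round-tripping; the format string "o" and `DateTimeStyles.RoundtripKind` are standard .NET, everything else here is illustrative:

```csharp
using System;
using System.Globalization;

class ExpiryExample
{
    static void Main()
    {
        // Store expiry timestamps with the round-trip ("o") format so parsing
        // is culture-invariant and preserves the UTC kind.
        DateTime expiry = DateTime.UtcNow.AddMinutes(30);
        string stored = expiry.ToString("o", CultureInfo.InvariantCulture);

        DateTime parsed = DateTime.Parse(
            stored,
            CultureInfo.InvariantCulture,
            DateTimeStyles.RoundtripKind);

        Console.WriteLine(parsed == expiry); // True: round-trips exactly
        Console.WriteLine(parsed.Kind);      // Utc
    }
}
```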
The exception appears to be unhandled:
2023-02-14T05:11:38.681Z c33d7e47-4d0b-40f2-9f28-804408778387 info
{
"cold_start": false,
"xray_trace_id": "1-63eb180a-79315a2f75da76e74d6273e2",
"client": "S2CProcessMessage",
"request_id": "No Request Id",
"function_name": "Service-SCPI-Authenticator",
"function_version": "$LATEST",
"function_memory_size": 2048,
"function_arn": "arn:aws:lambda:ap-southeast-2:xxxxxxx:function:Service-SCPI-Authenticator",
"function_request_id": "c33d7e47-4d0b-40f2-9f28-804408778387",
"timestamp": "2023-02-14T05:11:38.6812320Z",
"level": "Information",
"service": "service_undefined",
"name": "AWS.Lambda.Powertools.Logging.Logger",
"message": "Attempting to fetch access token"
}
2023-02-14T16:11:38.694+11:00 Unhandled exception. System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. (Parameter 'state')
2023-02-14T16:11:38.694+11:00 at System.Threading.ThreadPool.<>c.<.cctor>b__86_0(Object state)
2023-02-14T16:11:38.694+11:00 at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.InvokeContinuation(Action`1 continuation, Object state, Boolean forceAsync, Boolean requiresExecutionContextFlow)
2023-02-14T16:11:38.694+11:00 at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.OnCompleted(SocketAsyncEventArgs _)
2023-02-14T16:11:38.694+11:00 at System.Net.Sockets.SocketAsyncEventArgs.OnCompletedInternal()
2023-02-14T16:11:38.694+11:00 at System.Net.Sockets.SocketAsyncEventArgs.FinishOperationAsyncSuccess(Int32 bytesTransferred, SocketFlags flags)
2023-02-14T16:11:38.694+11:00 at System.Net.Sockets.SocketAsyncEventArgs.CompletionCallback(Int32 bytesTransferred, SocketFlags flags, SocketError socketError)
2023-02-14T16:11:38.694+11:00 at System.Net.Sockets.SocketAsyncEventArgs.TransferCompletionCallbackCore(Int32 bytesTransferred, Byte[] socketAddress, Int32 socketAddressSize, SocketFlags receivedFlags, SocketError socketError)
2023-02-14T16:11:38.694+11:00 at System.Net.Sockets.SocketAsyncContext.OperationQueue`1.ProcessAsyncOperation(TOperation op)
2023-02-14T16:11:38.694+11:00 at System.Net.Sockets.SocketAsyncContext.ReadOperation.System.Threading.IThreadPoolWorkItem.Execute()
2023-02-14T16:11:38.694+11:00 at System.Net.Sockets.SocketAsyncContext.HandleEvents(SocketEvents events)
2023-02-14T16:11:38.694+11:00 at System.Net.Sockets.SocketAsyncEngine.System.Threading.IThreadPoolWorkItem.Execute()
2023-02-14T16:11:38.694+11:00 at System.Threading.ThreadPoolWorkQueue.Dispatch()
2023-02-14T16:11:38.694+11:00 at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
2023-02-14T16:11:38.694+11:00 at System.Threading.Thread.StartCallback()
2023-02-14T16:11:38.875+11:00
RequestId: c33d7e47-4d0b-40f2-9f28-804408778387 Error: Runtime exited with error: signal: aborted
Runtime.ExitError
We have seen this error only once, but it is concerning. Should we look to change back to .NET 6 on x86 until this is understood?
Just an FYI we have had more occurrences of this issue. Very infrequent, but still concerning.
Note that a bug has been identified in .NET that may potentially be causing this. It will be fixed in the upcoming releases 6.0.18 and 7.0.7 (in June, if I'm not mistaken):
https://github.com/dotnet/runtime/pull/84432
https://github.com/dotnet/runtime/pull/84641
@kknd4eva and others, please stay tuned for the update and let us know if it helped!
This is still occurring. Should we have expected this to be resolved by now? (Once again, only impacting .NET 6 + ARM64.)
CloudWatch Logs Insights
region: ap-southeast-2
log-group-names: /aws/lambda/SOE-Order-File-Service
start-time: -43200s
end-time: 0s
query-string:
fields @timestamp, @logStream, @log, @message
| filter @message like 'Specified argument was out of the range of valid values. (Parameter'

| @timestamp | @logStream | @log | @message |
|---|---|---|---|
| 2023-07-30 06:49:39.317 | 2023/07/30/[$LATEST]2f985c3216d7442981641381a86663f8 | 298552055425:/aws/lambda/SOE-Order-File-Service | Unhandled exception. System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. (Parameter 'state') |
Are you 100% sure your code is running on the updated (6.0.18+) runtime? What is your exact dotnet runtime version?
Are you 100% sure your code is running on the updated (6.0.18+) runtime? What is your exact dotnet runtime version?
I believe so. The Lambda runtime update is set to 'AUTO' and the function runtime is currently:
INIT_START Runtime Version: dotnet.6v17 Runtime Version ARN: arn:aws:lambda:ap-southeast-2::runtime:4b286ddbfcfeaf773c960ecafbd89d987b45ebf710c123baf0640c1fd8798259
Ah, I see: the version says v17. I would have thought an 'Auto' configuration would have updated automatically. I'll investigate how I can force this update.
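If forcing the update matters, the runtime-management mode can also be set programmatically. A hedged sketch with the AWS SDK for .NET, assuming an AWSSDK.Lambda version that includes the runtime management API (released January 2023); the function name is taken from the query above purely as a placeholder:

```csharp
using System;
using System.Threading.Tasks;
using Amazon.Lambda;
using Amazon.Lambda.Model;

class ForceRuntimeUpdate
{
    static async Task Main()
    {
        var client = new AmazonLambdaClient();

        // Re-apply 'Auto' (or switch to 'FunctionUpdate'); with 'Manual'
        // you would also pin RuntimeVersionArn to a specific runtime build.
        await client.PutRuntimeManagementConfigAsync(new PutRuntimeManagementConfigRequest
        {
            FunctionName = "SOE-Order-File-Service", // placeholder name
            UpdateRuntimeOn = UpdateRuntimeOn.Auto
        });

        // Deploying any change to the function code or configuration then
        // picks up the latest runtime version on the next cold start.
        Console.WriteLine("Runtime management config updated.");
    }
}
```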
@kknd4eva could you please check if the issue has been resolved in the latest Lambda .NET patches? I believe they are on dotnet.6v19 now. I am trying to reproduce the issue locally, but I haven't been able to so far.
This issue has not received a response in 5 days. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled.
Describe the bug
We have an application that runs on AWS Lambda. Until recently we were running it on .NET Core 3.1 and the x86 architecture. We have now ported it to .NET 6 on ARM64, and this has brought unexpected problems, as we are randomly getting errors that we believe point to a problem in the AWS Lambda runtime for .NET 6. On the original application (.NET Core 3.1 + x86) these errors were not present. Does anyone have any experience with the migration to ARM64 and .NET 6, and whether it is more memory-intensive, or could it be a bug in the Lambda runtime? Any ideas on how to solve the problem will be gratefully received. Thanks.
Exception:
Expected Behavior
Lambda works without errors.
Current Behavior
Lambda fails unexpectedly.
Reproduction Steps
We have no trivial reproduction code for that, but we have a global try/catch block in our Lambda handler and it does not catch anything.
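A handler-level try/catch cannot see this failure because the exception is thrown on a thread-pool thread, outside the handler's call stack. A minimal sketch of process-wide hooks that can at least record such crashes before the runtime aborts; the events are standard .NET, the log text is illustrative:

```csharp
using System;
using System.Threading.Tasks;

class CrashLogging
{
    static void Main()
    {
        // Fires for exceptions on any thread, including thread-pool threads
        // that a try/catch inside the Lambda handler can never observe.
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
        {
            Console.Error.WriteLine($"[FATAL] Unhandled: {e.ExceptionObject}");
        };

        // Fires for exceptions from faulted Tasks that were never awaited.
        TaskScheduler.UnobservedTaskException += (sender, e) =>
        {
            Console.Error.WriteLine($"[WARN] Unobserved: {e.Exception}");
            e.SetObserved();
        };

        Console.WriteLine("Handlers registered.");
    }
}
```

Registering these in the function's static constructor preserves the stack trace in CloudWatch even when the process is subsequently killed.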
Possible Solution
No response
Additional Information/Context
No response
AWS .NET SDK and/or Package version used
Amazon.Lambda.APIGatewayEvents 2.4.1
Amazon.Lambda.Core 2.1.0
Amazon.Lambda.Serialization.SystemTextJson 2.3.0
AWSXRayRecorder.Handlers.AwsSdk 2.8.3
Targeted .NET Platform
.NET 6
Operating System and version
AWS Lambda