Closed PaulColeman closed 3 years ago
Would it be possible to narrow down the time window when these happen and provide the account ID used?
Thanks Norm. On one occasion it happened between 2018-03-15 19:56:00 and 2018-03-15 20:13:00 UTC. I think the errors were all clustered within seconds of each other, but I'm not exactly sure where in that range they happened.
688458520130 is the id.
Any update on this? Thanks for investigating.
On Sat, Mar 17, 2018, 4:28 AM Paul Coleman paul.coleman@gmail.com wrote:
Thanks Norm. It happened between 2018-03-15 19:56:00 and 2018-03-15 20:07:00 UTC. I think the errors were all clustered within a few seconds of each other, but I'm not exactly sure where in that range they happened.
The account id is 688458520130
Any updates on this issue? I'm also struggling with exactly the same issue. Invocation error reported, but nothing in the CloudWatch logs.
@PaulColeman and @paul-zah Can you provide minimal code repros for your functions that are dying? If you're able to isolate the code that's causing the issue it would help a lot.
I'm experiencing the same issue. While load testing my function I see quite a high error count, but I cannot find any exceptions in my Lambda logs.
Just a thought - It might be worth trying to move the logging up to the LambdaEntryPoint (i.e. Program.cs) to handle any errors that might be thrown during bootstrapping the WebHost. I do this with serilog in non-lambda services following their guidance here (try / catch logging around program.cs): https://github.com/serilog/serilog-aspnetcore/blob/dev/samples/SimpleWebSample/Program.cs#L13
Noticing the same issue. All invocations from CloudWatch are failing, which causes a spike in invocation errors on the Monitoring tab, but the errors do not appear in the logs.
I am seeing a similar issue where the invocation count increases infrequently and the CloudWatch logs don't contain any errors. Has this been root-caused?
I am facing the same issue. The logs do not show any errors and the function seems to be working as expected, but I can see the CloudWatch alarm for errors being triggered when the Lambda is invoked.
Hi Guys. I am facing the same issue. Any updates on this?
This may be an issue with Lambda itself. I've been noticing this behavior for months now. I don't use dotnet; I use Node and aws-sdk.
You will see errors in the Lambda Monitoring dashboard, but when you click through to the logs for that time range you will see no trace of the error. In my opinion this is one of the internal Lambda "quirks", similar to the idempotency issues in AWS Lambda (you can't guarantee your function will run exactly once; it can run multiple times, seconds or minutes apart, even when no error is detected). As with the idempotency issue, you will need to do some defensive coding in your app to account for internal errors: do proper error handling in your code, and catch/throw errors with proper log tracing.
If you then see "internal errors" that seem to happen outside your error handling, you should be able to discount them as anomalies or false positives, since you are confident in your error handling coverage. Not ideal, but it's one of the quirks of serverless computing: the issue is on someone else's server :)
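As a rough illustration of the "proper error handling with log tracing" advice above, here is a minimal Python sketch (Python only because several people in this thread use it; the same idea applies to Node or .NET). The decorator name `log_unhandled_errors`, the logger name, and the `order_id` business logic are all hypothetical. It only covers exceptions raised inside the handler, so it cannot capture failures that happen before the handler is invoked.

```python
import functools
import json
import logging
import traceback

logger = logging.getLogger("handler")  # hypothetical logger name
logger.setLevel(logging.INFO)


def log_unhandled_errors(handler):
    """Wrap a Lambda-style handler so any unhandled exception is logged, then re-raised."""

    @functools.wraps(handler)
    def wrapper(event, context):
        # aws_request_id is available on the Lambda context object; fall back
        # to "unknown" so the wrapper also works in local tests.
        request_id = getattr(context, "aws_request_id", "unknown")
        try:
            return handler(event, context)
        except Exception as exc:
            # One structured line per failure makes it easy to search for later.
            logger.error(json.dumps({
                "level": "ERROR",
                "request_id": request_id,
                "error_type": type(exc).__name__,
                "error": str(exc),
                "traceback": traceback.format_exc(),
            }))
            raise  # re-raise so the invocation is still counted as an error

    return wrapper


@log_unhandled_errors
def lambda_handler(event, context):
    # Hypothetical business logic, for illustration only.
    if "order_id" not in event:
        raise ValueError("order_id is required")
    return {"status": "ok", "order_id": event["order_id"]}
```

With this in place, every error that originates in your own code leaves a log line carrying the request ID. Error metrics that have no matching line then most likely come from outside the handler (for example a timeout or the process exiting).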
I am seeing the same thing - on node 10 lambdas. The alarms are triggered on errors crossing a threshold, but there are no errors in the logs. I end up wasting quite a lot of time checking false positives, and I can't see a reason why this could be the desired behaviour, so would be great if the lambda team could improve this area.
Happened to me as well... using Python. My Lambda is triggered every second, so up until now I was sure I couldn't find it because I have so many logs and wasn't using the right filter... I never thought that there are simply no logs... However, I don't think it is random. It usually happens when there are problems in the DB...
This happened to me as well. I have java sdk lambdas in 2 separate regions and both of them generated error metrics from 6:45 - 7:10 AM CDT but there are no ERROR logs in cloudwatch.
Other stuff to look at:
This was happening to me as well in python 3.8.
Use the following query in Log Insights: fields @timestamp, @message | filter @message like "Process exited before completing request" | sort @timestamp asc | limit 20
It might be a memory problem causing the error. A timeout can also cause an error in Lambda, and you have to use a different query to find it.
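If you would rather run this from a script, the following is a hedged Python sketch that builds one Logs Insights query covering both cases (the "Process exited before completing request" phrase above and the "Task timed out" phrase that Lambda logs on a timeout) and polls for the results. The helper names `build_insights_query` and `run_insights_query` are made up for illustration. The client is passed in and is assumed to be a boto3 CloudWatch Logs client, whose `start_query` and `get_query_results` calls are used here.

```python
import time

# Phrases Lambda writes to the log when the process dies or the function times out.
FAILURE_PHRASES = (
    "Process exited before completing request",
    "Task timed out",
)


def build_insights_query(phrases=FAILURE_PHRASES, limit=20):
    """Build a Logs Insights query matching any of the given phrases."""
    conditions = " or ".join(f'@message like "{p}"' for p in phrases)
    return (
        "fields @timestamp, @message"
        f" | filter {conditions}"
        " | sort @timestamp asc"
        f" | limit {limit}"
    )


def run_insights_query(logs_client, log_group, start_epoch, end_epoch, query,
                       poll_seconds=1.0, max_polls=60):
    """Start a query and poll until it finishes; returns rows as dicts.

    logs_client is assumed to behave like boto3.client("logs").
    start_epoch and end_epoch are epoch seconds.
    """
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=start_epoch,
        endTime=end_epoch,
        queryString=query,
    )["queryId"]

    response = {"status": "Unknown", "results": []}
    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        if response["status"] in ("Complete", "Failed", "Cancelled", "Timeout"):
            break
        time.sleep(poll_seconds)

    # Each result row is a list of {"field": ..., "value": ...} cells.
    return [
        {cell["field"]: cell["value"] for cell in row}
        for row in response.get("results", [])
    ]
```

Usage would look something like `run_insights_query(boto3.client("logs"), "/aws/lambda/my-function", start, end, build_insights_query())`, where `my-function` is a placeholder for your own function name.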
Hi @PaulColeman,
Good morning.
I was going through the issue backlog and came across this guidance question. Please let me know if this is still an issue or else if this could be closed.
Thanks, Ashish
This issue has not received a response in 2 weeks. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled.
still an issue...
Hi @PaulColeman @aya-givati,
Please have a look at the article "How do I troubleshoot Lambda function failures?" and let me know if it helps.
Thanks, Ashish
Hi @ashishdhingra, thank you for your response. Unfortunately it did NOT help me. My problem is that my "Error" metric alert is firing and I can't find the lines in the log that explain why.
@aya-givati I'm not sure what to recommend here, since the invocation errors occur outside of the .NET SDK. As explained in the documentation link I shared, CloudWatch is the option for any code-related errors. For invocation errors, however, CloudTrail could be the option. I would suggest contacting CloudWatch support for more details on troubleshooting. I will try to see if I can find any guidance, but this doesn't appear to be a .NET SDK issue.
I do see that you are using Python SDK. So this issue appears to be service specific, not a specific SDK issue. Were you able to get guidance from Python SDK team which might be helpful?
Getting the same errors! Also spent a lot of time trying to understand.
Experiencing the same. I added try-catch mechanisms with logging in my services. None of that logging shows up in the CloudWatch logs, even though the CloudWatch error alerts on the Lambda are firing.
I'm having the same problem
Having the same issue on Node 14. Nothing in the logs at all, yet alarms get tripped.
Same issue with Python. All red in metrics graph, standard logging enabled but absolutely nothing in CW logs.
Experiencing the same issue with Python, and also wasting lots of time
Try querying Logs Insights for the phrase "Task timed out":
fields @timestamp, @message | filter @message like "Task timed out" | sort @timestamp asc | limit 20
For me the problem was the policy attached to the Lambda. I had made a custom policy and the log-group ARN was not right. Fixing that fixed the problem (I'm running a Python Lambda as well). To check it in the AWS Console, go to Configuration > Permissions and check whether the role has an appropriate policy.
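For comparison, here is a small Python sketch that generates the kind of logging policy a Lambda execution role normally needs, so you can check your custom policy's log-group ARN against it. It assumes the function writes to the default log group `/aws/lambda/<function-name>`. The helper name `logging_policy` and the region, account ID, and function name used in the example are placeholders.

```python
import json


def logging_policy(region, account_id, function_name):
    """Return an IAM policy document allowing a Lambda to write its own logs.

    Assumes the default log group name /aws/lambda/<function_name>.
    """
    log_group_arn = (
        f"arn:aws:logs:{region}:{account_id}:log-group:/aws/lambda/{function_name}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the function create its log group on first invocation.
                "Effect": "Allow",
                "Action": "logs:CreateLogGroup",
                "Resource": f"arn:aws:logs:{region}:{account_id}:*",
            },
            {
                # The trailing ":*" covers the log streams inside the group.
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": [f"{log_group_arn}:*"],
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own region, account, and function.
    print(json.dumps(logging_policy("us-east-1", "123456789012", "my-function"), indent=2))
```

If the `Resource` in your role's policy names a different region, account, or log group than the one your function actually writes to, the function can still run but nothing will reach CloudWatch Logs.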
thank you! this has saved me a lot of head banging.
saved my life! thanks so much
I have the same issue when I want to trigger my Lambda with a file upload to S3. I don't see any invocation but the error rate increases. I don't see any task timed out logs either, for me there are no logs at all. It happens only in one environment - the exact same function in another environment works fine.
Does anyone have ideas how to debug invocation errors that don't appear in CloudWatch logs?
I am seeing cases where a lambda seemingly randomly will fail to invoke 1 to 7 times, incrementing the CloudWatch lambda error count, but no invocation (START, END, or REPORT) appears in the CloudWatch logs for the lambda. Nothing appears in the deadletter queue either.
I have 40ish similar lambdas and 8 of them had this same behavior at very similar times. These failures happen very infrequently, but when I do see them it is always in a similar pattern: multiple lambdas, cloudwatch error counters > 0, nothing in the cloudwatch logs.
I don't think this is a permissions issue with the lambda's ability to write to the logs as it will invoke correctly and START, END, REPORT etc do appear in the CloudWatch logs.
I assume it must be some issue setting up the environment -- the stuff that happens before invoke. How can I get to the bottom of this?
These Lambdas are all .NET Core 1.0.