aws / aws-sdk-net

The official AWS SDK for .NET. For more information on the AWS SDK for .NET, see our web site:
http://aws.amazon.com/sdkfornet/
Apache License 2.0
2.06k stars 854 forks source link

CloudWatchLogs cannot query realtime latest log #1387

Closed jasonycw closed 3 years ago

jasonycw commented 5 years ago

CloudWatchLogs query cannot get the latest log that are just being created

var query = await cloudWatchLogs.StartQueryAsync(
    new StartQueryRequest
    {
        LogGroupName = _logGroup,
        StartTime = (long) (DateTime.UtcNow.AddMinutes(-5) - new DateTime(1970, 1, 1)).TotalSeconds,
        EndTime = (long) (DateTime.UtcNow.AddMinutes(5) - new DateTime(1970, 1, 1)).TotalSeconds,
        QueryString = "filter Level=\"Error\""
    }, token);
var request = new GetQueryResultsRequest
    {
        QueryId = query.QueryId
    };
GetQueryResultsResponse results;
do
{
    results = await cloudWatchLogs.GetQueryResultsAsync(request, token);
} while (results.Status == QueryStatus.Running ||
         results.Status == QueryStatus.Scheduled);
return results;

Expected Behavior

Completed results suppose to have the latest logs in the pass 5 minutes.

Current Behavior

There is no log being returned after the result status is completed.

Possible Solution

StartQueryRequest should allow null EndTime so it will automatically query from StartTime to now.

Steps to Reproduce (for bugs)

Trigger some logs, then immediately use StartQueryAsync and GetQueryResultsAsync to get the latest log.

Context

I am trying to create a log alert system using SNS and Lambda. CloudWatch filter will trigger an alert that will executed action on the SNS The SNS will trigger the Lambda to query the latest log and send an email notification

However, the latest log cannot be query properly. e.g.

  1. The logs that trigger the alert is created at 17:17:45
  2. Alarm history shows Successfully executed action arn:aws:sns:... at 17:17:55
  3. Lambda is triggered by SNS and StartQueryAsync at 17:17:59
  4. GetQueryResultsAsync get the completed result at 17:18:04 with no log result

*the times are based on the CloudWatch

Your Environment

klaytaybai commented 5 years ago

Hi @jasonycw, I'll respond to this more later, but I want to point out a problem in your code while I have time so you hopefully don't encounter a never-ending loop. Valid values for the query's status are: Scheduled | Running | Complete | Failed | Cancelled. Complete, Failed, and Cancelled are all terminated states. Should your query fail or be cancelled, the loop won't terminate as written. You might also want to only call the GetQueryResultsAsync API occasionally so as to not consume too many resources.

jasonycw commented 5 years ago

Thanks for your tips, I have updated the do while loop. The point is I want instant query on the latest log once the lambda is trigger, that's why I loop GetQueryResultsAsync right after StartQueryAsync. I understand the two methods are not suppose to be called this way, but I don't see how lambda can get query in an "occasional asynchronous" way

I also found that manually adding a 2+ minutes delay to start query after the log is created will yield a more consistence query result but still most of the time, query nothing.

jasonycw commented 5 years ago

Hi @klaytaybai, any update on why this happen?

I tried adding 2 minutes delay before querying the latest log and that could yield around 70% success rate of getting the latest logs. However, if I search the log group directly on CloudWatch web console, I could query the latest log within a few seconds. Not sure why the query using code cannot get those same latest logs.

github-actions[bot] commented 4 years ago

We have noticed this issue has not recieved attention in a year. We will close this issue for now. If you think this is in error, please feel free to comment and reopen the issue.

jasonycw commented 4 years ago

Hi @klaytaybai, any update on why this happen?

ashishdhingra commented 4 years ago

Hi @jasonycw,

I was going through the issue backlog and came across this issue. Using the following code, I'm unable to reproduce the issue. I test the Lambda function via AWS Explorer VS extension and immediately executing the below code retrieved the new logs from CloudWatch:

using System;
using Amazon;
using Amazon.CloudWatchLogs;
using Amazon.CloudWatchLogs.Model;

namespace CloudWatchQueryTest
{
    class Program
    {
        static void Main(string[] args)
        {
            string _logGroup = "/aws/lambda/LambdaTest";
            using (AmazonCloudWatchLogsClient client = new AmazonCloudWatchLogsClient(RegionEndpoint.USWest2))
            {

                var query = client.StartQueryAsync(
                    new StartQueryRequest
                    {
                        LogGroupName = _logGroup,
                        StartTime = 1577850562, //1 Jan 2020
                        EndTime = (int)(DateTime.UtcNow - new DateTime(1970, 1, 1)).TotalSeconds,
                        QueryString = "fields @timestamp, @message | sort @timestamp desc"
                    }
                ).Result;

                var request = new GetQueryResultsRequest
                {
                    QueryId = query.QueryId
                };

                GetQueryResultsResponse results;
                do
                {
                    results = client.GetQueryResultsAsync(request).Result;
                } while (results.Status == QueryStatus.Running ||
                         results.Status == QueryStatus.Scheduled);

                Console.WriteLine("Total Results: " + results.Results.Count);
            }

            Console.ReadLine();
        }
    }
}

It could be a temporary delay until logs are available to all the endpoints.

Thanks, Ashish

jasonycw commented 4 years ago

Hi @ashishdhingra

I had moved on from query log from lambda to use CloudWatch event trigger lambda, so I do not have any code in hand that need to query CloudWatch immediately after the log occur.

But if there is still delay between latest log and query of log, then the bug still exist. Not sure if the latest 3.5.X version of CloudWatchLogs package will improve the situation

ashishdhingra commented 4 years ago

@jasonycw I will have a look at the the .NET SDK CouldWatchLogs API and let you know the findings. Most probably, the SDK is only a wrapper around Service API (just need to confirm), in that case it's not a bug in SDK, but a lag from service API. I will keep you posted.

ashishdhingra commented 4 years ago

@jasonycw From what I see in the code for AmazonCloudWatchLogsClient, all the operations are just service API wrappers. So it appears to be a lag from the service level. Hope this answers your question.

github-actions[bot] commented 4 years ago

This issue has not recieved a response in 2 weeks. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled.