aws / aws-cli

Universal Command Line Interface for Amazon Web Services

aws logs filter-log-events: Error during pagination: The same next token was received twice #3191

Closed dflantz closed 3 years ago

dflantz commented 6 years ago

Hi! Here are the details:

$ aws --version
aws-cli/1.14.50 Python/3.6.4 Darwin/16.7.0 botocore/1.9.3
$ aws logs filter-log-events --log-group-name=<log-group-name>   \
--start-time=1518998400000 --end-time=1519084800000   \
--filter-pattern='<filter-pattern>'  --query "events" > logs.json

Error during pagination: The same next token was received twice: {'nextToken': '<token>'}

I believe this may be related to https://github.com/aws/aws-cli/issues/1485.

joguSD commented 6 years ago

Is this an intermittent issue or does this happen all of the time? If it's broken fairly consistently we may have to remove this paginator.

starkshaw commented 6 years ago

My customer has the same issue. Let me ask him to try to run it multiple times.

dasmowenator commented 6 years ago

It happens to me quite regularly when using the AWS CLI to search for CloudWatch Logs, but it's non-deterministic. When I get this error from my scripts, I just re-run them and it's a 50-50 chance whether it'll happen again or not.

dasmowenator commented 6 years ago

I actually do think it's reproducible, but I'm not 100% sure how. Generally, if I run an "aws logs filter-log-events" request against a very large log group (like an API Gateway execution log group) that takes a long time, and then run any other request against the same log group at the same time, I get the error. Or if I Ctrl+C my original request while it's running and then retry the command, I get the same error.

dasmowenator commented 6 years ago

So I responded... and even included steps to reproduce the issue... why is it still marked as "closing soon if no response"?

tyvich commented 6 years ago

I also recently encountered this issue with a log group containing large items, though not while retrieving a particularly high volume of items (~5000). I was able to remedy it by providing the pagination config parameter PageSize with seemingly any arbitrary value.
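
For example, roughly the following with a boto3 paginator (a sketch, not the exact code I ran; the log group name and filter pattern are placeholders):

import boto3

logs = boto3.client('logs')
paginator = logs.get_paginator('filter_log_events')

# An explicit PageSize in PaginationConfig seemed to avoid the
# duplicate-token error; the exact value did not appear to matter.
pages = paginator.paginate(
    logGroupName='<log-group-name>',
    filterPattern='<filter-pattern>',
    PaginationConfig={'PageSize': 1000},
)

for page in pages:
    for event in page['events']:
        print(event['message'])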

Was not able to reproduce by iterating through filter_log_events without paginator and looking for duplicate tokens.

JustinTArthur commented 6 years ago

Seeing this on boto3/botocore as well on a high volume log retrieval. Has something changed recently on the underlying APIs?

Traceback (most recent call last):
  File "/Users/jarthur/Repos/Lifesize.misc/jarthur/Reports/process_cloudwatch_vpss.py", line 51, in <module>
    for page in response_iterator:
  File "/Users/jarthur/.virtualenvs/misc/lib/python3.5/site-packages/botocore/paginate.py", line 301, in __iter__
    raise PaginationError(message=message)
botocore.exceptions.PaginationError: Error during pagination: The same next token was received twice: {'nextToken': 'Bxkq6kVGFtq2y_Moigeqsd0gU4kG897U2M9-6GKlGbZoYxBDfCbY…

JustinTArthur commented 6 years ago

I just got word from AWS that they are aware of the issue and working on a fix. @joguSD it seems this issue is erroneously labeled as awaiting a response.

dmuzzy commented 6 years ago

Been getting this error for about a week now, and no pagination settings seem to get around it. On Windows 10, I just updated my awscli install from 1.14.66 to 1.15.6, but that didn't help.

@JustinTArthur Any word from AWS on this?

JustinTArthur commented 6 years ago

I queried for workarounds, but haven't heard back. As soon as I hear about the fix or a workaround, I'll follow up here.

dmuzzy commented 6 years ago

Cool, thanks.

margeson commented 6 years ago

Issue is happening in version 1.15.13.

AWS CLI was installed using pip

Seems to choke when you are pulling logs spanning more than a month.

$ aws logs filter-log-events --log-group-name flowlogs --log-stream-names eni-11111-all --filter-pattern '[version="2", account_id, interface_id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action="ACCEPT", log_status="OK"]' --start-time 1522782087000

Error during pagination: The same next token was received twice: {u'nextToken': u'zpUlahi1iz0r54spQh9r4Squ0R9zaukaZh7g5OILz5dLsJokO6t1ZmDdPShgBRpvAD3lTwSePeDHB3cJs6kiIgBY0ueYP6ydqnHIzvRNdN8joRobMpKIqc_sYf425QfL8qNuDlUxFBq_Zmt9eIEmCPzFJMIWtfZYI2F5jEgyx0Pc85ut9luVlCB6-iSKyV9CWOzNeltbrkn-T7hwO-BhKC5jNPLsKgbwPmgE02Qa-ozdgIvx2mo1ccGoXWodcYEP'}

$ aws --version
aws-cli/1.15.13 Python/2.7.12 Linux/4.10.0-38-generic botocore/1.10.13

JustinTArthur commented 6 years ago

While I haven't heard official word back from AWS yet, this started working again for me in the us-west-1 region as of 48 hours ago using botocore stand-alone. For those of you experiencing it, let me know if it's fixed for you, whether you're using aws-cli or a boto lib.

dasmowenator commented 6 years ago

I'm still experiencing this error.

mortalius commented 6 years ago

Still getting this error quite regularly in Lambda when doing get_paginator('filter_log_events'). It happens with the default boto3/botocore versions currently bundled with Lambda, and also with the latest versions (boto3==1.7.26, botocore==1.10.26). Same result.

replogle commented 6 years ago

Also getting this error in the CLI: aws-cli/1.14.7 Python/2.7.5 Linux/3.10.0-327.el7.x86_64 botocore/1.8.11

For me the error is very repeatable when asking for matches in a log group w/o specifying a log stream. The log group I'm querying has many streams; filter-pattern is a simple string ("error") and the time window is only a few seconds.
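
Roughly the kind of command that triggers it for me (log group name and times are placeholders):

$ aws logs filter-log-events --log-group-name <log-group-name> --filter-pattern "error" --start-time <start-ms> --end-time <end-ms>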

kyleknap commented 6 years ago

Just went over the related issue. We are going to investigate this a bit more and see what our options are here, as it appears multiple people are running into it.

dasmowenator commented 6 years ago

Well that's good... but I have a question for the CloudWatch Logs team -- why is sending the same page token twice an error in the first place? When I send the same page token multiple times to DynamoDB or Elastic Transcoder it just returns the same page as it did the first time (assuming the page token was valid and hasn't expired). So why is CloudWatch different?

joguSD commented 6 years ago

The duplicate-token error is raised client side, in botocore's automatic pagination feature. It is raised when we get the same token back twice in a row, and it exists to avoid infinite loops where the service just returns the same next token over and over.
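
The check is roughly equivalent to this (a simplified sketch, not the actual botocore implementation):

from botocore.exceptions import PaginationError

def iterate_pages(fetch_page, first_token=None):
    # fetch_page(token) performs one FilterLogEvents call and returns the response dict.
    previous_token = None
    token = first_token
    while True:
        page = fetch_page(token)
        yield page
        token = page.get('nextToken')
        if token is None:
            return
        if token == previous_token:
            # Same token twice in a row: bail out instead of looping forever.
            raise PaginationError(
                message='The same next token was received twice: %r' % token)
        previous_token = token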

JustinTArthur commented 6 years ago

I've been informed that the CloudWatch Logs API has been fixed.

dasmowenator commented 6 years ago

I thought @joguSD said that this was an issue with botocore, not with the CW Logs API?

dasmowenator commented 6 years ago

Also, I just ran a bunch of FilterLogEvents requests and got this error again, so it hasn't been fixed.

sirosen commented 6 years ago

Per boto/boto3#1584, I found that manual pagination works if you allow the same token to be returned multiple times in a row. In my case, it doesn't appear to return any duplicate records -- those requests, in fact, contain streams with no records matching my filter at all.

I don't know if that's generalizable or not -- you'd need to use boto3, not awscli, to check.


EDIT: for easy reference, I'm adding the working boto3 code for manual pagination below. boto/boto3#1584 has more context and detail.

import boto3

logclient = boto3.client('logs')
kwargs = {}
while True:
    response = logclient.filter_log_events(
        logGroupName=group,                # log group to search
        startTime=monitor_start_window,    # epoch milliseconds
        filterPattern=monitor_filter,
        **kwargs)
    # ... do fancy stuff with response['events'] here ...
    if 'nextToken' not in response:
        break
    # Keep going even if the token is the same as last time.
    kwargs['nextToken'] = response['nextToken']

dhamaniasad commented 5 years ago

For future reference, one way to work around this is to set an arbitrarily high --page-size.
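
For example (log group name and filter pattern are placeholders):

$ aws logs filter-log-events --log-group-name <log-group-name> --filter-pattern '<filter-pattern>' --page-size 1000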

HyperDevil commented 5 years ago

Hi, --page-size does not work in my case; see #3917.

sirosen commented 3 years ago

Was this ever resolved? Was this fixed upstream, in the CloudWatch Logs service? Did awscli (v1 or v2) make any changes that would make the treatment of the pagination markers more lax?

I haven't been hearing about this from our team, but it may be because we switched our log processing scripts to avoid awscli for this purpose.

huksley commented 3 years ago

Happens with the API too. I fairly consistently get the same token back when I call filterLogEvents and no events are returned.

// Runs inside an async function; region, logGroupName, logStreamNames,
// startTime and filterPattern come from the surrounding code.
const AWS = require("aws-sdk");
const logs = new AWS.CloudWatchLogs({ apiVersion: "2014-03-28", region });

let token = null;
let events = [];
do {
  const response = await logs.filterLogEvents({
    logGroupName,
    logStreamNames,
    startTime,
    filterPattern,
    nextToken: token || undefined
  }).promise();

  if (token === response.nextToken) {
    // Same token as last time: stop searching instead of looping forever.
    console.warn("Got the same token, stopping search");
    token = null;
  } else {
    token = response.nextToken;
  }
  events = [...events, ...response.events];
} while (token);