aws / aws-sdk-go

AWS SDK for the Go programming language (In Maintenance Mode, End-of-Life on 07/31/2025). The AWS SDK for Go v2 is available here: https://github.com/aws/aws-sdk-go-v2
http://aws.amazon.com/sdk-for-go/
Apache License 2.0
8.65k stars 2.07k forks source link

ECS WaitUntilServicesStable does not wait until service is stable #457

Closed george-makerbot closed 8 years ago

george-makerbot commented 8 years ago

Here is a complete test program for testing: http://play.golang.org/p/vQG2KkF6_a

The program successfully connects to ECS, registers a task definition, and updates a service. Where it is failing is actually waiting until the service has become stable. WaitUntilServicesStable is returning in under a second.

If you remove line 122 (exitLoop = true), then the polling logic works correctly, but the loop won't exit till the time limit has passed.

jasdel commented 8 years ago

Thanks for the suggestion and code sample @george-makerbot this will help a lot investigating the problem. Initially looking at this issue it looks like the problem might be related to the way the waiter determines the condition is satisfied.

Using the following expression:

services | [@[?length(deployments)!=`1`], @[?desiredCount!=runningCount]][] | length(@) == `0`

It looks that there might be a confusion in the way the SDK is detecting the condition is met.

jasdel commented 8 years ago

Hi @george-makerbot would you mind using this updated sample. It has debug logging turned on. Specifically I'm curious about what the wire HTTP body response of the ECS DescribeInstances you're seeing.

http://play.golang.org/p/JEKuG5lh-r Adds the following to line 96, and set the WaitUntilServicesStable to use the logging service

ecsSvcDebug := ecs.New(sess, &aws.Config{LogLevel: aws.LogLevel(aws.LogDebugWithHTTPBody)})

Specifically I'm curious if you're debug log's DescribeInstances response body contains a service[].events[].message, and what the message(s) are.

george-makerbot commented 8 years ago

Hi @jasdel

Here is the requested debugging information, I took the liberty of running the json output though a prettifier.

EDIT: Link instead of in lined output to make the comment history easier to read. https://gist.githubusercontent.com/george-makerbot/404dea1fc90de6f7346f/raw/bcca9f27e37f9cecd23d4f1cd15e056c770ad586/ecs_watch_debug.json

george-makerbot commented 8 years ago

I tracked it down to a problem in "github.com/jmespath/go-jmespath" but that library is a bit hard for me to traverse.

If we don't ignore the error on the Wait call here: https://github.com/aws/aws-sdk-go/blob/master/private/waiter/waiter.go#L64

We get Invalid type for: <nil>, expected: []jmespath.jpType{"string", "array", "object"

The code successfully pulls out and tests the simpler status checks. But fails on services | [@[?length(deployments)!=1], @[?desiredCount!=runningCount]][] | length(@) ==0``

I am a bit stumped here, using the output we have the the binary version of jmespath (https://github.com/jmespath/go-jmespath/blob/master/cmd/jpgo) I do not hit the error.

jasdel commented 8 years ago

Great find @george-makerbot, I think the binary jpgo vs jmespath.Search differences are attributed to map[string]interface{} vs concrete struct. I'm working on adding a test case to github.com/jmespath/go-jmespath to reproduce this issue. Once we have that we can create a issue with the upstream repo.

dpetersen commented 8 years ago

I've run into this issue as well, so I'll be happy to test any proposed fixes in this library or go-jmespath. Thanks for looking into it!

jasdel commented 8 years ago

Linked this issue to the upstream jmespath/go-jmespath#14 We can work with jmespath to resolve this issue.

ejholmes commented 8 years ago

I just ran into this as well. It appears that even a more trivial Argument for the waiter fails:

services | [@[?runningCount!=`0`]][] | length(@) == `0`

Given the input:

{
  "failures": [],
  "services": [
    {
      "clusterArn": "arn:aws:ecs:us-west-2:012345678910:cluster/telemetry",
      "deploymentConfiguration": {
          "maximumPercent": 200,
          "minimumHealthyPercent": 100
      },
      "deployments": [
        {
          "createdAt": 1432829320.611,
          "desiredCount": 0,
          "id": "ecs-svc/9223370604025455196",
          "pendingCount": 0,
          "runningCount": 0,
          "status": "PRIMARY",
          "taskDefinition": "arn:aws:ecs:us-west-2:012345678910:task-definition/hpcc-t2-medium:1",
          "updatedAt": 1432829320.611
        }
      ],
      "desiredCount": 0,
      "events": [],
      "loadBalancers": [],
      "pendingCount": 0,
      "runningCount": 0,
      "serviceArn": "arn:aws:ecs:us-west-2:012345678910:service/bunker-buster",
      "serviceName": "bunker-buster",
      "status": "ACTIVE",
      "taskDefinition": "arn:aws:ecs:us-west-2:012345678910:task-definition/hpcc-t2-medium:1"
    }
  ]
}

It returns false with go-jmespath, but true at http://jmespath.org. It appears that the value of runningCount in this case is actually the location in memory, since it's an *int64, not int64 so it fails when compared against 0.

It looks like fieldFromStruct doesn't attempt to obtain the actual value from a pointer to a basic type.

ejholmes commented 8 years ago

I opened a PR at https://github.com/jmespath/go-jmespath/pull/15 that adds a failing test for this as well as a potential implementation. Although, I can't help but feel like passing in req.Data is going to be problematic since it's basically relying on the struct fields to match the JSON exactly, which isn't always the case. Maybe it would be better to just pass in the raw response body instead?

jasdel commented 8 years ago

I'm working to verify a fix for the go-jmespath for this expression. It looks like the way typed slices were being handled was preventing the comparison from working.

jasdel commented 8 years ago

Thanks for all the information and assistance in addressing this issue. We identified the problem and got a fix pushed upstream to go-jmespath. If you're using the SDK with Go 1.5 vendoring experiment all you need to do is sync the latest version of the SDK. If you're not using vendoring you can update the SDK and jmespath with:

go get -u github.com/aws/aws-sdk-go/...
go get -u github.com/jmespath/go-jmespath