deis / controller

Deis Workflow Controller (API)
https://deis.com
MIT License

Failed to delete pod during rollback #1002

Closed gabrtv closed 8 years ago

gabrtv commented 8 years ago

While working through a simple buildpack workflow, I ran into an error on rollback:

$ deis releases
=== go Releases
v3  2016-08-22T14:47:47Z    gabrtv added POWERED_BY
v2  2016-08-22T14:46:30Z    gabrtv deployed 3ab4abf
v1  2016-08-22T14:46:09Z    gabrtv created initial release
$ deis rollback v2
Rolling back to v2... Error: Unknown Error (400): {"detail":"('failed to delete Pod \"go-web-1441118122-jnrsq\" in Namespace \"go\": 404 Not Found', 'go-web-1441118122-jnrsq', 'go')"}

Despite the error, the rollback seems to have worked server-side. Environment notes:

helgi commented 8 years ago

Looks like the cleanup logic is going sideways. Did v3 have any issues?

gabrtv commented 8 years ago

@helgi no issues with v3. We've been seeing this pretty reliably on the 2.4.1 demos. We're also seeing the same error on config:set operations.

lachie83 commented 8 years ago

Here's the stack trace:

ERROR:root:('failed to delete Pod "go-web-3068475914-nwshd" in Namespace "go": 404 Not Found', 'go-web-3068475914-nwshd', 'go')
Traceback (most recent call last):
  File "/app/api/views.py", line 273, in post_save
    config.app.deploy(self.release)
  File "/app/api/models/app.py", line 575, in deploy
    release.cleanup_old()
  File "/app/api/models/release.py", line 308, in cleanup_old
    self._scheduler.delete_pod(self.app.id, pod['metadata']['name'])
  File "/app/scheduler/__init__.py", line 1267, in delete_pod
    raise KubeHTTPException(resp, 'delete Pod "{}" in Namespace "{}"', name, namespace)
scheduler.KubeHTTPException: ('failed to delete Pod "go-web-3068475914-nwshd" in Namespace "go": 404 Not Found', 'go-web-3068475914-nwshd', 'go')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 471, in dispatch
    response = handler(request, *args, **kwargs)
  File "/app/api/views.py", line 181, in create
    return super(AppResourceViewSet, self).create(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
    self.perform_create(serializer)
  File "/app/api/viewsets.py", line 21, in perform_create
    self.post_save(obj)
  File "/app/api/views.py", line 276, in post_save
    raise DeisException(str(e)) from e
api.exceptions.DeisException: ('failed to delete Pod "go-web-3068475914-nwshd" in Namespace "go": 404 Not Found', 'go-web-3068475914-nwshd', 'go')
10.244.2.9 "POST /v2/apps/go/config/ HTTP/1.1" 400 133 "Deis Client vv2.4.0"
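
The trace suggests the pod was already gone by the time `cleanup_old` asked the scheduler to delete it, so the 404 surfaced as a 400 to the client. A minimal sketch of one way to make that delete idempotent follows. It is not the change in any actual PR. `KubeHTTPException` here is a simplified stand-in (the real one is constructed from a response object, and how it exposes the status code is an assumption), and `FakeScheduler` and `delete_pod_if_present` are hypothetical names used only for illustration.

```python
class KubeHTTPException(Exception):
    """Simplified stand-in for scheduler.KubeHTTPException (assumed shape)."""

    def __init__(self, status_code, message):
        super().__init__(message)
        self.status_code = status_code


class FakeScheduler:
    """Hypothetical in-memory scheduler so the sketch runs without a cluster."""

    def __init__(self, pods):
        self.pods = set(pods)

    def delete_pod(self, namespace, name):
        # Mimic the behaviour in the trace: deleting a missing pod raises a 404.
        if name not in self.pods:
            raise KubeHTTPException(
                404,
                'failed to delete Pod "{}" in Namespace "{}": 404 Not Found'.format(
                    name, namespace
                ),
            )
        self.pods.remove(name)


def delete_pod_if_present(scheduler, namespace, name):
    """Delete a pod, treating 404 as "already gone".

    Returns True if this call deleted the pod, False if it was already missing.
    Any other HTTP error is re-raised unchanged.
    """
    try:
        scheduler.delete_pod(namespace, name)
        return True
    except KubeHTTPException as e:
        if e.status_code == 404:
            return False
        raise


if __name__ == "__main__":
    scheduler = FakeScheduler(["go-web-3068475914-nwshd"])
    print(delete_pod_if_present(scheduler, "go", "go-web-3068475914-nwshd"))
    print(delete_pod_if_present(scheduler, "go", "go-web-3068475914-nwshd"))
```

The design choice is that cleanup only cares about the end state (the pod no longer exists), so a pod that Kubernetes has already removed should not fail the deploy or rollback.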

helgi commented 8 years ago

I find it interesting you see it so consistently but we do not see this in e2e. What Kubernetes version and what provider?

lachie83 commented 8 years ago

GCE and GKE v1.3.5

helgi commented 8 years ago

Try the associated PR

lachie83 commented 8 years ago

Testing now. Stay tuned.

lachie83 commented 8 years ago

Looks good. No issues since patching. Thanks @helgi!

felixbuenemann commented 8 years ago

@helgi Why is the manual cleanup needed at all? If I switch a Kubernetes Deployment to a new image, it scales up the new ReplicaSet, waits a bit to see whether it came up fine, and then terminates the pods in the old ReplicaSet automatically.

helgi commented 8 years ago

Because people are still migrating from RCs (ReplicationControllers), we need some way to clean up the RC and any pods it may leave behind.

felixbuenemann commented 8 years ago

Ah, I see, thanks for the clarification.

helgi commented 8 years ago

When I have a better auto-migration story from RC to Deployments, I will most likely yank that code out. 2.4 was intended to be that story, but it was in fact the cleanup functions that were supposed to help, and they failed.