deis / controller

Deis Workflow Controller (API)
https://deis.com
MIT License

Conflict updating service during deploy with multiple process types #994

Closed felixbuenemann closed 8 years ago

felixbuenemann commented 8 years ago

The deployment of a Rails buildpack app with the default process types (web, rake, worker, console) sporadically fails with a conflict when the controller tries to update the service after the build and Kubernetes deploy have finished.

The app has its web process type scaled to 2, everything else is at 0, and it is using Kubernetes deployments.

I am running controller v2.4.0 on Workflow v2.4.0 with k8s v1.3.4 in a multi-AZ setup with one node in each zone.

Controller Log:

10.2.52.11 "GET /v2/hooks/key/c4:50:c8:06:f7:76:1b:a3:1f:3d:56:ce:75:72:db:65 HTTP/1.1" 200 137 "deis-builder"
10.2.52.11 "POST /v2/hooks/config/ HTTP/1.1" 200 3113 "deis-builder"
INFO [shop-backend]: build shop-backend-0e9d0d8 created
INFO [shop-backend]: gitlab-runner deployed 0da5e09
INFO [shop-backend]: buildpack type detected. Defaulting to $PORT 5000
INFO [shop-backend]: buildpack type detected. Defaulting to $PORT 5000
INFO [shop-backend]: buildpack type detected. Defaulting to $PORT 5000
INFO [shop-backend]: buildpack type detected. Defaulting to $PORT 5000
INFO [shop-backend]: adding 30s on to the original 120s timeout to account for the initial delay specified in the liveness / readiness probe
INFO [shop-backend]: This deployments overall timeout is 150s - batch timout is 150s and there are 1 batches to deploy with a total of 2 pods
INFO [shop-backend]: waited 10s and 2 pods are in service
INFO [shop-backend]: waited 20s and 2 pods are in service
INFO [shop-backend]: waited 30s and 2 pods are in service
ERROR [shop-backend]: (app::deploy): ('failed to update Service "shop-backend" in Namespace "shop-backend": 409 Conflict', 'shop-backend', 'shop-backend')
ERROR:root:(app::deploy): ('failed to update Service "shop-backend" in Namespace "shop-backend": 409 Conflict', 'shop-backend', 'shop-backend')
Traceback (most recent call last):
  File "/app/scheduler/__init__.py", line 206, in _update_application_service
    self.update_service(namespace, namespace, data=service)
  File "/app/scheduler/__init__.py", line 1219, in update_service
    'update Service "{}" in Namespace "{}"', namespace, name
scheduler.KubeHTTPException: ('failed to update Service "shop-backend" in Namespace "shop-backend": 409 Conflict', 'shop-backend', 'shop-backend')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/app/api/models/app.py", line 562, in deploy
    async_run(tasks)
  File "/app/api/utils.py", line 169, in async_run
    raise error
  File "/usr/lib/python3.5/asyncio/tasks.py", line 241, in _step
    result = coro.throw(exc)
  File "/app/api/utils.py", line 182, in async_task
    yield from loop.run_in_executor(None, params)
  File "/usr/lib/python3.5/asyncio/futures.py", line 361, in __iter__
    yield self  # This tells Task to wait for completion.
  File "/usr/lib/python3.5/asyncio/tasks.py", line 296, in _wakeup
    future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 274, in result
    raise self._exception
  File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/app/scheduler/__init__.py", line 138, in deploy
    self._update_application_service(namespace, name, app_type, port, routable, service_annotations)  # noqa
  File "/app/scheduler/__init__.py", line 209, in _update_application_service
    self.update_service(namespace, namespace, data=old_service)
  File "/app/scheduler/__init__.py", line 1219, in update_service
    'update Service "{}" in Namespace "{}"', namespace, name
scheduler.KubeHTTPException: ('failed to update Service "shop-backend" in Namespace "shop-backend": 409 Conflict', 'shop-backend', 'shop-backend')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/app/api/models/build.py", line 64, in create
    self.app.deploy(new_release)
  File "/app/api/models/app.py", line 566, in deploy
    raise ServiceUnavailable(err) from e
api.exceptions.ServiceUnavailable: (app::deploy): ('failed to update Service "shop-backend" in Namespace "shop-backend": 409 Conflict', 'shop-backend', 'shop-backend')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 471, in dispatch
    response = handler(request, *args, **kwargs)
  File "/app/api/views.py", line 486, in create
    super(BuildHookViewSet, self).create(request, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
    self.perform_create(serializer)
  File "/app/api/viewsets.py", line 21, in perform_create
    self.post_save(obj)
  File "/app/api/views.py", line 492, in post_save
    build.create(self.user)
  File "/app/api/models/build.py", line 71, in create
    raise DeisException(str(e)) from e
api.exceptions.DeisException: (app::deploy): ('failed to update Service "shop-backend" in Namespace "shop-backend": 409 Conflict', 'shop-backend', 'shop-backend')
10.2.52.11 "POST /v2/hooks/build/ HTTP/1.1" 400 149 "deis-builder"
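
For readers following the traceback: a 409 Conflict on a Kubernetes Service update typically means the object being written carries a stale `metadata.resourceVersion`, i.e. something else modified the Service between the read and the write. The log shows that both the update and the fallback write of `old_service` hit that conflict. Below is a minimal sketch of the usual remedy (re-read, re-apply the change, retry). It is not the fix that landed in the controller; `FakeServiceStore`, `Conflict`, and `update_with_retry` are hypothetical names, and the in-memory store only imitates the API server's version check.

```python
import copy


class Conflict(Exception):
    """Stand-in for an HTTP 409 Conflict response (hypothetical)."""


class FakeServiceStore:
    """In-memory imitation of a Service endpoint (assumption, not a real API)."""

    def __init__(self, service):
        self._service = copy.deepcopy(service)

    def get(self):
        return copy.deepcopy(self._service)

    def put(self, service):
        current = self._service["metadata"]["resourceVersion"]
        # Reject writes that are based on a stale read.
        if service["metadata"]["resourceVersion"] != current:
            raise Conflict("409 Conflict")
        stored = copy.deepcopy(service)
        # Real resourceVersions are opaque strings; a counter is used here
        # only to keep the sketch simple.
        stored["metadata"]["resourceVersion"] = str(int(current) + 1)
        self._service = stored
        return copy.deepcopy(stored)


def update_with_retry(store, mutate, attempts=3):
    """Re-read the Service and re-apply `mutate` whenever the write conflicts."""
    for attempt in range(attempts):
        service = store.get()  # fresh copy with the current resourceVersion
        mutate(service)
        try:
            return store.put(service)
        except Conflict:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
```

The point of the design is that the change is expressed as a function applied to a fresh read on every attempt, so a retry never resends a body built from an outdated copy.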

felixbuenemann commented 8 years ago

@helgi Have you had some time to look into this problem?

helgi commented 8 years ago

@felixbuenemann Unfortunately I didn't have time to look at it over the weekend, but @kmala was going to tackle this while fixing another bug in pretty much the same spot.

felixbuenemann commented 8 years ago

Great, thanks for the update.

felixbuenemann commented 8 years ago

@kmala Thanks for the fix.

Will there be a v2.4.2 release for the router with this fix or do I need to build my own image?