deis / controller

Deis Workflow Controller (API)
https://deis.com
MIT License

Failure to communicate with K8s API removes app from router #1200

Closed gedimin45 closed 7 years ago

gedimin45 commented 7 years ago

The controller failed to connect to the K8s API during a deploy (as far as I can tell, an ECONNRESET while checking whether the namespace exists). Note that the app was already running; the problem occurred while deploying a new version of it. As a result, the app was somehow removed from the router, and requests to its URL returned a 404. When I checked the router's nginx.conf, there was indeed no entry for the app. Stack trace:

    ERROR There was a problem retrieving data from the Kubernetes API server. URL: https://10.75.240.1:443/api/v1/namespaces/segsift, params: {}
    ERROR:root:failed to create Namespace segsift: 409 Conflict namespaces "segsift" already exists
    Traceback (most recent call last):
      File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/connectionpool.py", line 594, in urlopen
        chunked=chunked)
      File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/connectionpool.py", line 391, in _make_request
        six.raise_from(e, None)
      File "", line 2, in raise_from
      File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/connectionpool.py", line 387, in _make_request
        httplib_response = conn.getresponse()
      File "/usr/lib/python3.5/http/client.py", line 1197, in getresponse
        response.begin()
      File "/usr/lib/python3.5/http/client.py", line 297, in begin
        version, status, reason = self._read_status()
      File "/usr/lib/python3.5/http/client.py", line 258, in _read_status
        line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
      File "/usr/lib/python3.5/socket.py", line 575, in readinto
        return self._sock.recv_into(b)
      File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/contrib/pyopenssl.py", line 261, in recv_into
        raise SocketError(str(e))
    OSError: (104, 'ECONNRESET')

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 423, in send
        timeout=timeout
      File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/connectionpool.py", line 643, in urlopen
        _stacktrace=sys.exc_info()[2])
      File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/util/retry.py", line 334, in increment
        raise six.reraise(type(error), error, _stacktrace)
      File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/packages/six.py", line 685, in reraise
        raise value.with_traceback(tb)
      File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/connectionpool.py", line 594, in urlopen
        chunked=chunked)
      File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/connectionpool.py", line 391, in _make_request
        six.raise_from(e, None)
      File "", line 2, in raise_from
      File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/connectionpool.py", line 387, in _make_request
        httplib_response = conn.getresponse()
      File "/usr/lib/python3.5/http/client.py", line 1197, in getresponse
        response.begin()
      File "/usr/lib/python3.5/http/client.py", line 297, in begin
        version, status, reason = self._read_status()
      File "/usr/lib/python3.5/http/client.py", line 258, in _read_status
        line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
      File "/usr/lib/python3.5/socket.py", line 575, in readinto
        return self._sock.recv_into(b)
      File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/contrib/pyopenssl.py", line 261, in recv_into
        raise SocketError(str(e))
    requests.packages.urllib3.exceptions.ProtocolError: ('Connection aborted.', OSError("(104, 'ECONNRESET')",))

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/app/scheduler/__init__.py", line 171, in http_get
        response = self.session.get(url, params=params, **kwargs)
      File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 501, in get
        return self.request('GET', url, **kwargs)
      File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 488, in request
        resp = self.send(prep, **send_kwargs)
      File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 609, in send
        r = adapter.send(request, **kwargs)
      File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 473, in send
        raise ConnectionError(err, request=request)
    requests.exceptions.ConnectionError: ('Connection aborted.', OSError("(104, 'ECONNRESET')",))

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/app/api/models/app.py", line 194, in create
        self._scheduler.ns.get(namespace)
      File "/app/scheduler/resources/namespace.py", line 22, in get
        response = self.http_get(url, params=self.query_params(**kwargs))
      File "/app/scheduler/__init__.py", line 177, in http_get
        raise KubeException(message) from err
    scheduler.exceptions.KubeException: There was a problem retrieving data from the Kubernetes API server. URL: https://10.75.240.1:443/api/v1/namespaces/segsift, params: {}

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/app/api/models/app.py", line 196, in create
        self._scheduler.ns.create(namespace)
      File "/app/scheduler/resources/namespace.py", line 44, in create
        raise KubeHTTPException(response, "create Namespace {}".format(namespace))
    scheduler.exceptions.KubeHTTPException: failed to create Namespace segsift: 409 Conflict namespaces "segsift" already exists

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 474, in dispatch
        response = handler(request, *args, **kwargs)
      File "/app/api/views.py", line 527, in create
        super(BuildHookViewSet, self).create(request, *args, **kwargs)
      File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
        self.perform_create(serializer)
      File "/app/api/viewsets.py", line 20, in perform_create
        obj = serializer.save(owner=self.request.user)
      File "/usr/local/lib/python3.5/dist-packages/rest_framework/serializers.py", line 214, in save
        self.instance = self.create(validated_data)
      File "/usr/local/lib/python3.5/dist-packages/rest_framework/serializers.py", line 902, in create
        instance = ModelClass.objects.create(**validated_data)
      File "/usr/local/lib/python3.5/dist-packages/django/db/models/manager.py", line 85, in manager_method
        return getattr(self.get_queryset(), name)(*args, **kwargs)
      File "/usr/local/lib/python3.5/dist-packages/django/db/models/query.py", line 399, in create
        obj.save(force_insert=True, using=self.db)
      File "/app/api/models/build.py", line 119, in save
        self.app.scale(self.owner, removed)
      File "/app/api/models/app.py", line 356, in scale
        self.create()
      File "/app/api/models/app.py", line 219, in create
        raise ServiceUnavailable('Kubernetes resources could not be created') from e
    api.exceptions.ServiceUnavailable: Kubernetes resources could not be created
    10.72.5.11 "POST /v2/hooks/build/ HTTP/1.1" 503 54 "deis-builder"
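In the traceback, `App.create()` in `api/models/app.py` calls `self._scheduler.ns.get(namespace)`. When that call raises a `KubeException`, it falls through to `self._scheduler.ns.create(namespace)`. Here the GET failed with ECONNRESET, so the controller tried to create a namespace that already existed and got the 409. The sketch below shows a get-or-create that retries transient connection errors and only creates on a definite 404. It is an illustration only, not the controller's actual code. The `get_ns`/`create_ns` callables and the three exception classes are hypothetical stand-ins for the scheduler client and its `KubeException`/`KubeHTTPException` handling.

```python
import time


class TransientAPIError(Exception):
    """Hypothetical: the API server could not be reached (e.g. ECONNRESET)."""


class NotFoundError(Exception):
    """Hypothetical: the API server answered 404 Not Found."""


class ConflictError(Exception):
    """Hypothetical: the API server answered 409 Conflict (already exists)."""


def ensure_namespace(name, get_ns, create_ns, retries=3, backoff=0.5):
    """Get-or-create a namespace, retrying only on transient connection errors.

    get_ns and create_ns are hypothetical stand-ins for the controller's
    scheduler client; they are expected to raise the exceptions above.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            get_ns(name)
            return "exists"
        except NotFoundError:
            # Only a definite 404 justifies a create call.
            try:
                create_ns(name)
                return "created"
            except ConflictError:
                # Created concurrently (or the 404 was stale): treat as success.
                return "exists"
        except TransientAPIError:
            # A reset connection says nothing about whether the namespace
            # exists, so retry the GET instead of falling through to create.
            if attempt == retries - 1:
                raise
            time.sleep(backoff * (2 ** attempt))


if __name__ == "__main__":
    attempts = []

    def flaky_get(name):
        # The first GET hits a reset connection; the second one succeeds.
        attempts.append(name)
        if len(attempts) == 1:
            raise TransientAPIError("ECONNRESET")

    def create(name):
        raise AssertionError("create should not run for an existing namespace")

    print(ensure_namespace("segsift", flaky_get, create, backoff=0))  # exists
```

The design choice is to separate "the API server said the namespace is missing" from "the API server could not be asked". Only the first case leads to a create call. Errors from `create_ns` other than a 409 propagate unchanged.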
bacongobbler commented 7 years ago

The only way the app would be removed from the router is if the app's service were no longer available. I have a feeling this was more of a hiccup between the router and the API server. Can you check the router logs and see whether they confirm that?
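The mechanism described in that comment can be pictured with a simplified model. Suppose a router rebuilds its routing table from the current list of services. Then an app loses its entry exactly when its service is missing from a successful listing. The sketch below is a hypothetical Python model, not the router's actual implementation. It assumes plain-dict services with made-up `name`, `cluster_ip` and `routable` fields. It also assumes a "keep the last-known-good table on API errors" policy, so that a failed listing does not by itself drop any apps.

```python
def render_routes(services):
    """Build a hypothetical app -> upstream map from the routable services.

    Each service is modelled as a plain dict; the field names are assumptions.
    """
    routes = {}
    for svc in services:
        if svc.get("routable"):
            routes[svc["name"]] = svc["cluster_ip"]
    return routes


class RouteTable:
    """Holds the current routes and refreshes them from a service lister."""

    def __init__(self, list_services):
        self._list_services = list_services  # hypothetical API call
        self.routes = {}

    def refresh(self):
        """Rebuild routes; on an API error keep the last-known-good table."""
        try:
            services = self._list_services()
        except ConnectionError:
            # An unreachable API server says nothing about which apps exist,
            # so leave the existing routes untouched.
            return False
        # A successful listing is authoritative: apps whose service is
        # missing from it lose their route here.
        self.routes = render_routes(services)
        return True


if __name__ == "__main__":
    state = {"fail": False}

    def list_services():
        if state["fail"]:
            raise ConnectionError("ECONNRESET")
        return [{"name": "segsift", "cluster_ip": "10.0.0.10", "routable": True}]

    table = RouteTable(list_services)
    table.refresh()
    state["fail"] = True
    table.refresh()      # API hiccup: returns False, routes unchanged
    print(table.routes)  # {'segsift': '10.0.0.10'}
```

Under this model, a 404 like the one reported would require a successful listing in which the app's service was absent, rather than a failed one. That is why the router logs are the place to look.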

bacongobbler commented 7 years ago

I'm going to close this ticket due to inactivity, but please re-open if this still needs to be addressed. Thanks!