F5Networks / k8s-bigip-ctlr

Repository for F5 Container Ingress Services for Kubernetes & OpenShift.
Apache License 2.0
359 stars 195 forks source link

Controller stops updating BIG-IP configuration after an exception occurs in the python driver #506

Closed russokj closed 6 years ago

russokj commented 6 years ago

Description

If the bigipdriver.py process encounters an exception in the _do_reset thread, then the thread dies and no longer processes updates to the config file. The front-end controller is unaware of this and keeps updating the config file. Many of these exceptions are transient in nature and the python driver should handle them gracefully.

Kubernetes Version

1.4.0

Controller Version

master branch as of 1/23/18

BIG-IP Version

12.1.2

Diagnostic Information

Typical log file has an error similar to the following:

21:55:44 2018/01/23 04:53:31 [INFO] Traceback (most recent call last):
21:55:44 2018/01/23 04:53:31 [INFO]   File "/opt/rh/python27/root/usr/lib64/python2.7/threading.py", line 804, in __bootstrap_inner
21:55:44 2018/01/23 04:53:31 [INFO]     self.run()
21:55:44 2018/01/23 04:53:31 [INFO]   File "/opt/rh/python27/root/usr/lib64/python2.7/threading.py", line 757, in run
21:55:44 2018/01/23 04:53:31 [INFO]     self.__target(*self.__args, **self.__kwargs)
21:55:44 2018/01/23 04:53:31 [INFO]   File "/app/python/bigipconfigdriver.py", line 320, in _do_reset
21:55:44 2018/01/23 04:53:31 [INFO]     config = _parse_config(self._config_file)
21:55:44 2018/01/23 04:53:31 [INFO]   File "/app/python/bigipconfigdriver.py", line 584, in _parse_config
21:55:44 2018/01/23 04:53:31 [INFO]     config_json = json.load(config)
21:55:44 2018/01/23 04:53:31 [INFO]   File "/opt/rh/python27/root/usr/lib64/python2.7/json/__init__.py", line 291, in load
21:55:44 2018/01/23 04:53:31 [INFO]     **kw)
21:55:44 2018/01/23 04:53:31 [INFO]   File "/opt/rh/python27/root/usr/lib64/python2.7/json/__init__.py", line 339, in loads
21:55:44 2018/01/23 04:53:31 [INFO]     return _default_decoder.decode(s)
21:55:44 2018/01/23 04:53:31 [INFO]   File "/opt/rh/python27/root/usr/lib64/python2.7/json/decoder.py", line 364, in decode
21:55:44 2018/01/23 04:53:31 [INFO]     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
21:55:44 2018/01/23 04:53:31 [INFO]   File "/opt/rh/python27/root/usr/lib64/python2.7/json/decoder.py", line 382, in raw_decode
21:55:44 2018/01/23 04:53:31 [INFO]     raise ValueError("No JSON object could be decoded")
21:55:44 2018/01/23 04:53:31 [INFO] ValueError: No JSON object could be decoded
21:55:44 2018/01/23 04:53:31 [INFO] [2018-01-23 04:53:31,065 __main__ INFO] entering inotify loop to watch /tmp/k8s-bigip-ctlr.config553021570/config.json
21:55:44 2018/01/23 04:53:31 [INFO] 
f5yacobucci commented 6 years ago

This issue can also be caused by a ConnectionError to the BIG-IP:

2018/01/23 16:57:06 [INFO] Exception in thread Thread-1:
2018/01/23 16:57:06 [INFO] Traceback (most recent call last):
2018/01/23 16:57:06 [INFO]   File "/usr/local/lib/python2.7/threading.py", line 801, in __bootstrap_inner
2018/01/23 16:57:06 [INFO]     self.run()
2018/01/23 16:57:06 [INFO]   File "/usr/local/lib/python2.7/threading.py", line 754, in run
2018/01/23 16:57:06 [INFO]     self.__target(*self.__args, **self.__kwargs)
2018/01/23 16:57:06 [INFO]   File "/app/python/bigipconfigdriver.py", line 330, in _do_reset
2018/01/23 16:57:06 [INFO]     incomplete += mgr._apply_ltm_config(cfg_ltm)
2018/01/23 16:57:06 [INFO]   File "/app/python/bigipconfigdriver.py", line 105, in _apply_ltm_config
2018/01/23 16:57:06 [INFO]     return self._cccl.apply_config(config)
2018/01/23 16:57:06 [INFO]   File "/app/src/f5-cccl/f5_cccl/api.py", line 74, in apply_config
2018/01/23 16:57:06 [INFO]     return self._service_manager.apply_config(services)
2018/01/23 16:57:06 [INFO]   File "/app/src/f5-cccl/f5_cccl/service/manager.py", line 376, in apply_config
2018/01/23 16:57:06 [INFO]     retval = self._service_deployer.deploy(desired_config)
2018/01/23 16:57:06 [INFO]   File "/app/src/f5-cccl/f5_cccl/service/manager.py", line 221, in deploy
2018/01/23 16:57:06 [INFO]     self._bigip.refresh()
2018/01/23 16:57:06 [INFO]   File "/app/src/f5-cccl/f5_cccl/bigip.py", line 144, in refresh
2018/01/23 16:57:06 [INFO]     self._refresh()
2018/01/23 16:57:06 [INFO]   File "/app/src/f5-cccl/f5_cccl/bigip.py", line 216, in _refresh
2018/01/23 16:57:06 [INFO]     requests_params={"params": query})
2018/01/23 16:57:06 [INFO]   File "/usr/local/lib/python2.7/site-packages/f5/bigip/resource.py", line 781, in get_collection
2018/01/23 16:57:06 [INFO]     self.refresh(**kwargs)
2018/01/23 16:57:06 [INFO]   File "/usr/local/lib/python2.7/site-packages/f5/bigip/resource.py", line 651, in refresh
2018/01/23 16:57:06 [INFO]     self._refresh(**kwargs)
2018/01/23 16:57:06 [INFO]   File "/usr/local/lib/python2.7/site-packages/f5/bigip/resource.py", line 634, in _refresh
2018/01/23 16:57:06 [INFO]     response = refresh_session.get(uri, **requests_params)
2018/01/23 16:57:06 [INFO]   File "/usr/local/lib/python2.7/site-packages/icontrol/session.py", line 258, in wrapper
2018/01/23 16:57:06 [INFO]     response = method(self, REST_uri, **kwargs)
2018/01/23 16:57:06 [INFO]   File "/usr/local/lib/python2.7/site-packages/icontrol/session.py", line 449, in get
2018/01/23 16:57:06 [INFO]     return self.session.get(uri, **kwargs)
2018/01/23 16:57:06 [INFO]   File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 480, in get
2018/01/23 16:57:06 [INFO]     return self.request('GET', url, **kwargs)
2018/01/23 16:57:06 [INFO]   File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 468, in request
2018/01/23 16:57:06 [INFO]     resp = self.send(prep, **send_kwargs)
2018/01/23 16:57:06 [INFO]   File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
2018/01/23 16:57:06 [INFO]     r = adapter.send(request, **kwargs)
2018/01/23 16:57:06 [INFO]   File "/usr/local/lib/python2.7/site-packages/requests/adapters.py", line 426, in send
2018/01/23 16:57:06 [INFO]     raise ConnectionError(err, request=request)
2018/01/23 16:57:06 [INFO] ConnectionError: ('Connection aborted.', BadStatusLine("''",))
2018/01/23 16:57:06 [INFO]
amudukutore commented 6 years ago

This fix for this issue probably addresses https://github.com/F5Networks/k8s-bigip-ctlr/issues/418

russokj commented 6 years ago

Here is an example of the fix when encountering a connection error with the BIG-IP:

2018/01/24 19:04:13 [INFO] ConnectionError: HTTPSConnectionPool(host='172.16.2.29', port=443): Max retries exceeded with url: /mgmt/shared/authn/login (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f1811172910>: Failed to establish a new connection: [Errno 113] Host is unreachable',))
2018/01/24 19:04:36 [ERROR] [2018-01-24 19:04:36,898 __main__ ERROR] Unexpected error
2018/01/24 19:04:36 [INFO] Traceback (most recent call last):
2018/01/24 19:04:36 [INFO]   File "/app/python/bigipconfigdriver.py", line 324, in _do_reset
2018/01/24 19:04:36 [INFO]     incomplete = self._update_cccl(config)
2018/01/24 19:04:36 [INFO]   File "/app/python/bigipconfigdriver.py", line 403, in _update_cccl
2018/01/24 19:04:36 [INFO]     incomplete += mgr._apply_ltm_config(cfg_ltm)
2018/01/24 19:04:36 [INFO]   File "/app/python/bigipconfigdriver.py", line 117, in _apply_ltm_config
2018/01/24 19:04:36 [INFO]     return self._cccl.apply_ltm_config(config)
2018/01/24 19:04:36 [INFO]   File "/app/src/f5-cccl/f5_cccl/api.py", line 83, in apply_ltm_config
2018/01/24 19:04:36 [INFO]     return self._service_manager.apply_ltm_config(services)
2018/01/24 19:04:36 [INFO]   File "/app/src/f5-cccl/f5_cccl/service/manager.py", line 521, in apply_ltm_config
2018/01/24 19:04:36 [INFO]     default_route_domain = self._bigip.get_default_route_domain()
2018/01/24 19:04:36 [INFO]   File "/app/src/f5-cccl/f5_cccl/bigip.py", line 433, in get_default_route_domain
2018/01/24 19:04:36 [INFO]     name=self._partition)
2018/01/24 19:04:36 [INFO]   File "/usr/local/lib/python2.7/site-packages/f5/bigip/resource.py", line 1054, in load
2018/01/24 19:04:36 [INFO]     return self._load(**kwargs)
2018/01/24 19:04:36 [INFO]   File "/usr/local/lib/python2.7/site-packages/f5/bigip/resource.py", line 1033, in _load
2018/01/24 19:04:36 [INFO]     response = refresh_session.get(base_uri, **kwargs)
2018/01/24 19:04:36 [INFO]   File "/usr/local/lib/python2.7/site-packages/icontrol/session.py", line 257, in wrapper
2018/01/24 19:04:36 [INFO]     response = method(self, REST_uri, **kwargs)
2018/01/24 19:04:36 [INFO]   File "/usr/local/lib/python2.7/site-packages/icontrol/session.py", line 462, in get
2018/01/24 19:04:36 [INFO]     return self.session.get(uri, **kwargs)
2018/01/24 19:04:36 [INFO]   File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 480, in get
2018/01/24 19:04:36 [INFO]     return self.request('GET', url, **kwargs)
2018/01/24 19:04:36 [INFO]   File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 454, in request
2018/01/24 19:04:36 [INFO]     prep = self.prepare_request(req)
2018/01/24 19:04:36 [INFO]   File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 388, in prepare_request
2018/01/24 19:04:36 [INFO]     hooks=merge_hooks(request.hooks, self.hooks),
2018/01/24 19:04:36 [INFO]   File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 297, in prepare
2018/01/24 19:04:36 [INFO]     self.prepare_auth(auth, url)
2018/01/24 19:04:36 [INFO]   File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 490, in prepare_auth
2018/01/24 19:04:36 [INFO]     r = auth(self)
2018/01/24 19:04:36 [INFO]   File "/usr/local/lib/python2.7/site-packages/icontrol/authtoken.py", line 254, in __call__
2018/01/24 19:04:36 [INFO]     self.get_new_token(netloc)
2018/01/24 19:04:36 [INFO]   File "/usr/local/lib/python2.7/site-packages/icontrol/authtoken.py", line 150, in get_new_token
2018/01/24 19:04:36 [INFO]     auth=HTTPBasicAuth(self.username, self.password)
2018/01/24 19:04:36 [INFO]   File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 107, in post
2018/01/24 19:04:36 [INFO]     return request('post', url, data=data, json=json, **kwargs)
2018/01/24 19:04:36 [INFO]   File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 53, in request
2018/01/24 19:04:36 [INFO]     return session.request(method=method, url=url, **kwargs)
2018/01/24 19:04:36 [INFO]   File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 468, in request
2018/01/24 19:04:36 [INFO]     resp = self.send(prep, **send_kwargs)
2018/01/24 19:04:36 [INFO]   File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
2018/01/24 19:04:36 [INFO]     r = adapter.send(request, **kwargs)
2018/01/24 19:04:36 [INFO]   File "/usr/local/lib/python2.7/site-packages/requests/adapters.py", line 437, in send
2018/01/24 19:04:36 [INFO]     raise ConnectionError(e, request=request)
2018/01/24 19:04:36 [INFO] ConnectionError: HTTPSConnectionPool(host='172.16.2.29', port=443): Max retries exceeded with url: /mgmt/shared/authn/login (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f18111924d0>: Failed to establish a new connection: [Errno 113] Host is unreachable',))
2018/01/24 19:04:36 [ERROR] [2018-01-24 19:04:36,899 __main__ ERROR] Error applying config, will try again in 64 seconds
...