We're seeing contrail-controller units in error state. They appear to be choking because the leader data contains invalid values.
juju status contrail-controller # Excerpted
...
App                  Version  Status  Scale  Charm                Store       Rev  OS      Notes
...
contrail-controller  1912-32  error       3  contrail-controller  jujucharms   12  ubuntu
...
Unit                    Workload  Agent  Machine  Public address  Ports  Message
contrail-controller/0   error     idle   6        x.x.85.87              hook failed: "leader-settings-changed"
...
contrail-controller/1   error     idle   7        x.x.85.65              hook failed: "leader-settings-changed"
...
contrail-controller/2*  error     idle   8        x.x.84.237             hook failed: "leader-settings-changed"
...
The log excerpt below shows the traceback (some production data replaced with "x"). Note 'control_servers': [None, None, 'x.x.85.87'] in the CTX values. The Jinja renderer tries to sort the list, but Python 3 cannot order None values (Python 2 could), so the sort raises a TypeError. The list should probably contain the IP addresses of all 3 controller nodes instead.
2020-02-03 09:58:33 INFO juju-log CTX: {'module': 'controller', 'log_level': 'SYS_INFO', 'bgp_asn': '64512', 'flow_export_rate': '0', 'auth_mode': 'rbac', 'cloud_admin_role': 'admin', 'global_read_only_role': None, 'configdb_minimum_diskgb': '4', 'jvm_extra_opts': '-Xms8g -Xmx8g', 'container_registry': 'hub.juniper.net/contrail', 'contrail_version_tag': '1912.32', 'cloud_orchestrator': 'openstack', 'metadata_shared_secret': 'xxx', 'compute_service_ip': 'x.x.84.56', 'image_service_ip': 'x.x.84.50', 'network_service_ip': 'x.x.84.55', 'ssl_enabled': False, 'config_analytics_ssl_available': True, 'logging': 'logging:\n driver: json-file\n options:\n max-file: "5"\n max-size: "20m"\n', 'controller_servers': ['x.x.88.6', 'x.x.88.134', 'x.x.88.70'], 'control_servers': [None, None, 'x.x.85.87'], 'analytics_servers': ['x.x.88.18', 'x.x.88.85', 'x.x.88.147']}
2020-02-03 09:58:33 INFO juju-log Render and store new configuration: /etc/contrail/common_config.env
2020-02-03 09:58:33 DEBUG leader-settings-changed Traceback (most recent call last):
2020-02-03 09:58:33 DEBUG leader-settings-changed File "/var/lib/juju/agents/unit-contrail-controller-2/charm/hooks/leader-settings-changed", line 597, in <module>
2020-02-03 09:58:33 DEBUG leader-settings-changed main()
2020-02-03 09:58:33 DEBUG leader-settings-changed File "/var/lib/juju/agents/unit-contrail-controller-2/charm/hooks/leader-settings-changed", line 591, in main
2020-02-03 09:58:33 DEBUG leader-settings-changed hooks.execute(sys.argv)
2020-02-03 09:58:33 DEBUG leader-settings-changed File "/var/lib/juju/agents/unit-contrail-controller-2/charm/hooks/charmhelpers/core/hookenv.py", line 914, in execute
2020-02-03 09:58:33 DEBUG leader-settings-changed self._hooks[hook_name]()
2020-02-03 09:58:33 DEBUG leader-settings-changed File "/var/lib/juju/agents/unit-contrail-controller-2/charm/hooks/leader-settings-changed", line 87, in leader_settings_changed
2020-02-03 09:58:33 DEBUG leader-settings-changed utils.update_charm_status()
2020-02-03 09:58:33 DEBUG leader-settings-changed File "/var/lib/juju/agents/unit-contrail-controller-2/charm/hooks/contrail_controller_utils.py", line 165, in update_charm_status
2020-02-03 09:58:33 DEBUG leader-settings-changed BASE_CONFIGS_PATH + "/common_config.env", ctx)
2020-02-03 09:58:33 DEBUG leader-settings-changed File "/var/lib/juju/agents/unit-contrail-controller-2/charm/hooks/common_utils.py", line 280, in render_and_log
2020-02-03 09:58:33 DEBUG leader-settings-changed render(template, conf_file, ctx, perms=perms)
2020-02-03 09:58:33 DEBUG leader-settings-changed File "/var/lib/juju/agents/unit-contrail-controller-2/charm/hooks/charmhelpers/core/templating.py", line 85, in render
2020-02-03 09:58:33 DEBUG leader-settings-changed content = template.render(context)
2020-02-03 09:58:33 DEBUG leader-settings-changed File "/usr/lib/python3/dist-packages/jinja2/asyncsupport.py", line 76, in render
2020-02-03 09:58:33 DEBUG leader-settings-changed return original_render(self, *args, **kwargs)
2020-02-03 09:58:33 DEBUG leader-settings-changed File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 1008, in render
2020-02-03 09:58:33 DEBUG leader-settings-changed return self.environment.handle_exception(exc_info, True)
2020-02-03 09:58:33 DEBUG leader-settings-changed File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 780, in handle_exception
2020-02-03 09:58:33 DEBUG leader-settings-changed reraise(exc_type, exc_value, tb)
2020-02-03 09:58:33 DEBUG leader-settings-changed File "/usr/lib/python3/dist-packages/jinja2/_compat.py", line 37, in reraise
2020-02-03 09:58:33 DEBUG leader-settings-changed raise value.with_traceback(tb)
2020-02-03 09:58:33 DEBUG leader-settings-changed File "/var/lib/juju/agents/unit-contrail-controller-2/charm/templates/config.env", line 34, in top-level template code
2020-02-03 09:58:33 DEBUG leader-settings-changed CONTROL_NODES={{ control_servers|sort|join(',') }}
2020-02-03 09:58:33 DEBUG leader-settings-changed File "/usr/lib/python3/dist-packages/jinja2/filters.py", line 278, in do_sort
2020-02-03 09:58:33 DEBUG leader-settings-changed return sorted(value, key=key_func, reverse=reverse)
2020-02-03 09:58:33 DEBUG leader-settings-changed TypeError: '<' not supported between instances of 'NoneType' and 'NoneType'
2020-02-03 09:58:33 ERROR juju.worker.uniter.operation runhook.go:132 hook "leader-settings-changed" failed: exit status 1
To confirm, querying the leader data for controller_data_ip_list, which is the source of control_servers, also returns two null values:
juju run -u contrail-controller/2 'leader-get controller_data_ip_list'
[null, null, "x.x.85.87"]
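The failure can be reproduced outside the charm with a minimal Python 3 sketch. It parses the leader-get output shown above, attempts the same sort the template performs, and then shows one possible guard: dropping None entries before sorting. The helper names try_sort and join_known are hypothetical and not part of the charm. The guard only avoids the crash; it does not restore the two missing controller addresses.

```python
import json

# Output of `leader-get controller_data_ip_list` as quoted in this report
# (addresses are redacted with "x" in the original).
leader_output = '[null, null, "x.x.85.87"]'
control_servers = json.loads(leader_output)  # JSON null becomes Python None


def try_sort(values):
    """Return the sorted list, or the TypeError message if sorting fails."""
    try:
        return sorted(values)
    except TypeError as exc:
        return str(exc)


def join_known(values):
    """Hypothetical guard: drop None entries, then sort and join the rest."""
    return ",".join(sorted(v for v in values if v is not None))


# Sorting the raw list fails on Python 3 because None cannot be ordered.
error_message = try_sort(control_servers)
print(error_message)

# With the None entries filtered out, the join succeeds.
control_nodes = join_known(control_servers)
print(control_nodes)
```

A guard like this would only hide the symptom in the rendered CONTROL_NODES value. The underlying question remains why the leader data holds null for two of the three units.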
These units were upgraded from 5.0.2.