Juniper / contrail-charms

Juju charms for Contrail services.
Apache License 2.0

Contrail-controller invalid leader data #144

Closed: sabaini closed this issue 4 years ago

sabaini commented 4 years ago

We're seeing contrail-controller units in error state. The leader-settings-changed hook appears to fail because the leader data contains invalid values.

juju status contrail-controller  # Excerpted
...
App                  Version        Status  Scale  Charm                Store       Rev  OS      Notes
...
contrail-controller  1912-32        error       3  contrail-controller  jujucharms   12  ubuntu  
...

Unit                      Workload  Agent  Machine  Public address   Ports          Message
contrail-controller/0     error     idle   6        x.x.85.87                  hook failed: "leader-settings-changed"
...
contrail-controller/1     error     idle   7        x.x.85.65                  hook failed: "leader-settings-changed"
...
contrail-controller/2*    error     idle   8        x.x.84.237                 hook failed: "leader-settings-changed"
...

The log excerpt below shows the traceback (some production data replaced with "x"). Note 'control_servers': [None, None, 'x.x.85.87'] in the CTX values. The Jinja template sorts this list, but Python 3 raises a TypeError when None values are compared with '<' (Python 2 allowed it). The list should probably contain the IP addresses of all 3 controller nodes instead.

2020-02-03 09:58:33 INFO juju-log CTX: {'module': 'controller', 'log_level': 'SYS_INFO', 'bgp_asn': '64512', 'flow_export_rate': '0', 'auth_mode': 'rbac', 'cloud_admin_role': 'admin', 'global_read_only_role': None, 'configdb_minimum_diskgb': '4', 'jvm_extra_opts': '-Xms8g -Xmx8g', 'container_registry': 'hub.juniper.net/contrail', 'contrail_version_tag': '1912.32', 'cloud_orchestrator': 'openstack', 'metadata_shared_secret': 'xxx', 'compute_service_ip': 'x.x.84.56', 'image_service_ip': 'x.x.84.50', 'network_service_ip': 'x.x.84.55', 'ssl_enabled': False, 'config_analytics_ssl_available': True, 'logging': 'logging:\n  driver: json-file\n  options:\n    max-file: "5"\n    max-size: "20m"\n', 'controller_servers': ['x.x.88.6', 'x.x.88.134', 'x.x.88.70'], 'control_servers': [None, None, 'x.x.85.87'], 'analytics_servers': ['x.x.88.18', 'x.x.88.85', 'x.x.88.147']}
2020-02-03 09:58:33 INFO juju-log Render and store new configuration: /etc/contrail/common_config.env
2020-02-03 09:58:33 DEBUG leader-settings-changed Traceback (most recent call last):
2020-02-03 09:58:33 DEBUG leader-settings-changed   File "/var/lib/juju/agents/unit-contrail-controller-2/charm/hooks/leader-settings-changed", line 597, in <module>
2020-02-03 09:58:33 DEBUG leader-settings-changed     main()
2020-02-03 09:58:33 DEBUG leader-settings-changed   File "/var/lib/juju/agents/unit-contrail-controller-2/charm/hooks/leader-settings-changed", line 591, in main
2020-02-03 09:58:33 DEBUG leader-settings-changed     hooks.execute(sys.argv)
2020-02-03 09:58:33 DEBUG leader-settings-changed   File "/var/lib/juju/agents/unit-contrail-controller-2/charm/hooks/charmhelpers/core/hookenv.py", line 914, in execute
2020-02-03 09:58:33 DEBUG leader-settings-changed     self._hooks[hook_name]()
2020-02-03 09:58:33 DEBUG leader-settings-changed   File "/var/lib/juju/agents/unit-contrail-controller-2/charm/hooks/leader-settings-changed", line 87, in leader_settings_changed
2020-02-03 09:58:33 DEBUG leader-settings-changed     utils.update_charm_status()
2020-02-03 09:58:33 DEBUG leader-settings-changed   File "/var/lib/juju/agents/unit-contrail-controller-2/charm/hooks/contrail_controller_utils.py", line 165, in update_charm_status
2020-02-03 09:58:33 DEBUG leader-settings-changed     BASE_CONFIGS_PATH + "/common_config.env", ctx)
2020-02-03 09:58:33 DEBUG leader-settings-changed   File "/var/lib/juju/agents/unit-contrail-controller-2/charm/hooks/common_utils.py", line 280, in render_and_log
2020-02-03 09:58:33 DEBUG leader-settings-changed     render(template, conf_file, ctx, perms=perms)
2020-02-03 09:58:33 DEBUG leader-settings-changed   File "/var/lib/juju/agents/unit-contrail-controller-2/charm/hooks/charmhelpers/core/templating.py", line 85, in render
2020-02-03 09:58:33 DEBUG leader-settings-changed     content = template.render(context)
2020-02-03 09:58:33 DEBUG leader-settings-changed   File "/usr/lib/python3/dist-packages/jinja2/asyncsupport.py", line 76, in render
2020-02-03 09:58:33 DEBUG leader-settings-changed     return original_render(self, *args, **kwargs)
2020-02-03 09:58:33 DEBUG leader-settings-changed   File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 1008, in render
2020-02-03 09:58:33 DEBUG leader-settings-changed     return self.environment.handle_exception(exc_info, True)
2020-02-03 09:58:33 DEBUG leader-settings-changed   File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 780, in handle_exception
2020-02-03 09:58:33 DEBUG leader-settings-changed     reraise(exc_type, exc_value, tb)
2020-02-03 09:58:33 DEBUG leader-settings-changed   File "/usr/lib/python3/dist-packages/jinja2/_compat.py", line 37, in reraise
2020-02-03 09:58:33 DEBUG leader-settings-changed     raise value.with_traceback(tb)
2020-02-03 09:58:33 DEBUG leader-settings-changed   File "/var/lib/juju/agents/unit-contrail-controller-2/charm/templates/config.env", line 34, in top-level template code
2020-02-03 09:58:33 DEBUG leader-settings-changed     CONTROL_NODES={{ control_servers|sort|join(',') }}
2020-02-03 09:58:33 DEBUG leader-settings-changed   File "/usr/lib/python3/dist-packages/jinja2/filters.py", line 278, in do_sort
2020-02-03 09:58:33 DEBUG leader-settings-changed     return sorted(value, key=key_func, reverse=reverse)
2020-02-03 09:58:33 DEBUG leader-settings-changed TypeError: '<' not supported between instances of 'NoneType' and 'NoneType'
2020-02-03 09:58:33 ERROR juju.worker.uniter.operation runhook.go:132 hook "leader-settings-changed" failed: exit status 1
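
The failure in the traceback can be reproduced outside the charm. The following is a minimal sketch, not the charm's code: the function names are hypothetical, plain sorted() only approximates what Jinja's sort filter does, and dropping None entries is one possible defensive workaround that hides the bad leader data rather than repairing it.

```python
# Hypothetical stand-in for the 'control_servers' value from the CTX log line.
control_servers = [None, None, "x.x.85.87"]


def sort_like_template(values):
    """Approximate the sorted() call made by Jinja's sort filter."""
    return sorted(values)


def sort_skipping_none(values):
    """Illustrative workaround: drop None entries before sorting."""
    return sorted(v for v in values if v is not None)


try:
    sort_like_template(control_servers)
    failed = False
except TypeError as exc:
    # Python 3 refuses to order None values with '<'.
    failed = True
    print("sort failed:", exc)

# Joining the filtered list mirrors the template's join(',') step.
print(",".join(sort_skipping_none(control_servers)))
```

Filtering only lets the template render; the rendered CONTROL_NODES value would still be missing two of the three controllers.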

To confirm, controller_data_ip_list in the leader data, which is the source of control_servers, also contains 2 null values:

juju run -u contrail-controller/2 'leader-get controller_data_ip_list'
[null, null, "x.x.85.87"]
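
A check like the one above could be scripted. The sketch below assumes the leader-get output is a JSON list, as shown; the helper name find_missing_entries is hypothetical and is not part of the charm or of Juju.

```python
import json


def find_missing_entries(leader_get_output):
    """Return the positions of null entries in a JSON list of addresses."""
    values = json.loads(leader_get_output)
    return [index for index, value in enumerate(values) if value is None]


# Output copied from the leader-get command above.
raw = '[null, null, "x.x.85.87"]'
missing = find_missing_entries(raw)

if missing:
    print("controller_data_ip_list has null entries at positions:", missing)
else:
    print("controller_data_ip_list looks complete")
```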

These units were upgraded from 5.0.2.

Andrey-mp commented 4 years ago

An upgrade procedure was never developed, documented, or tested.