SKA-ScienceDataProcessor / integration-prototype

SDP Integration Prototype
BSD 3-Clause "New" or "Revised" License

master controller dies horribly when the sip stack is removed even if the state is "off" #92

Open david-terrett opened 5 years ago

david-terrett commented 5 years ago

Describe the bug

When the sip stack is removed, the master controller throws an exception. This happens even if the current state is 'off'.

To Reproduce

  1. Follow the 01_states_demo instructions
  2. Set the target_state = "OFF" with iTango
  3. Remove the sip stack with "docker stack rm sip"
  4. Observe a long, ugly stack dump in the master controller log

Expected behavior

A graceful exit. Since there is no way out of the off state, the master controller should stop monitoring the database.

Screenshots &/or terminal output

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 184, in _read_from_socket
    raise socket.error(SERVER_CLOSED_CONNECTION_ERROR)
OSError: Connection closed by server.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 2952, in _execute
    return command(*args)
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 636, in read_response
    raise e
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 633, in read_response
    response = self._parser.read_response()
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 291, in read_response
    response = self._buffer.readline()
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 223, in readline
    self._read_from_socket()
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 198, in _read_from_socket
    (e.args,))
redis.exceptions.ConnectionError: Error while reading from socket: ('Connection closed by server.',)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 493, in connect
    sock = self._connect()
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 550, in _connect
    raise err
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 538, in _connect
    sock.connect(socket_address)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/sip/app/__main__.py", line 331, in <module>
    main()
  File "/home/sip/app/__main__.py", line 322, in main
    _process_state_change_events()
  File "/home/sip/app/__main__.py", line 293, in _process_state_change_events
    _state_event = state_events.get()
  File "/usr/local/lib/python3.6/site-packages/sip_config_db/_events/event_queue.py", line 58, in get
    message = self._queue.get_message()
  File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 3057, in get_message
    response = self.parse_response(block=False, timeout=timeout)
  File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 2974, in parse_response
    return self._execute(connection, connection.read_response)
  File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 2959, in _execute
    connection.connect()
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 498, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to ec_config_database:6379. Connection refused.

bmort commented 5 years ago

mmm yeah that's a good one :) Looks like it comes from the MC not protecting against the Config DB going down. If so, we should also be able to trigger the same error by simply removing the Redis service in any SDP state.
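
A minimal sketch of what protecting the event loop against the Config DB going down could look like, assuming the loop polls an event queue whose `get()` raises a connection error once Redis disappears. `FakeQueue`, `process_events`, `max_failures` and `retry_delay` are hypothetical names for illustration, not the actual SIP code, and the builtin `ConnectionError` stands in for `redis.exceptions.ConnectionError`.

```python
import logging
import time

LOG = logging.getLogger("master_controller")


class FakeQueue:
    """Hypothetical stand-in for the event queue: fails after a few polls."""

    def __init__(self, events, fail_after):
        self._events = list(events)
        self._polls = 0
        self._fail_after = fail_after

    def get(self):
        self._polls += 1
        if self._polls > self._fail_after:
            # Stand-in for redis.exceptions.ConnectionError.
            raise ConnectionError("Connection refused")
        return self._events.pop(0) if self._events else None


def process_events(queue, handle, max_failures=3, retry_delay=0.0):
    """Poll the queue until the database stays unreachable, then stop cleanly.

    Returns the number of events handled.
    """
    handled = 0
    failures = 0
    while failures < max_failures:
        try:
            event = queue.get()
        except ConnectionError as error:
            failures += 1
            LOG.warning("Config DB unreachable (%d/%d): %s",
                        failures, max_failures, error)
            time.sleep(retry_delay)
            continue
        failures = 0  # a successful poll resets the failure count
        if event is not None:
            handle(event)
            handled += 1
    LOG.error("Config DB still unreachable, stopping event monitoring.")
    return handled
```

With something like this, removing the Redis service would produce a few warnings and one final log line instead of a chained traceback, whatever the SDP state.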

apmcd42 commented 5 years ago

The problem as I see it is that the database was reporting a different set of allowed target states to those that it would accept for a specific SDP state. Eventually you end up with the exception.

I did not handle the exception because, in my opinion, these exceptions should be handled by the database layer, which should, where required, raise a package-specific exception to be handled by the layers further up the chain, the master controller among them. Otherwise I would have been handling the wrong exception and would have had to rewrite the handler later on.
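
To illustrate the layering described above, here is a rough sketch of a database layer that translates the Redis error into a package-specific one. `ConfigDbUnavailable` and this `EventQueue` wrapper are hypothetical and are not the real `sip_config_db` classes; the only real name used is `redis.exceptions.ConnectionError`, the exception seen in the traceback.

```python
try:
    from redis.exceptions import ConnectionError as RedisConnectionError
except ImportError:
    # redis-py is not installed; use the builtin so the sketch still runs.
    RedisConnectionError = ConnectionError


class ConfigDbUnavailable(Exception):
    """Hypothetical package-specific error raised by the database layer."""


class EventQueue:
    """Hypothetical wrapper around a pub/sub object exposing get_message()."""

    def __init__(self, pubsub):
        self._queue = pubsub

    def get(self):
        try:
            return self._queue.get_message()
        except RedisConnectionError as error:
            # Hide the Redis detail from callers but keep it as the cause.
            raise ConfigDbUnavailable(
                "configuration database unreachable") from error
```

Under this assumption the master controller would only ever catch `ConfigDbUnavailable`, so its handler would not need rewriting if the database layer changed how it talks to Redis.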