[DPE-3899, DPE-4173] Try to clear connection pool before relating with COS to avoid TIME_WAIT connections; stabilize exporter tests

shayancanonical commented 3 months ago

Issue

https://github.com/canonical/mysql-router-k8s-operator/issues/219 https://github.com/canonical/mysql-router-k8s-operator/issues/229

We are running into a connection stuck in TIME_WAIT when starting mysql_router_exporter. This happens in both the K8s and VM charms. Our guess is that it is related to the connection pool we use in our tests.

Solution

Clear the connection pool before relating with COS, then re-run the exporter tests 5-10 times and observe whether they have stabilized.
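Here is a minimal sketch of the idea. It assumes a test helper that keeps HTTP keep-alive connections open to the exporter's metrics endpoint and closes all of them just before the COS relation is added. The review mentions urllib3, whose `PoolManager` has a `clear()` method for this purpose. To stay dependency-free, this sketch swaps in a hypothetical `MetricsPool` built on `http.client`, plus a made-up `FakeExporterHandler` that stands in for mysql_router_exporter. None of these names come from the PR.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class MetricsPool:
    """Hypothetical test helper: one keep-alive HTTPConnection per (host, port)."""

    def __init__(self, timeout: float = 5.0) -> None:
        self._timeout = timeout
        self._conns: dict[tuple[str, int], http.client.HTTPConnection] = {}

    def get(self, host: str, port: int, path: str = "/metrics") -> str:
        key = (host, port)
        conn = self._conns.get(key)
        if conn is None:
            conn = http.client.HTTPConnection(host, port, timeout=self._timeout)
            self._conns[key] = conn
        conn.request("GET", path)
        response = conn.getresponse()
        # Read the whole body so the same socket can serve the next request.
        return response.read().decode()

    def clear(self) -> None:
        """Close every pooled socket so none is left open to the exporter."""
        for conn in self._conns.values():
            conn.close()
        self._conns.clear()

    def __len__(self) -> int:
        return len(self._conns)

    def __enter__(self) -> "MetricsPool":
        return self

    def __exit__(self, *exc_info) -> None:
        # Clear on exit, even if the test body raised.
        self.clear()


class FakeExporterHandler(BaseHTTPRequestHandler):
    """Stand-in for mysql_router_exporter; the metric name is made up."""

    protocol_version = "HTTP/1.1"  # keep connections open between requests

    def do_GET(self) -> None:
        body = b"example_metric 1\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt: str, *args) -> None:
        pass  # silence per-request logging


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), FakeExporterHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]

    pool = MetricsPool()
    print(pool.get("127.0.0.1", port))  # first scrape opens a socket
    print(pool.get("127.0.0.1", port))  # second scrape reuses it
    pool.clear()  # hypothetical: call this right before relating with COS
    print(len(pool))

    server.shutdown()
    server.server_close()
```

Because the pool supports `with`, a test can scope it to a block. The pool is then cleared even if an assertion fails midway.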

shayancanonical commented 3 months ago

8/10 successes isn't bad (see runs).

I am in favor of merging these incremental changes and observing whether the nightly test success rate improves. I believe there may be another issue at play: the bad file descriptor error in run 9:

ops.pebble.ChangeError: cannot perform the following tasks:
- Start service "mysql_router_exporter" (cannot start service: exited quickly with code 1)
----- Logs from task 0 -----
2024-05-22T00:38:28Z INFO Service "mysql_router_exporter" has never been started.
----- Logs from task 1 -----
2024-05-22T00:38:28Z INFO Most recent service output:
    2024/05/22 00:38:28 Start exporter on 0.0.0.0:49152/metrics
    2024/05/22 00:38:28 Server started
    2024/05/22 00:38:28 listen: listen tcp 0.0.0.0:49152: bind: address already in use
2024-05-22T00:38:28Z ERROR cannot start service: exited quickly with code 1
-----
unit-mysql-router-k8s-0: 00:38:28 ERROR juju.worker.uniter.operation hook "metrics-endpoint-relation-created" (via hook dispatching script: dispatch) failed: exit status 1

 INFO     pytest_operator.plugin:plugin.py:862 Forgetting main...
ERROR    websockets.client:protocol.py:1015 data transfer failed
Traceback (most recent call last):
  File "/home/runner/work/mysql-router-k8s-operator/mysql-router-k8s-operator/.tox/integration/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 959, in transfer_data
    message = await self.read_message()
  File "/home/runner/work/mysql-router-k8s-operator/mysql-router-k8s-operator/.tox/integration/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 1029, in read_message
    frame = await self.read_data_frame(max_size=self.max_size)
  File "/home/runner/work/mysql-router-k8s-operator/mysql-router-k8s-operator/.tox/integration/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 1104, in read_data_frame
    frame = await self.read_frame(max_size)
  File "/home/runner/work/mysql-router-k8s-operator/mysql-router-k8s-operator/.tox/integration/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 1161, in read_frame
    frame = await Frame.read(
  File "/home/runner/work/mysql-router-k8s-operator/mysql-router-k8s-operator/.tox/integration/lib/python3.10/site-packages/websockets/legacy/framing.py", line 68, in read
    data = await reader(2)
  File "/usr/lib/python3.10/asyncio/streams.py", line 708, in readexactly
    await self._wait_for_data('readexactly')
  File "/usr/lib/python3.10/asyncio/streams.py", line 501, in _wait_for_data
    await self._waiter
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 924, in write
    n = self._sock.send(data)
OSError: [Errno 9] Bad file descriptor
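The `bind: address already in use` line means port 49152 was still held when Pebble tried to start the exporter. One possible diagnostic, which is not part of this PR, would be a test helper that polls until the port is bindable before relating with COS. The names `port_is_free` and `wait_for_port_free` are hypothetical. The check is only meaningful if it runs in the same network namespace as the exporter, for example inside the unit's workload container.

```python
import socket
import time


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Hypothetical helper: True if a TCP socket could bind (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Deliberately no SO_REUSEADDR: bind() then fails with EADDRINUSE while
        # another socket holds the port (a live listener, or on Linux one still
        # in TIME_WAIT), i.e. the "address already in use" error from the log.
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def wait_for_port_free(
    port: int, host: str = "0.0.0.0", timeout: float = 60.0, interval: float = 1.0
) -> bool:
    """Hypothetical helper: poll until the port is bindable or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if port_is_free(port, host):
            return True
        time.sleep(interval)
    # One last check so a port freed right at the deadline is still reported.
    return port_is_free(port, host)
```

This check is stricter than a listener that sets SO_REUSEADDR, so a `False` here points at something holding the port, not necessarily at what made the exporter's own bind fail.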

shayancanonical commented 3 months ago

If I had to guess, there is another factor at play (though I am not sure what it is). I think this change makes sense nonetheless: it stabilizes the tests, fixes a potential cause in how we use connection pools, and does not change the charm code. In my opinion, we should keep working to eliminate these errors even after we merge this PR.

shayancanonical commented 3 months ago

On whether the behavior is visible outside tests: very rarely, and even less frequently since mysql-router-exporter implemented graceful shutdowns in response to this issue.

carlcsaposs-canonical commented 3 months ago

question: would using requests instead of urllib3 mean that we wouldn't have to manage the connections manually?


[DPE-3899, DPE-4173] Try to clear connection pool before relating with COS to avoid TIME_WAIT connections; stabilize exporter tests #245
