shayancanonical closed this 3 months ago
8/10 successes ain't bad (see runs).
I am in favor of merging these incremental changes and observing whether the nightly test success rate improves. I believe there may be another issue at play: the bad-file-descriptor error in run 9:
```
ops.pebble.ChangeError: cannot perform the following tasks:
- Start service "mysql_router_exporter" (cannot start service: exited quickly with code 1)
----- Logs from task 0 -----
2024-05-22T00:38:28Z INFO Service "mysql_router_exporter" has never been started.
----- Logs from task 1 -----
2024-05-22T00:38:28Z INFO Most recent service output:
2024/05/22 00:38:28 Start exporter on 0.0.0.0:49152/metrics
2024/05/22 00:38:28 Server started
2024/05/22 00:38:28 listen: listen tcp 0.0.0.0:49152: bind: address already in use
2024-05-22T00:38:28Z ERROR cannot start service: exited quickly with code 1
-----
unit-mysql-router-k8s-0: 00:38:28 ERROR juju.worker.uniter.operation hook "metrics-endpoint-relation-created" (via hook dispatching script: dispatch) failed: exit status 1
INFO pytest_operator.plugin:plugin.py:862 Forgetting main...
ERROR websockets.client:protocol.py:1015 data transfer failed
Traceback (most recent call last):
  File "/home/runner/work/mysql-router-k8s-operator/mysql-router-k8s-operator/.tox/integration/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 959, in transfer_data
    message = await self.read_message()
  File "/home/runner/work/mysql-router-k8s-operator/mysql-router-k8s-operator/.tox/integration/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 1029, in read_message
    frame = await self.read_data_frame(max_size=self.max_size)
  File "/home/runner/work/mysql-router-k8s-operator/mysql-router-k8s-operator/.tox/integration/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 1104, in read_data_frame
    frame = await self.read_frame(max_size)
  File "/home/runner/work/mysql-router-k8s-operator/mysql-router-k8s-operator/.tox/integration/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 1161, in read_frame
    frame = await Frame.read(
  File "/home/runner/work/mysql-router-k8s-operator/mysql-router-k8s-operator/.tox/integration/lib/python3.10/site-packages/websockets/legacy/framing.py", line 68, in read
    data = await reader(2)
  File "/usr/lib/python3.10/asyncio/streams.py", line 708, in readexactly
    await self._wait_for_data('readexactly')
  File "/usr/lib/python3.10/asyncio/streams.py", line 501, in _wait_for_data
    await self._waiter
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 924, in write
    n = self._sock.send(data)
OSError: [Errno 9] Bad file descriptor
```
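For context on the `bind: address already in use` line in the log: the exporter exits with code 1 because some other socket still holds 0.0.0.0:49152 when it tries to listen. The minimal Python sketch below reproduces that error class in isolation. It is an illustration under stated assumptions: it uses an OS-chosen port on 127.0.0.1 instead of 49152, the function name `second_bind_errno` is hypothetical, and the port is held by a live listener rather than by a connection in `TIME_WAIT`.

```python
import errno
import socket


def second_bind_errno():
    """Bind a second socket to a port that a listener already holds.

    Hypothetical helper. Returns the errno of the resulting OSError,
    or None if the second bind unexpectedly succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        first.listen()
        port = first.getsockname()[1]
        try:
            # Same address and port as `first`; no SO_REUSEADDR/SO_REUSEPORT set.
            second.bind(("127.0.0.1", port))
        except OSError as exc:
            return exc.errno
        return None
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    code = second_bind_errno()
    print(errno.errorcode.get(code, code))  # on Linux: EADDRINUSE
```

The same errno surfaces whether the port is held by a leftover listener or by a socket state the kernel has not released yet, which is why the log line alone does not tell us which of the two happened.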
If I had to guess, something else is at play as well (I am not sure what). I think this change makes sense nonetheless: it makes the tests more stable, fixes a potential cause in how we use connection pools, and does not change the charm code. In my opinion, we should keep investing in eliminating these errors even if we merge this PR.
Whether the behavior is visible outside tests: very rarely, and less frequently since mysql-router-exporter implemented graceful shutdowns in response to this issue.
Question: would using requests instead of urllib3 mean that we wouldn't have to manage the connections manually?
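On that question, as far as we understand the libraries: requests pools connections through urllib3 as well, so a long-lived `requests.Session` keeps idle connections open until it is closed (for example by using it as `with requests.Session() as s:`). The module-level helpers such as `requests.get()` create a session per call and close it afterwards, so nothing stays pooled, at the cost of a new TCP connection per request. The stdlib-only sketch below shows that per-call pattern; `OkHandler`, `fetch_once` and `demo` are hypothetical names, and the local server merely stands in for the exporter's metrics endpoint.

```python
import contextlib
import http.client
import http.server
import threading


class OkHandler(http.server.BaseHTTPRequestHandler):
    # HTTP/1.1 keeps the connection open after the response (keep-alive).
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch_once(host, port, path):
    """Open a connection, make one request, and always close the connection.

    Hypothetical helper mirroring the per-call pattern of requests.get().
    """
    conn = http.client.HTTPConnection(host, port, timeout=5)
    with contextlib.closing(conn):
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()
    # closing() has called conn.close(), which drops the socket (sock is None).
    return resp.status, body, conn.sock is None


def demo():
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_once(*server.server_address, "/metrics")
    finally:
        server.shutdown()
        server.server_close()
```

So switching to requests would only remove the manual step if the tests used the per-call helpers; a shared session would still need closing at the right time.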
Issue
https://github.com/canonical/mysql-router-k8s-operator/issues/219 https://github.com/canonical/mysql-router-k8s-operator/issues/229
We are running into a connection in `TIME_WAIT` when starting `mysql_router_exporter`. This happens in both the K8s and VM charms. The guess is that this is related to the connection pool we use in our tests.
Solution
Clear the connection pool before relating with COS, re-run the exporter tests 5-10 times, and observe whether the tests have stabilized.
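The "clear the connection pool" step can be sketched with the standard library alone. If the tests hold a urllib3 `PoolManager`, the corresponding call there is `PoolManager.clear()`; the `ConnectionPool` class below is a hypothetical stand-in for it, and the local server and `/metrics` path are assumptions made so the example runs without a Juju model or COS.

```python
import http.client
import http.server
import threading


class KeepAliveHandler(http.server.BaseHTTPRequestHandler):
    # HTTP/1.1 lets the client reuse the TCP connection between requests.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # silence per-request logging


class ConnectionPool:
    """Hypothetical pool: one reusable HTTPConnection per (host, port)."""

    def __init__(self):
        self._conns = {}

    def get(self, host, port, path):
        conn = self._conns.get((host, port))
        if conn is None:
            conn = http.client.HTTPConnection(host, port, timeout=5)
            self._conns[(host, port)] = conn
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()  # socket stays open for reuse

    def open_sockets(self):
        return sum(1 for c in self._conns.values() if c.sock is not None)

    def clear(self):
        """Close and forget every pooled connection; return how many were open."""
        closed = 0
        for conn in self._conns.values():
            if conn.sock is not None:
                closed += 1
            conn.close()
        self._conns.clear()
        return closed


def demo():
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    pool = ConnectionPool()
    try:
        status, body = pool.get(host, port, "/metrics")
        open_before = pool.open_sockets()
        closed = pool.clear()  # in the PR's terms: before relating with COS
        return status, body, open_before, closed, pool.open_sockets()
    finally:
        server.shutdown()
        server.server_close()
```

The point of clearing before the relation step is that no idle keep-alive connection from earlier checks is still attached to the exporter when it is (re)started.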