docker-library / mysql

Docker Official Image packaging for MySQL Community Server
https://dev.mysql.com/
GNU General Public License v2.0

mysql:5.7.43 connection reset #1077

Open SebastienNH opened 4 months ago

SebastienNH commented 4 months ago

We have reduced our environment to a single mysql container running on the Docker server, accessed by the mysql client. All configuration changes recommended by Atlassian have been applied, since the original environment ran JIRA, Confluence and MySQL containers. MySQL is configured with an 8-hour timeout for connections.

We are seeing connections being dropped in the mysql client ("No connection. Trying to reconnect..."). There is no pattern to when this occurs: the time before a drop varies arbitrarily from 5 to 50 minutes.

[root@ost-clb-atl-dmc-c01 ~]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.43 MySQL Community Server (GPL)

Copyright (c) 2000, 2024, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select now();
ERROR 2013 (HY000): Lost connection to MySQL server during query
No connection. Trying to reconnect...
Connection id:    3
Current database: *** NONE ***

Docker Network

[root@ost-clb-atl-dmc-c01 mysql]# docker network ls
NETWORK ID     NAME              DRIVER    SCOPE
3352cad49ba7   bridge            bridge    local
bf9ba81a8e28   docker_gwbridge   bridge    local
cc040dee87d0   host              host      local
v8zswgiyduo0   ingress           overlay   swarm
r04hth47im8z   mysql-private     overlay   swarm
1a1e3f13b5e6   none              null      local

Docker Containers

[root@ost-clb-atl-dmc-c01 mysql]# docker ps -a
CONTAINER ID   IMAGE                                               COMMAND                  CREATED         STATUS         PORTS                 NAMES
52c2c7d87f92   nexus.ostravam.corp.telstra.com:5000/mysql:5.7.43   "docker-entrypoint.s…"   4 seconds ago   Up 3 seconds   3306/tcp, 33060/tcp   mysql_mysql.1.5ltdc2y8k1ju4r7kv6l8mrc36

Network Logging

The gwbridge network receives a reset packet for the initial connection that was established (AEST timezone)

16:30:18.913810 IP 172.31.1.1.51658 > 172.31.1.2.3306: Flags [.], ack 3827, win 1409, options [nop,nop,TS val 1710428538 ecr 3304546861], length 0
16:30:24.370359 ARP, Request who-has 172.31.1.2 tell 172.31.1.1, length 28
16:30:24.370373 ARP, Request who-has 172.31.1.1 tell 172.31.1.2, length 28
16:30:24.370379 ARP, Reply 172.31.1.1 is-at 02:42:d3:a7:b4:1e, length 28
16:30:24.370390 ARP, Reply 172.31.1.2 is-at 02:42:ac:1f:01:02, length 28
16:55:21.152848 IP 172.31.1.1.51658 > 172.31.1.2.3306: Flags [P.], seq 1251:1297, ack 3827, win 1409, options [nop,nop,TS val 1711930777 ecr 3304546861], length 46
16:55:21.152904 IP 172.31.1.2.3306 > 10.145.247.114.51658: Flags [R], seq 724319931, win 0, length 0
16:55:21.154149 IP 172.31.1.1.38042 > 172.31.1.2.3306: Flags [S], seq 1145254527, win 43690, options [mss 65495,sackOK,TS val 1711930778 ecr 0,nop,wscale 7], length 0

The ingress network also receives a reset packet for the initial connection that was established (UTC timezone)

06:30:18.913817 eth1  In  IP 172.31.1.1.51658 > 172.31.1.2.3306: Flags [.], ack 3827, win 1409, options [nop,nop,TS val 1710428538 ecr 3304546861], length 0
06:30:18.913824 eth0  Out IP 10.0.0.2.51658 > 10.0.0.4.3306: Flags [.], ack 3827, win 1409, options [nop,nop,TS val 1710428538 ecr 3304546861], length 0
06:30:24.370350 eth0  Out ARP, Request who-has 10.0.0.4 tell 10.0.0.2, length 28
06:30:24.370353 eth1  Out ARP, Request who-has 172.31.1.1 tell 172.31.1.2, length 28
06:30:24.370377 eth1  In  ARP, Request who-has 172.31.1.2 tell 172.31.1.1, length 28
06:30:24.370382 eth1  Out ARP, Reply 172.31.1.2 is-at 02:42:ac:1f:01:02, length 28
06:30:24.370387 eth1  In  ARP, Reply 172.31.1.1 is-at 02:42:d3:a7:b4:1e, length 28
06:30:24.370389 eth0  In  ARP, Request who-has 10.0.0.2 tell 10.0.0.4, length 28
06:30:24.370391 eth0  Out ARP, Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
06:30:24.370394 eth0  In  ARP, Reply 10.0.0.4 is-at 02:42:0a:00:00:04, length 28
06:55:21.152860 eth1  In  IP 172.31.1.1.51658 > 172.31.1.2.3306: Flags [P.], seq 1252:1298, ack 3827, win 1409, options [nop,nop,TS val 1711930777 ecr 3304546861], length 46
06:55:21.152887 eth1  Out IP 172.31.1.2.3306 > 172.31.1.1.51658: Flags [R], seq 724319931, win 0, length 0
06:55:21.154154 eth1  In  IP 172.31.1.1.38042 > 172.31.1.2.3306: Flags [S], seq 1145254527, win 43690, options [mss 65495,sackOK,TS val 1711930778 ecr 0,nop,wscale 7], length 0
06:55:21.154180 eth0  Out IP 10.0.0.2.38042 > 10.0.0.4.3306: Flags [S], seq 1145254527, win 43690, options [mss 65495,sackOK,TS val 1711930778 ecr 0,nop,wscale 7], length 0

On the mysql container network, the initial connection is not dropped, and a new connection is established

06:30:18.913614 IP 10.0.0.2.51658 > 10.0.0.4.3306: Flags [P.], seq 1206:1252, ack 3727, win 1409, options [nop,nop,TS val 1710428538 ecr 3303806990], length 46
06:30:18.913760 IP 10.0.0.4.3306 > 10.0.0.2.51658: Flags [P.], seq 3727:3827, ack 1252, win 244, options [nop,nop,TS val 3304546861 ecr 1710428538], length 100
06:30:18.913827 IP 10.0.0.2.51658 > 10.0.0.4.3306: Flags [.], ack 3827, win 1409, options [nop,nop,TS val 1710428538 ecr 3304546861], length 0
06:30:24.370362 ARP, Request who-has 10.0.0.2 tell 10.0.0.4, length 28
06:30:24.370384 ARP, Request who-has 10.0.0.4 tell 10.0.0.2, length 28
06:30:24.370388 ARP, Reply 10.0.0.4 is-at 02:42:0a:00:00:04, length 28
06:30:24.370394 ARP, Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
06:55:21.154195 IP 10.0.0.2.38042 > 10.0.0.4.3306: Flags [S], seq 1145254527, win 43690, options [mss 65495,sackOK,TS val 1711930778 ecr 0,nop,wscale 7], length 0
06:55:21.154218 IP 10.0.0.4.3306 > 10.0.0.2.38042: Flags [S.], seq 2416441610, ack 1145254528, win 27960, options [mss 1410,sackOK,TS val 3306049101 ecr 1711930778,nop,wscale 7], length 0
06:55:21.154278 IP 10.0.0.2.38042 > 10.0.0.4.3306: Flags [.], ack 1, win 342, options [nop,nop,TS val 1711930778 ecr 3306049101], length 0
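
One thing the captures can be mined for is how long the connection sat silent before the reset appeared. The following is a minimal Python sketch, not a definitive tool: it parses tcpdump text lines like the ones from the ingress capture above, groups packets by port pair, and reports the longest silence on a flow before each RST. The function name `longest_idle_before_reset` and the `SAMPLE` constant are made up for illustration. Keying flows by port pair alone is a simplifying assumption, chosen because the addresses differ between capture points (the gwbridge capture shows the RST addressed to 10.145.247.114 rather than 172.31.1.1). The sketch also assumes IPv4 endpoints and a capture that does not cross midnight.

```python
import re
from datetime import datetime

# Matches tcpdump text lines with or without an "ethN In/Out" column, e.g.
#   16:55:21.152848 IP 172.31.1.1.51658 > 172.31.1.2.3306: Flags [P.], ...
#   06:55:21.152887 eth1  Out IP 172.31.1.2.3306 > 172.31.1.1.51658: Flags [R], ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{1,2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?:\S+\s+(?:In|Out)\s+)?"
    r"IP\s+(?P<src>\S+)\s+>\s+(?P<dst>[^:\s]+):\s+"
    r"Flags\s+\[(?P<flags>[^\]]+)\]"
)


def port_of(endpoint):
    # tcpdump prints IPv4 endpoints as a.b.c.d.port
    return int(endpoint.rsplit(".", 1)[1])


def longest_idle_before_reset(lines):
    """Return [(sorted port pair, longest gap in seconds)] for every RST seen."""
    history = {}  # port pair -> timestamps of packets seen so far
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ARP and anything else that is not a TCP line
        ts = datetime.strptime(m["ts"], "%H:%M:%S.%f")
        flow = frozenset((port_of(m["src"]), port_of(m["dst"])))
        seen = history.setdefault(flow, [])
        seen.append(ts)
        if "R" in m["flags"]:
            gaps = [(b - a).total_seconds() for a, b in zip(seen, seen[1:])]
            results.append((sorted(flow), max(gaps, default=0.0)))
    return results


# Four lines copied from the ingress capture (UTC).
SAMPLE = """\
06:30:18.913817 eth1  In  IP 172.31.1.1.51658 > 172.31.1.2.3306: Flags [.], ack 3827, win 1409, options [nop,nop,TS val 1710428538 ecr 3304546861], length 0
06:55:21.152860 eth1  In  IP 172.31.1.1.51658 > 172.31.1.2.3306: Flags [P.], seq 1252:1298, ack 3827, win 1409, options [nop,nop,TS val 1711930777 ecr 3304546861], length 46
06:55:21.152887 eth1  Out IP 172.31.1.2.3306 > 172.31.1.1.51658: Flags [R], seq 724319931, win 0, length 0
06:55:21.154154 eth1  In  IP 172.31.1.1.38042 > 172.31.1.2.3306: Flags [S], seq 1145254527, win 43690, options [mss 65495,sackOK,TS val 1711930778 ecr 0,nop,wscale 7], length 0
""".splitlines()

for ports, gap in longest_idle_before_reset(SAMPLE):
    print(f"ports {ports}: longest silence before reset {gap:.1f} s")
```

For the lines shown, the connection on client port 51658 was silent for roughly 25 minutes (06:30:18 to 06:55:21) before the first data packet was answered with a reset. Running the same function over longer captures would show whether the resets only ever follow long idle periods.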

Any ideas on the cause of these dropouts and how to remedy them would be appreciated.

tianon commented 4 months ago

This is a difficult one, and I'm honestly a little bit at a loss for where to suggest you might go next in your debugging. Unfortunately, there are a lot of factors that could contribute to something like this, and they're all likely to be environmental or configuration related, as you appear to have already deduced, so I don't think there's much we can do from the perspective of the image to fix/improve this. :disappointed:

(Happy to reconsider/try to help debug more if there's a reliable and minimal reproducer though -- I'm thinking something like a docker run with the default configuration that can reproduce somehow.)
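
As one possible starting point for such a reproducer, here is a hedged Python sketch that takes MySQL out of the picture and only tests whether an idle TCP connection survives the network path. Everything in it is an assumption for illustration: the helper names `serve_echo` and `probe_idle`, the idea of running a throwaway echo service on the same network as the mysql container, and the choice of port. It probes an echo service rather than port 3306 directly because a raw socket never completes the MySQL handshake, so the server would be expected to close it on its own after a short time.

```python
import socket
import threading
import time


def serve_echo(listener):
    """Accept connections on `listener` and echo back whatever arrives."""

    def handle(conn):
        with conn:
            try:
                while True:
                    data = conn.recv(4096)
                    if not data:
                        break  # peer closed cleanly
                    conn.sendall(data)
            except OSError:
                pass  # peer reset or connection otherwise broken

    while True:
        try:
            conn, _ = listener.accept()
        except OSError:
            break  # listener was closed
        threading.Thread(target=handle, args=(conn,), daemon=True).start()


def probe_idle(host, port, idle_seconds, timeout=5.0):
    """Connect, stay silent for `idle_seconds`, then send one ping.

    Returns "alive", "reset", "closed" or "timeout".
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        time.sleep(idle_seconds)  # no traffic at all during this window
        try:
            sock.sendall(b"ping")
            reply = sock.recv(4)
        except (ConnectionResetError, BrokenPipeError):
            return "reset"
        except socket.timeout:
            return "timeout"
        return "alive" if reply else "closed"
```

A run might start `serve_echo(socket.create_server(("0.0.0.0", 9000)))` in a container attached to the same network, then call `probe_idle(host, 9000, idle_seconds=1800)` from the client side with increasing idle times. A "reset" result with no MySQL involved would point at the network path rather than the image.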