apecloud / kubeblocks

KubeBlocks is an open-source control plane software that runs and manages databases, message queues and other stateful applications on K8s.
https://kubeblocks.io
GNU Affero General Public License v3.0
2.08k stars 170 forks source link

[BUG] two follower processes in the mysql cluster created by playground continously reports error logs after it's h-scaled to three replicas #1581

Closed weicao closed 1 year ago

weicao commented 1 year ago

Describe the bug

  1. create a mysql cluster using playground
  2. h-scale to 3 replicas
  3. although raft replication works, (e.g. create a table in leader node, the table canbe shown on two followers), logs of two followers print error messages like "[2023-02-22 12:57:19.417310] [ERROR] Server 3 : the server id in the msg(101) is not match with local server id for a follower, this may happen during the configure change or hit a bug!!" continously.

version

wei@MacBook-Pro:~/ApeCloud/kubeblocks$ ./bin/kbcli cluster list-components NAME NAMESPACE CLUSTER TYPE IMAGE mysql default mycluster mysql docker.io/apecloud/apecloud-mysql-server:8.0.30-5.alpha2.20230105.gd6b8719.1

example:

wei@MacBook-Pro:~/ApeCloud/kubeblocks$ ./bin/kbcli cluster log mycluster -i mycluster-mysql-2 --limit-bytes 12000 Defaulted container "mysql" out of: mysql, inject-mysql-exporter, kb-rolechangedcheck, config-manager-sidecar mycluster-mysql-0

mycluster-mysql-2

mycluster-mysql-2.mycluster-mysql-headless mysql -hmycluster-mysql-0.mycluster-mysql-headless -uroot -pphbgcCow -e "call dbms_consensus.add_learner('mycluster-mysql-2.mycluster-mysql-headless:13306');" >> /tmp/setup_error.log 2>&1 exit code: 0 2 mycluster-mysql-0.mycluster-mysql-headless:13306;mycluster-mysql-1.mycluster-mysql-headless:13306;mycluster-mysql-2.mycluster-mysql-headless:13306@3 mycluster-mysql-0 docker-entrypoint.sh mysqld --defaults-file=/opt/mysql/my.cnf --cluster-start-index=1 --cluster-info="mycluster-mysql-0.mycluster-mysql-headless:13306;mycluster-mysql-1.mycluster-mysql-headless:13306;mycluster-mysql-2.mycluster-mysql-headless:13306@3" --cluster-id=1 2023-02-22 12:57:02+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server started. 2023-02-22 12:57:02+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql' 2023-02-22 12:57:02+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server started. 2023-02-22 12:57:02+00:00 [Note] [Entrypoint]: Initializing database files mysqld --defaults-file=/opt/mysql/my.cnf --cluster-start-index=1 --cluster-info=mycluster-mysql-0.mycluster-mysql-headless:13306;mycluster-mysql-1.mycluster-mysql-headless:13306;mycluster-mysql-2.mycluster-mysql-headless:13306@3 --cluster-id=1 --initialize-insecure --default-time-zone=SYSTEM --consensus-enabled=0 --disable-log-bin 2023-02-22T12:57:02.786853Z 0 [Warning] [MY-011068] [Server] The syntax 'slave_exec_mode' is deprecated and will be removed in a future release. Please use replica_exec_mode instead. 2023-02-22T12:57:02.787077Z 0 [System] [MY-013169] [Server] /usr/sbin/mysqld (mysqld 8.0.30) initializing of server in progress as process 145 2023-02-22T12:57:02.791017Z 0 [Warning] [MY-010161] [Server] You need to use --log-bin to make --log-replica-updates work. 2023-02-22T12:57:02.791021Z 0 [Warning] [MY-010161] [Server] You need to use --log-bin to make --binlog-format work. 2023-02-22T12:57:02.813933Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started. 2023-02-22T12:57:03.173465Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended. 2023-02-22T12:57:04.032801Z 5 [Warning] [MY-010453] [Server] root@localhost is created with an empty password ! Please consider switching off the --initialize-insecure option. 2023-02-22 12:57:05+00:00 [Note] [Entrypoint]: Database files initialized 2023-02-22 12:57:05+00:00 [Note] [Entrypoint]: Starting temporary server with --consensus-enabled=0 --disable-log-bin 2023-02-22 12:57:05+00:00 [Note] [Entrypoint]: Waiting for server startup 2023-02-22T12:57:06.384093Z 0 [Warning] [MY-011068] [Server] The syntax 'slave_exec_mode' is deprecated and will be removed in a future release. Please use replica_exec_mode instead. 2023-02-22T12:57:06.392413Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.30) starting as process 193 2023-02-22T12:57:06.395046Z 0 [Warning] [MY-010161] [Server] You need to use --log-bin to make --log-replica-updates work. 2023-02-22T12:57:06.395052Z 0 [Warning] [MY-010161] [Server] You need to use --log-bin to make --binlog-format work. 2023-02-22T12:57:06.401280Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started. 2023-02-22T12:57:06.578507Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended. 2023-02-22T12:57:06.803302Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed. 2023-02-22T12:57:06.803333Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel. 2023-02-22T12:57:06.818259Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.30' socket: '/var/run/mysqld/mysqld.sock' port: 0 WeSQL Server - GPL, Release 5, Revision d6b8719. 2023-02-22 12:57:06+00:00 [Note] [Entrypoint]: Temporary server started. '/var/lib/mysql/mysql.sock' -> '/var/run/mysqld/mysqld.sock' Warning: Unable to load '/usr/share/zoneinfo/iso3166.tab' as time zone. Skipping it. Warning: Unable to load '/usr/share/zoneinfo/leapseconds' as time zone. Skipping it. Warning: Unable to load '/usr/share/zoneinfo/tzdata.zi' as time zone. Skipping it. Warning: Unable to load '/usr/share/zoneinfo/zone.tab' as time zone. Skipping it. Warning: Unable to load '/usr/share/zoneinfo/zone1970.tab' as time zone. Skipping it. 2023-02-22 12:57:08+00:00 [Note] [Entrypoint]: Creating database mydb 2023-02-22 12:57:08+00:00 [Note] [Entrypoint]: Creating user u1 2023-02-22 12:57:08+00:00 [Note] [Entrypoint]: Giving user u1 access to schema mydb

2023-02-22 12:57:08+00:00 [Note] [Entrypoint]: Stopping temporary server 2023-02-22T12:57:08.155391Z 14 [System] [MY-013172] [Server] Received SHUTDOWN from user root. Shutting down mysqld (Version: 8.0.30). 2023-02-22T12:57:09.205422Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.30) WeSQL Server - GPL, Release 5, Revision d6b8719. 2023-02-22 12:57:10+00:00 [Note] [Entrypoint]: Temporary server stopped 2023-02-22 12:57:10+00:00 [Note] [Entrypoint]: Initializing consensus meta: mysqld --defaults-file=/opt/mysql/my.cnf --cluster-start-index=1 --cluster-info=mycluster-mysql-0.mycluster-mysql-headless:13306;mycluster-mysql-1.mycluster-mysql-headless:13306;mycluster-mysql-2.mycluster-mysql-headless:13306@3 --cluster-id=1 --cluster-id=1 --cluster-start-index=1 --cluster-force-reset-meta=ON --upgrade=FORCE 2023-02-22T12:57:10.375965Z 0 [Warning] [MY-011068] [Server] The syntax 'slave_exec_mode' is deprecated and will be removed in a future release. Please use replica_exec_mode instead. 2023-02-22T12:57:10.377426Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.30) starting as process 276 2023-02-22T12:57:10.384604Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started. 2023-02-22T12:57:10.488362Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended. 2023-02-22T12:57:10.580105Z 4 [System] [MY-013381] [Server] Server upgrade from '80030' to '80030' started. 2023-02-22T12:57:14.479701Z 4 [System] [MY-013381] [Server] Server upgrade from '80030' to '80030' completed. 2023-02-22T12:57:14.588968Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed. 2023-02-22T12:57:14.588995Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel. 2023-02-22T12:57:14.600899Z 0 [Warning] [MY-000000] [Server] Force change meta to system table successfully. 2023-02-22T12:57:17.193589Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.30) WeSQL Server - GPL, Release 5, Revision d6b8719. 2023-02-22 12:57:17+00:00 [Note] [Entrypoint]: Consensus meta initialized

2023-02-22 12:57:17+00:00 [Note] [Entrypoint]: MySQL init process done. Ready for start up.

2023-02-22T12:57:17.505886Z 0 [Warning] [MY-011068] [Server] The syntax 'slave_exec_mode' is deprecated and will be removed in a future release. Please use replica_exec_mode instead. 2023-02-22T12:57:17.507281Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.30) starting as process 7 2023-02-22T12:57:17.513208Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started. 2023-02-22T12:57:17.607681Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended. 2023-02-22T12:57:17.704615Z 0 [System] [MY-010229] [Server] Starting XA crash recovery... 2023-02-22T12:57:17.711803Z 0 [System] [MY-010232] [Server] XA crash recovery finished. 2023-02-22T12:57:17.714030Z 0 [Warning] [MY-000000] [Server] Recover consensus index is 0 2023-02-22T12:57:17.775092Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed. 2023-02-22T12:57:17.775119Z 0 [System] [MY-013602] [Server] Channel mysqlmain configured to support TLS. Encrypted connections are now supported for this channel. [2023-02-22 12:57:17.794302] [Default] EasyNet::onConnected server 1 [2023-02-22 12:57:17.794449] [Default] EasyNet::onConnected server 2 [2023-02-22 12:57:17.802315] [ERROR] Server 3 : the server id in the msg(101) is not match with local server id for a follower, this may happen during the configure change or hit a bug!! [2023-02-22 12:57:17.802315] [Default] Server 3 : New Term in onAppendLog !! server 1 's term(3) is bigger than me(1). [2023-02-22 12:57:17.802315] [Default] Server 3 : new term(old:1,new:3) !! [2023-02-22 12:57:17.802315] [Default] Server 3 : Paxos state change from FOLL to FOLL !! [2023-02-22 12:57:17.803874] [ERROR] Server 3 : the server id in the msg(101) is not match with local server id for a follower, this may happen during the configure change or hit a bug!! [2023-02-22 12:57:17.803874] [Default] Server 3 : applyConfigureChange done! logIndex(21) currentTerm(3) val.cctype(2) val.optype(3) [2023-02-22 12:57:17.803874] [Default] Server 3 : applyConfigureChange_ done! logIndex(22) currentTerm(3) val.cctype(2) val.optype(3) 2023-02-22T12:57:17.808964Z 0 [ERROR] [MY-000000] [Server] gtids is cleared of relay_log working 2023-02-22T12:57:17.812259Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.30' socket: '/var/run/mysqld/mysqld.sock' port: 3306 WeSQL Server - GPL, Release 5, Revision d6b8719. 2023-02-22T12:57:17.814832Z 5 [System] [MY-025002] [Server] Consensus apply thread start, recover status = 0, consensus start apply index = 0, rli consensus index = 0. 2023-02-22T12:57:17.814883Z 5 [System] [MY-025003] [Server] Consensus apply thread group relay log file name = './mysql-bin.000001', pos = 251, rli apply index = 0. [2023-02-22 12:57:18.004486] [ERROR] Server 3 : the server id in the msg(101) is not match with local server id for a follower, this may happen during the configure change or hit a bug!! [2023-02-22 12:57:18.205173] [ERROR] Server 3 : the server id in the msg(101) is not match with local server id for a follower, this may happen during the configure change or hit a bug!! [2023-02-22 12:57:18.405741] [ERROR] Server 3 : the server id in the msg(101) is not match with local server id for a follower, this may happen during the configure change or hit a bug!! [2023-02-22 12:57:18.606786] [ERROR] Server 3 : the server id in the msg(101) is not match with local server id for a follower, this may happen during the configure change or hit a bug!! [2023-02-22 12:57:18.808057] [ERROR] Server 3 : the server id in the msg(101) is not match with local server id for a follower, this may happen during the configure change or hit a bug!! [2023-02-22 12:57:19.009146] [ERROR] Server 3 : the server id in the msg(101) is not match with local server id for a follower, this may happen during the configure change or hit a bug!! [2023-02-22 12:57:19.217056] [ERROR] Server 3 : the server id in the msg(101) is not match with local server id for a follower, this may happen during the configure change or hit a bug!! [2023-02-22 12:57:19.417310] [ERROR] Server 3 : the server id in the msg(101) is not match with local server id for a follower, this may happen during the configure change or hit a bug!! [2023-02-22 12:57:19.618623] [ERROR] Server 3 : the server id in the msg(101) is not match with local server id for a follower, this may happen during the configure change or hit a bug!! [2023-02-22 12:57:19.819614] [ERROR] Server 3 : the server id in the msg(101) is not match with local server id for a follower, this may happen during the configure change or hit a bug!! [2023-02-22 12:57:20.020947] [ERROR] Server 3 : the server id in the msg(101) is not match with local server id for a follower, this may happen during the configure change or hit a bug!! [2023-02-22 12:57:20.222863] [ERROR] Server 3 : the server id in the msg(101) is not match with local server id for a follower, this may happen during the configure change or hit a bug!! [2023-02-22 12:57:20.423319] [ERROR] Server 3 : the server id in the msg(101) is not match with local server id for a follower, this may happen during the configure change or hit a bug!!

weicao commented 1 year ago

error log on the other follower(server 2):

[2023-02-22 12:57:17.908929] [ERROR] Server 2 : the server id in the msg(100) is not match with local server id for a follower, this may happen during the configure change or hit a bug!!

logs on leader:

[2023-02-22 11:30:44.979470] [Default] Server 1 : Enter startElectionCallback [2023-02-22 11:30:44.979470] [Default] Server 1 : Paxos state change from FOLL to CAND !! [2023-02-22 11:30:44.979470] [Default] Server 1 : Epoch task currentEpoch(0) during requestVote [2023-02-22 11:30:44.979470] [Default] Server 1 : Start new requestVote: new term(3) [2023-02-22 11:30:44.979470] [Default] Server 1 : Paxos state change from CAND to LEDR !! [2023-02-22 11:30:44.979470] [Default] Server 1 : become Leader (currentTerm 3, lli:20, llt:3)!! 2023-02-22T11:30:45.006145Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.30' socket: '/var/run/mysqld/mysqld.sock' port: 3306 WeSQL Server - GPL, Release 5, Revision d6b8719. 2023-02-22T11:30:45.013893Z 5 [System] [MY-025002] [Server] Consensus apply thread start, recover status = 1, consensus start apply index = 19, rli consensus index = 0. 2023-02-22T11:30:45.014232Z 5 [System] [MY-025003] [Server] Consensus apply thread group relay log file name = './mysql-bin.000001', pos = 5818, rli apply index = 19. 2023-02-22T11:30:45.017594Z 5 [System] [MY-025004] [Server] Consensus apply thread stop, opt_consensus_leader_stop_apply: false, seconds_behind_master: 1677065445, opt_consensus_leader_stop_applytime: 0. 2023-02-22T11:30:45.017649Z 5 [System] [MY-025005] [Server] Consensus apply thread catchup commit index, consensus index: 19, current term: 2, stop term: 3. 2023-02-22T12:57:02.079070Z 227 [Warning] [MY-010055] [Server] IP address '10.42.0.20' could not be resolved: Name or service not known 2023-02-22T12:57:02.079256Z 228 [Warning] [MY-010055] [Server] IP address '10.42.0.21' could not be resolved: Name or service not known [2023-02-22 12:57:02.096314] [Default] Server 1 : configureChange begin: cctype(2) optype(3) term(3) lli(20) addrs(mycluster-mysql-1.mycluster-mysql-headless:13306 ) [2023-02-22 12:57:02.189806] [Default] Server 1 : applyConfigureChange done! logIndex(21) currentTerm(3) val.cctype(2) val.optype(3) [2023-02-22 12:57:02.189829] [Default] Server 1 : configureChange return: cctype(2) optype(3) addrs(mycluster-mysql-1.mycluster-mysql-headless:13306 ) return(0) success(0) preparedIndex(0) lli(21) [2023-02-22 12:57:02.189867] [Default] Server 1 : configureChange begin: cctype(2) optype(3) term(3) lli(21) addrs(mycluster-mysql-2.mycluster-mysql-headless:13306 ) [2023-02-22 12:57:02.276977] [Default] Server 1 : applyConfigureChange_ done! logIndex(22) currentTerm(3) val.cctype(2) val.optype(3) [2023-02-22 12:57:02.276993] [Default] Server 1 : configureChange return: cctype(2) optype(3) addrs(mycluster-mysql-2.mycluster-mysql-headless:13306 ) return(0) success(0) preparedIndex(0) lli(22) [2023-02-22 12:57:17.505800] [Default] EasyNet::onConnected server 100 [2023-02-22 12:57:17.505800] [Default] Server 1 : update server 100 's nextIndex(old:1,new:21) when onConnect [2023-02-22 12:57:17.802076] [Default] EasyNet::onConnected server 101 [2023-02-22 12:57:17.802076] [Default] Server 1 : update server 101 's nextIndex(old:1,new:21) when onConnect [2023-02-22 14:06:31.428596] [Default] EasyNet::onDisconnected server 101 [2023-02-22 14:06:31.479413] [Default] EasyNet::onConnected server 101 [2023-02-22 14:06:31.479413] [Default] Server 1 : update server 101 's nextIndex(old:23,new:23) when onConnect

lynnleelhl commented 1 year ago

h-scale should not allowed in playground mysql, because k3d doesn't support volume snapshot