MariaDB / mariadb-docker

Docker Official Image packaging for MariaDB
https://mariadb.org
GNU General Public License v2.0

[Galera Cluster Docker] Issue with v10.4+ #501

Closed mrwormo closed 1 year ago

mrwormo commented 1 year ago

Hello to all,

I can't get a working Galera Cluster (2 nodes on 2 servers) running with MariaDB 10.4+, but everything runs fine with MariaDB 10.3.

The first node bootstraps the cluster without errors.

Here is the end of the docker log from the second node:

...
mariadb       | 2023-03-17 15:31:52 2 [Warning] WSREP: Failed to prepare for incremental state transfer: Failed to open IST listener at tcp://10.10.1.16:4568', asio error 'Failed to listen: bind: Cannot assign requested address: 99 (Cannot assign requested address)
mariadb       |      at /home/buildbot/buildbot/build/galerautils/src/gu_asio_stream_react.cpp:listen():788': 99 (Cannot assign requested address)
mariadb       |      at /home/buildbot/buildbot/build/galera/src/ist.cpp:prepare():331. IST will be unavailable.
mariadb       | 2023-03-17 15:31:52 0 [Note] WSREP: Member 1.0 (dockerdeviia) requested state transfer from '*any*'. Selected 0.0 (dockerdeviia2)(SYNCED) as donor.
mariadb       | 2023-03-17 15:31:52 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 2)
mariadb       | 2023-03-17 15:31:52 2 [Note] WSREP: Requesting state transfer: success, donor: 0
mariadb       | 2023-03-17 15:31:52 0 [Warning] WSREP: 0.0 (dockerdeviia2): State transfer to 1.0 (dockerdeviia) failed: -42 (No message of desired type)
mariadb       | 2023-03-17 15:31:52 0 [ERROR] WSREP: /home/buildbot/buildbot/build/gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():1207: Will never receive state. Need to abort.
mariadb       | 2023-03-17 15:31:52 0 [Note] WSREP: gcomm: terminating thread
mariadb       | 2023-03-17 15:31:52 0 [Note] WSREP: gcomm: joining thread
mariadb       | 2023-03-17 15:31:52 0 [Note] WSREP: gcomm: closing backend
mariadb       | 2023-03-17 15:31:53 0 [Note] WSREP: view(view_id(NON_PRIM,690451a0-91ec,2) memb {
mariadb       |     748a076a-99ef,0
mariadb       | } joined {
mariadb       | } left {
mariadb       | } partitioned {
mariadb       |     690451a0-91ec,0
mariadb       | })
mariadb       | 2023-03-17 15:31:53 0 [Note] WSREP: PC protocol downgrade 1 -> 0
mariadb       | 2023-03-17 15:31:53 0 [Note] WSREP: view((empty))
mariadb       | 2023-03-17 15:31:53 0 [Note] WSREP: gcomm: closed
mariadb       | 2023-03-17 15:31:53 0 [Note] WSREP: mysqld: Terminated.
mariadb       | 230317 15:31:53 [ERROR] mysqld got signal 11 ;
mariadb       | This could be because you hit a bug. It is also possible that this binary
mariadb       | or one of the libraries it was linked against is corrupt, improperly built,
mariadb       | or misconfigured. This error can also be caused by malfunctioning hardware.
mariadb       | 
mariadb       | To report this bug, see https://mariadb.com/kb/en/reporting-bugs
mariadb       | 
mariadb       | We will try our best to scrape up some info that will hopefully help
mariadb       | diagnose the problem, but since we have already crashed, 
mariadb       | something is definitely wrong and this may fail.
mariadb       | 
mariadb       | Server version: 10.4.28-MariaDB-1:10.4.28+maria~ubu2004 source revision: c8f2e9a5c0ac5905f28b050b7df5a9ffd914b7e7
mariadb       | key_buffer_size=0
mariadb       | read_buffer_size=2097152
mariadb       | max_used_connections=0
mariadb       | max_threads=102
mariadb       | thread_count=3
mariadb       | It is possible that mysqld could use up to 
mariadb       | key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 629198 K  bytes of memory
mariadb       | Hope that's ok; if not, decrease some variables in the equation.
mariadb       | 
mariadb       | Thread pointer: 0x0
mariadb       | Attempting backtrace. You can use the following information to find out
mariadb       | where mysqld died. If you see no messages after this, something went
mariadb       | terribly wrong...
mariadb       | stack_bottom = 0x0 thread_stack 0x49000
mariadb       | mysqld(my_print_stacktrace+0x32)[0x562893ae1182]
mariadb       | mysqld(handle_fatal_signal+0x55d)[0x5628935742bd]
mariadb       | /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f92e590d420]
mariadb       | /lib/x86_64-linux-gnu/libc.so.6(abort+0x213)[0x7f92e53f0941]
mariadb       | /usr/lib/galera/libgalera_smm.so(+0x228152)[0x7f92e09c2152]
mariadb       | /usr/lib/galera/libgalera_smm.so(+0xd7be0)[0x7f92e0871be0]
mariadb       | /usr/lib/galera/libgalera_smm.so(+0xd0695)[0x7f92e086a695]
mariadb       | /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f92e5901609]
mariadb       | /lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f92e54ed133]
mariadb       | The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
mariadb       | information that should help you find out what is causing the crash.
mariadb       | Writing a core file...
mariadb       | Working directory at /var/lib/mysql
mariadb       | Resource Limits:
mariadb exited with code 139

Any help would be appreciated. Thanks in advance.

grooverdan commented 1 year ago

Can you include the exact reproduction steps?

mrwormo commented 1 year ago

Sure.

  1. Create the volume and the network: `docker volume create mariadb && docker network create netdb`
  2. The resulting network list:

        $ docker network ls
        NETWORK ID     NAME               DRIVER    SCOPE
        de5df03fb922   bridge             bridge    local
        37dbdfd63c64   host               host      local
        eef2aedf9116   netdb              bridge    local

  3. `galera.cnf` is in the `./conf/` subfolder.
  4. Here are `galera.cnf` and `docker-compose.yml` on the first node:
    
galera.cnf:

    [mysqld]
    binlog_format=ROW
    default-storage-engine=innodb
    innodb_autoinc_lock_mode=2
    bind-address=0.0.0.0
    log_slave_updates=ON
    log_bin=galera-bin

    # Galera Provider Configuration
    wsrep_on=ON
    wsrep_provider=/usr/lib/galera/libgalera_smm.so

    # Galera Cluster Configuration
    wsrep_cluster_name="galera_cluster"
    wsrep_cluster_address="gcomm://10.10.1.28,10.10.1.16"

    # Galera Synchronization Configuration
    wsrep_sst_method=rsync

    # Galera Node Configuration
    wsrep_node_address="10.10.1.28"
    wsrep_node_name="node1"

docker-compose.yml:

    version: '3.0'
    services:

      mariadb:
        # image: mariadb:latest
        image: mariadb:10.3
        container_name: mariadb
        hostname: mariadb
        command: --wsrep_cluster_address=gcomm://
        ports:
          - 3306:3306
          - 4444:4444
          - 4567:4567
          - 4568:4568
        environment:
          - MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PWD}
          - TZ=Europe/Paris
        volumes:
          - ./conf:/etc/mysql/conf.d
          - mariadb:/var/lib/mysql
        networks:
          - netdb

      phpmyadmin:
        image: phpmyadmin:latest
        container_name: phpmyadmin
        hostname: phpmyadmin
        environment:

    networks:
      netdb:
        external: true

    volumes:
      mariadb:
        external: true

5. Here are `galera.cnf` and `docker-compose.yml` on the second node

galera.cnf:

    [mysqld]
    binlog_format=ROW
    default-storage-engine=innodb
    innodb_autoinc_lock_mode=2
    bind-address=0.0.0.0
    log_slave_updates=ON
    log_bin=galera-bin

    # Galera Provider Configuration
    wsrep_on=ON
    wsrep_provider=/usr/lib/galera/libgalera_smm.so

    # Galera Cluster Configuration
    wsrep_cluster_name="galera_cluster"
    wsrep_cluster_address="gcomm://10.10.1.28,10.10.1.16"

    # Galera Synchronization Configuration
    wsrep_sst_method=rsync

    # Galera Node Configuration
    wsrep_node_address="10.10.1.16"
    wsrep_node_name="node2"

docker-compose.yml:

    version: '3.0'
    services:

      mariadb:
        # image: mariadb:latest
        image: mariadb:10.3
        container_name: mariadb
        hostname: mariadb
        command: --wsrep_cluster_address=gcomm://10.10.1.28,10.10.1.16
        ports:
          - 3306:3306
          - 4444:4444
          - 4567:4567
          - 4568:4568
        environment:
          - MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PWD}
          - TZ=Europe/Paris
        volumes:
          - ./conf:/etc/mysql/conf.d
          - mariadb:/var/lib/mysql
        #network_mode: host
        networks:
          - netdb

      phpmyadmin:
        image: phpmyadmin:latest
        container_name: phpmyadmin
        hostname: phpmyadmin
        environment:

    networks:
      netdb:
        external: true

    volumes:
      mariadb:
        external: true


6. Then run `docker-compose up -d` on the first node.
7. Once I see the following line in the docker logs, I launch docker-compose on the second node:
`[Note] WSREP: Synchronized with group, ready for connections`

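
Waiting for that log line could be scripted rather than watched by hand. The following is only a minimal Python sketch, assuming the container is named `mariadb` as in the compose files and that the `docker` CLI is on the PATH. The helper names `node_is_synced` and `wait_until_synced` are made up for illustration and are not part of any MariaDB or Docker tooling.

```python
import subprocess
import time

# Marker taken from the log line quoted in the step above.
READY_MARKER = "WSREP: Synchronized with group, ready for connections"


def node_is_synced(log_text: str) -> bool:
    """Return True if the Galera ready marker appears in the given log text."""
    return READY_MARKER in log_text


def wait_until_synced(container: str = "mariadb",
                      timeout_s: float = 120.0,
                      poll_s: float = 2.0) -> bool:
    """Poll `docker logs <container>` until the marker shows up or time runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["docker", "logs", container],
            capture_output=True,
            text=True,
        )
        # The server log may arrive on stdout or stderr, so check both.
        if node_is_synced(result.stdout + result.stderr):
            return True
        time.sleep(poll_s)
    return False
```

Calling `wait_until_synced()` on the first server and starting the second node only when it returns `True` would mirror the manual procedure. The timeout and polling interval are arbitrary defaults.
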
mrwormo commented 1 year ago

I ran a test launching the same stack with both nodes on the same server, using a single docker-compose file: everything works perfectly.

Any idea what could be going wrong?

CrossBound commented 1 year ago

@mrwormo I also had a similar bind error when running under docker. To fix it I had to run docker with the --network host option. Here's where I found the answer: https://mariadb.com/kb/en/ist-replication-failing-on-2-node-galera-mariadb-setup/
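
For context on why host networking can help: the `99 (Cannot assign requested address)` in the log above is the Linux errno `EADDRNOTAVAIL`, returned when a process tries to bind to an IP address that no local interface owns. On a bridge network the container normally owns only its bridge-network address, so it presumably cannot listen on the host address `10.10.1.16` used for the IST listener. A rough Python sketch to check this from inside the same network namespace as the server follows. `bind_error` is a hypothetical helper name, and port `0` just asks the OS for any free port.

```python
import errno
import os
import socket


def bind_error(address: str, port: int = 0):
    """Try to bind a TCP socket to (address, port).

    Returns None if the bind succeeds, otherwise the errno of the failure.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((address, port))
        except OSError as exc:
            return exc.errno
    return None


if __name__ == "__main__":
    # 10.10.1.16 is the second node's wsrep_node_address from the issue.
    for addr in ("127.0.0.1", "10.10.1.16"):
        err = bind_error(addr)
        if err is None:
            print(f"{addr}: bind OK")
        else:
            name = errno.errorcode.get(err, "unknown")
            print(f"{addr}: bind failed, errno {err} ({name}): {os.strerror(err)}")
```

If the second address fails with `EADDRNOTAVAIL` on the bridge network but binds fine with host networking, that matches the IST listener error. In the compose files above, the equivalent of `--network host` is the commented-out `network_mode: host` line. Per the Compose specification it cannot be combined with `networks` or `ports` on the same service.
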