Closed: erikauer closed this issue 5 months ago
Hi,
It is not clear to me whether this is due to the Bitnami packaging of MariaDB Galera or an issue in MariaDB Galera itself. Did you check with the upstream MariaDB Galera devs? https://jira.mariadb.org/secure/Dashboard.jspa
Hi @javsalgar,
thank you for the fast response. I created a bug on their Jira to check if they have an idea where this comes from. I will update this ticket as soon as I have additional information!
Ticket: https://jira.mariadb.org/browse/MDEV-33252
Best, Erik
Thank you!
@erikauer I am experiencing similar behavior, but not with a Bitnami-installed cluster (Ubuntu 22 VM and dpkg) and on a different version (11.0.4-MariaDB).
Otherwise it seems to match from my perspective: processes stuck in "Waiting for certification" and "Waiting to execute in isolation", which result in a hanging mariadb-server that needs to be force-killed with -9 or rebooted.
Setting wsrep_slave_threads=1 initially did seem to improve things temporarily, but in the end the problem still came back, so no proper fix yet.
(We are currently in an evaluation phase of Galera Cluster that may be impacted by this.)
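For reference, wsrep_slave_threads is set in the server configuration; a minimal sketch, assuming a typical my.cnf layout (the file path and section placement may differ per distribution, and the value 1 is just the workaround tried above, not a recommendation):

```ini
# my.cnf (or a drop-in file under conf.d/) - sketch only
[mysqld]
wsrep_slave_threads = 1
```

The variable is also dynamic, so `SET GLOBAL wsrep_slave_threads = 1;` applies it at runtime without a restart (it reverts to the configured value on the next restart).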
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
unstale issue... still relevant and not solved
ping :)
having this issue
having this issue
Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.
To reproduce this issue, on a Galera node, run
create database if not exists ....
while the restore is still running. The processlist then shows:
| 316533 | root | 127.0.0.1:49410 | test | Query | 514 | Waiting for certification | INSERT INTO `test1` VALUES ([...] | 0.000 |
| 316562 | root | 10.0.16.98:44474 | NULL | Query | 513 | Waiting to execute in isolation | CREATE DATABASE IF NOT EXISTS `test` CHARACTER SET = 'utf8mb4' COLLATE = 'utf8mb4_g | 0.000 |
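The wait states above line up with how Galera handles DDL: CREATE DATABASE runs in Total Order Isolation (TOI) and must wait for all in-flight transactions, which in turn blocks later writes. A minimal sketch of the two sessions involved (the statement bodies are placeholders, not the exact ones from the dump):

```sql
-- Session 1: restore in progress, streaming bulk inserts
INSERT INTO `test1` VALUES (1, 'a'), (2, 'b');  -- ends up in "Waiting for certification"

-- Session 2: issued on another node while the restore runs; DDL executes in
-- Total Order Isolation and waits for preceding transactions on every node
CREATE DATABASE IF NOT EXISTS `test`;           -- ends up in "Waiting to execute in isolation"
```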
It happens very often when managing Galera clusters and databases with mariadb-operator CRs. Not managing databases with mariadb-operator CRs mitigates this issue.
UPDATE: it was LOCKS.
I got the same issue too. Just execute a query which adds a PRIMARY KEY to one of the tables:
ALTER TABLE traffpro.clients_traff_mem ADD PRIMARY KEY (id);
As a result, this transaction gets stuck and the entire server gets stuck too:
@vixns how did you resolve the locks, I'm currently scratching my head having the same issues while restoring a db cluster, and it fails directly after the first table is being set up
@mathijswesterhof I stripped out the LOCK TABLE statements from the dump before restoring.
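A minimal sketch of that stripping step, shown here on a tiny inline sample (with a real dump you would run the same sed over the actual dump file; all filenames are placeholders):

```shell
# Build a tiny sample dump to demonstrate on (a real dump would be e.g. dump.sql).
printf '%s\n' 'CREATE TABLE t (id INT);' 'LOCK TABLES `t` WRITE;' 'INSERT INTO t VALUES (1);' 'UNLOCK TABLES;' > sample_dump.sql

# Drop the LOCK TABLES / UNLOCK TABLES statements before restoring.
sed '/^LOCK TABLES /d; /^UNLOCK TABLES;/d' sample_dump.sql > sample_nolocks.sql

cat sample_nolocks.sql
# CREATE TABLE t (id INT);
# INSERT INTO t VALUES (1);
```

Alternatively, mysqldump's --skip-add-locks option avoids emitting these statements in the first place.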
Name and Version
bitnami/mariadb-galera 9.0.4
What architecture are you using?
None
What steps will reproduce the bug?
We installed the Galera cluster using the Bitnami mariadb-galera Helm chart.
After a while (a few days) we always see deadlocks in the logs:
Caused by: java.sql.SQLException: (conn=371621) Lock wait timeout exceeded; try restarting transaction
The values.yaml file:
We checked the nodes for their state and saw that node dev-mariadb-0 has a process in "Waiting for certification" that never ends:
It seems all the other processes are waiting for this one to complete.
We have tried to kill this process, but a KILL 370905 just results in an additional process waiting for the same one to complete.
We are only able to fix this problem by restarting the node. The other two nodes work as expected, and both reads and writes are possible.
Any idea how to fix this problem, or how we can ensure that this doesn't happen again?
What is the expected behavior?
All nodes in the Galera cluster keep working; none gets stuck with a process in "Waiting for certification".
What do you see instead?
One node ends up with a process stuck in "Waiting for certification" and every other process waiting behind it, instead of all nodes being in sync and not locked.
Additional information
Full Processlist of node0:
I checked all the nodes for the status
Node0:
Node1:
Node2:
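For reference, per-node state like the above is typically checked with the standard wsrep status variables; a sketch of the queries (the actual outputs were elided from this report):

```sql
-- Run on each node to check membership and sync state
SHOW STATUS LIKE 'wsrep_cluster_size';         -- expected: number of nodes (3 here)
SHOW STATUS LIKE 'wsrep_local_state_comment';  -- expected: 'Synced'
SHOW STATUS LIKE 'wsrep_ready';                -- expected: 'ON'
```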