MariaDB / mariadb-docker

Docker Official Image packaging for MariaDB
https://mariadb.org
GNU General Public License v2.0
767 stars 438 forks source link

[Question - help needed] My Master - Master setup stop working due to an error I can not solve #561

Closed BobWs closed 2 months ago

BobWs commented 7 months ago

Hi,

My Mariadb master-master replication setup stop working due to an error and I don't know how to solve this problem. If anyone could help this noob;-) to solve this...

Here is the error log of the replications. db-source log:

Slave_IO_State  Waiting for master to send event
Master_Host     192.168.178.100
Master_User     replicator
Master_Port     3307
Connect_Retry   10
Master_Log_File     source10-bin.001177
Read_Master_Log_Pos     15520
Relay_Log_File  replica20-relay-bin.000002
Relay_Log_Pos   690
Relay_Master_Log_File   source10-bin.001173
Slave_IO_Running    Yes
Slave_SQL_Running   No
Replicate_Do_DB     
Replicate_Ignore_DB     
Replicate_Do_Table  
Replicate_Ignore_Table  
Replicate_Wild_Do_Table     
Replicate_Wild_Ignore_Table     
Last_Errno  1396
Last_Error  Error 'Operation DROP USER failed for 'authelia'@'%'' on query. Default database: 'mysql'. Query: 'DROP USER 'authelia'@'%''
Skip_Counter    0
Exec_Master_Log_Pos     4289
Relay_Log_Space     226680
Until_Condition     None
Until_Log_File  
Until_Log_Pos   0
Master_SSL_Allowed  No
Master_SSL_CA_File  
Master_SSL_CA_Path  
Master_SSL_Cert     
Master_SSL_Cipher   
Master_SSL_Key  
Seconds_Behind_Master   

db-replica log:

Slave_IO_State  Waiting for master to send event
Master_Host     192.168.178.110
Master_User     replicator
Master_Port     3307
Connect_Retry   10
Master_Log_File     replica20-bin.000575
Read_Master_Log_Pos     190559
Relay_Log_File  source10-relay-bin.000002
Relay_Log_Pos   692
Relay_Master_Log_File   replica20-bin.000547
Slave_IO_Running    Yes
Slave_SQL_Running   No
Replicate_Do_DB     
Replicate_Ignore_DB     
Replicate_Do_Table  
Replicate_Ignore_Table  
Replicate_Wild_Do_Table     
Replicate_Wild_Ignore_Table     
Last_Errno  1452
Last_Error  Error 'Cannot add or update a child row: a foreign key constraint fails (`hass`.`events`, CONSTRAINT `events_ibfk_1` FOREIGN KEY (`data_id`) REFERENCES `event_data` (`data_id`))' on query. Default database: 'hass'. Query: 'INSERT INTO events (event_type, event_data, origin, origin_idx, time_fired, time_fired_ts, context_id, context_user_id, context_parent_id, data_id, context_id_bin, context_user_id_bin, context_parent_id_bin, event_type_id) VALUES (NULL, NULL, NULL, 1, NULL, 1707944134.049929e0, NULL, NULL, NULL, 4684, '��gE��4�p��}��', 'P�b�k�O��5 �', NULL, 6) RETURNING events.event_id'
Skip_Counter    0
Exec_Master_Log_Pos     87503
Relay_Log_Space     68621741
Until_Condition     None
Until_Log_File  
Until_Log_Pos   0
Master_SSL_Allowed  No
Master_SSL_CA_File  
Master_SSL_CA_Path  
Master_SSL_Cert     
Master_SSL_Cipher   
Master_SSL_Key  
Seconds_Behind_Master   

In the db-source the error is about USER authelia wich I manually deleted via phpMyAdmin but the apparently something went wrong.

The error in db-replica is related to Home-Assistant (hass) and I don't have any clue what that's about.

TIA

grooverdan commented 7 months ago

So db-source - given its receiving replication drop of user authenlia, then phpMyAdmin dropped the user on db-replica (its master?).

To get past this on db-source (missing the drop of a missing user isn't that important):

set global sql_slave_skip_counter=1; start slave;

db-replica:

foreign key constraints. bit hard to say without table structure of hass.events and what events_ibfk_1 is.

The usual scenario is that in a master-master scenario, only ever one node is written to at a time. Replication must be caught up and no more writes on a master before writing to the other master. Make sure this is true somehow, or revert to a master and replica.

pt-table-sync be used to get the two instances in sync again (once replication is going again and you know which copy is current).

BobWs commented 7 months ago

@grooverdan Thank you for replying!

The first problem I manage to solve by skipping the error message and reseting the master (replica) using phpMyAdmin.

This is the log status of db-replica (Master): (Which looks good I think no more errors in phpMyAdmin)

Slave_IO_State  Waiting for master to send event
Master_Host     192.168.178.100
Master_User     replicator
Master_Port     3307
Connect_Retry   10
Master_Log_File     source10-bin.001201
Read_Master_Log_Pos     6236
Relay_Log_File  replica20-relay-bin.000019
Relay_Log_Pos   5641
Relay_Master_Log_File   source10-bin.001201
Slave_IO_Running    Yes
Slave_SQL_Running   Yes
Replicate_Do_DB     
Replicate_Ignore_DB     
Replicate_Do_Table  
Replicate_Ignore_Table  
Replicate_Wild_Do_Table     
Replicate_Wild_Ignore_Table     
Last_Errno  0
Last_Error  
Skip_Counter    0
Exec_Master_Log_Pos     6236
Relay_Log_Space     5500
Until_Condition     None
Until_Log_File  
Until_Log_Pos   0
Master_SSL_Allowed  No
Master_SSL_CA_File  
Master_SSL_CA_Path  
Master_SSL_Cert     
Master_SSL_Cipher   
Master_SSL_Key  
Seconds_Behind_Master   0 

But the second problem with the hass database has not been solved yet.

Here are the steps I have taken so far:

So now I'm getting this error on db-source:

Slave_IO_State  Waiting for master to send event
Master_Host     192.168.178.110
Master_User     replicator
Master_Port     3307
Connect_Retry   10
Master_Log_File     replica20-bin.000587
Read_Master_Log_Pos     872364
Relay_Log_File  source10-relay-bin.000002
Relay_Log_Pos   3301
Relay_Master_Log_File   replica20-bin.000547
Slave_IO_Running    Yes
Slave_SQL_Running   No
Replicate_Do_DB     
Replicate_Ignore_DB     
Replicate_Do_Table  
Replicate_Ignore_Table  
Replicate_Wild_Do_Table     
Replicate_Wild_Ignore_Table     
Last_Errno  1146
Last_Error  Error 'Table 'hass.states' doesn't exist' on query. Default database: 'hass'. Query: 'INSERT INTO states (entity_id, state, attributes, event_id, last_changed, last_changed_ts, last_updated, last_updated_ts, old_state_id, attributes_id, context_id, context_user_id, context_parent_id, origin_idx, context_id_bin, context_user_id_bin, context_parent_id_bin, metadata_id) VALUES (NULL, '6886.437', NULL, NULL, NULL, NULL, NULL, 1707944904.574685e0, 11598857, 26, NULL, NULL, NULL, 0, '��s~(c+� �K', NULL, NULL, 6), (NULL, '0.0403200000000288', NULL, NULL, NULL, NULL, NULL, 1707944904.576475e0, 11598858, 39431, NULL, NULL, NULL, 0, '��s��F`��/�(ۙ', NULL, NULL, 16) RETURNING states.state_id, states.state_id AS state_id__1'
Skip_Counter    0
Exec_Master_Log_Pos     171062
Relay_Log_Space     118689185
Until_Condition     None
Until_Log_File  
Until_Log_Pos   0
Master_SSL_Allowed  No
Master_SSL_CA_File  
Master_SSL_CA_Path  
Master_SSL_Cert     
Master_SSL_Cipher   
Master_SSL_Key  
Seconds_Behind_Master   

I have also tried to skip the error and reset the replica but nothing is working, the error persists and I don't know how to solve this. TIA