bakwc / mysql_ch_replicator

Tool for replication of MySQL databases to ClickHouse
MIT License
18 stars 6 forks source link

Specific SQL version required? #20

Closed giriputraa closed 1 week ago

giriputraa commented 1 week ago

Hey, the project looks interesting but I unfortunately failed to run it. This is the error message I got from the debug logs. Any help appreciated.

Traceback (most recent call last):
  File "/root/.local/share/pipx/venvs/mysql-ch-replicator/lib/python3.12/site-packages/mysql/connector/connection_cext.py", line 706, in cmd_query
    self._cmysql.query(
_mysql_connector.MySQLInterfaceError: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'key > 'None' ORDER BY key LIMIT 50000' at line 1
giriputraa commented 1 week ago

To clarify MySQL Version: 8.0.31-google Clickhouse Version: 24.9.2.42

bakwc commented 1 week ago

Could you please provide more information about table structure, especially a primary key type? Also the full logs will help.

giriputraa commented 1 week ago
[dbrepl db 2024-11-13 07:50:05,000     INFO] running initial replication for table ar_internal_metadata
[dbrepl db 2024-11-13 07:50:05,013     INFO] skip table ar_internal_metadata - not matching any allowed table
[dbrepl db 2024-11-13 07:50:05,013     INFO] continue from primary key None
[dbrepl db 2024-11-13 07:50:05,029    ERROR] unhandled exception

After investigating the logs further I noticed its stuck after trying to sync this specific table. I've decided to remove tables that doesnt have any primary key in it (generated from the framework I'm using for whatever reason).

I'm looking at positive results for now. It's a great project! I'm hoping this can be developed continously.

A few feature suggestions I have:

  1. Allow to rename the destination database (i.e databases: "database_a:production_a")
  2. Exclude tables. Although if you support regex this should be possible already, I haven't tried.
bakwc commented 1 week ago

Yeah, currently primary key is required for replication and for ClickHouse MergeTree engine. Will add explicit check for primary key check. Thanks for feature suggestions, will add them later.

if you support regex this should be possible already

For now it's a simple wildcard only, so only tables to use supported. Will add excluded_tables parameter.