dolthub / dolt

Dolt – Git for Data
https://www.dolthub.com
Apache License 2.0
17.99k stars 516 forks source link

Syntax Error with `SOURCE_LOG_FILE` in `CHANGE REPLICATION SOURCE TO` Command #8512

Open TianyuZhang1214 opened 3 weeks ago

TianyuZhang1214 commented 3 weeks ago

Description:
A syntax error occurs when executing a replication query, specifically at position 141 near SOURCE_LOG_FILE. This error prevents the query from executing successfully, resulting in a warning log entry.

Error Details:

WARN[2796] error running query                           
connectTime="2024-10-31 13:54:16.815824 +0800 CST m=+7.077709710" 
connectionDb= connectionID=1 
error="syntax error at position 141 near 'SOURCE_LOG_FILE'" 
query="CHANGE REPLICATION SOURCE TO SOURCE_HOST='127.0.0.1', SOURCE_USER='root', SOURCE_PASSWORD='root', SOURCE_PORT=16345, SOURCE_LOG_FILE='binlog.000047', SOURCE_LOG_POS=157"

Steps to Reproduce:

  1. Execute the following replication command:
    CHANGE REPLICATION SOURCE TO 
    SOURCE_HOST='127.0.0.1', 
    SOURCE_USER='root', 
    SOURCE_PASSWORD='root', 
    SOURCE_PORT=16345, 
    SOURCE_LOG_FILE='binlog.000047', 
    SOURCE_LOG_POS=157;
  2. Observe the syntax error near SOURCE_LOG_FILE.

Expected Behavior:
The replication command should execute without any syntax errors, allowing replication to start as intended.

Actual Behavior:
The command fails with a syntax error, specifically indicating an issue near SOURCE_LOG_FILE.

Additional Context:
It appears that AWS RDS for MySQL and Azure Database for MySQL do not enable GTID by default, making support for SOURCE_LOG_FILE essential for proper replication setup.

fulghum commented 3 weeks ago

Hi @TianyuZhang1214, thank you for opening such a well-written issue. You are correct that MySQL-to-Dolt replication currently only supports GTID positioning. GTID positioning is the more recent/modern positioning system for MySQL replication, and is generally preferred over log file positioning because it tends to be easier to configure and manage.

Adding support for log file based positioning would be possible of course. As a quick ballpark estimate... I'd guess it would probably take around three weeks to fully implement support for it. This is not work we currently have planned, but we're always happy to adapt our roadmap based on customer feedback.

That said, we do have customers using MySQL-to-Dolt replication successfully from cloud hosting providers like AWS and Azure. Have you already looking into enabling GTID support there? Here's some of AWS' documentation on enabling GTIDs with RDS MySQL deployments. Ultimately, I think GTID positioning is a more common and better supported option, and will also enable you to start replicating to Dolt much faster than us implementing log file positioning. If there are specific constraints that prevent you from using GTID positioning, I'd love to learn more. My current understanding is that GTID positioning is the recommended approach and should work in all cases where log file positioning works.

Let us know what you think!

TianyuZhang1214 commented 3 weeks ago

Thank you for your timely response! Here are our thoughts regarding GTID and Binlog file positioning:

Have you already looked into enabling GTID support there?

Yes, GTID can be manually enabled in the MySQL configurations on AWS and Azure, but it is disabled by default. Our goal is to support scenarios where users have not enabled GTID themselves.

We understand that GTID positioning is the recommended approach. Therefore, supporting Binlog file positioning is not an urgent requirement at this time. We can implement it once you provide support for it.

Thank you!

bpf120 commented 3 weeks ago

@TianyuZhang1214 , we'd love to learn more about your Dolt use case. Feel free to email me or join our Discord if you want to share.

TianyuZhang1214 commented 3 weeks ago

@TianyuZhang1214 , we'd love to learn more about your Dolt use case. Feel free to email me or join our Discord if you want to share.

Absolutely! I’d be happy to share more about our project and our use of Dolt. We’re currently working on finalizing the documentation to make our project open-source. Once it’s ready, I’ll share all the details.

VWagen1989 commented 1 week ago

Hi @TianyuZhang1214, thank you for opening such a well-written issue. You are correct that MySQL-to-Dolt replication currently only supports GTID positioning. GTID positioning is the more recent/modern positioning system for MySQL replication, and is generally preferred over log file positioning because it tends to be easier to configure and manage.

Adding support for log file based positioning would be possible of course. As a quick ballpark estimate... I'd guess it would probably take around three weeks to fully implement support for it. This is not work we currently have planned, but we're always happy to adapt our roadmap based on customer feedback.

That said, we do have customers using MySQL-to-Dolt replication successfully from cloud hosting providers like AWS and Azure. Have you already looking into enabling GTID support there? Here's some of AWS' documentation on enabling GTIDs with RDS MySQL deployments. Ultimately, I think GTID positioning is a more common and better supported option, and will also enable you to start replicating to Dolt much faster than us implementing log file positioning. If there are specific constraints that prevent you from using GTID positioning, I'd love to learn more. My current understanding is that GTID positioning is the recommended approach and should work in all cases where log file positioning works.

Let us know what you think!

@fulghum @bpf120 Hello! I was wondering if there are any plans for Dolt to support log file-based positioning replication. I believe this feature would be incredibly beneficial, and I’d love to hear if it’s on the roadmap. Thank you very much for your time and any insights you can share. Looking forward to your response!

fulghum commented 1 week ago

Hi @VWagen1989, we don't currently have plans to support log file-based positioning with MySQL replication. From the customers we've talked to so far, the existing GTID-based positioning support has been preferable. We're always open to learning more about what customers need of course though, so let us know if you have a specific use case where GTID-based positioning wouldn't work.

Adding support for log file-based positioning is a larger chunk of work than a quick bug fix of course, so it's more likely this is something we'd prioritize for a customer with a support contract. Let us know what you think. @bpf120 and I would be happy to learn more about your specific use case if you want to follow up in a separate channel, too.