github / gh-ost

GitHub's Online Schema-migration Tool for MySQL
MIT License

Question: Why Is the "Connect to Replica" mode preferred? #566

Open willfong opened 6 years ago

willfong commented 6 years ago

Hi,

Sorry if there's an obvious answer, but why is connecting to a replica the preferred mode? I couldn't find the exact reason in the docs, and knowing it would help us decide which mode to use in our environment.

For us, either method is possible. But not all of our systems have the proper configuration for gh-ost (log_bin, log_slave_updates, binlog_format).

While we could schedule a maintenance window to make those changes, I'm wondering what we would lose if we just ran from the master.

Thanks! -w

zmoazeni commented 6 years ago

I'll attempt an answer, though @shlomi-noach no doubt has more experience.

Running gh-ost will stress the server it connects to. It attaches to that server as a faux MySQL replica, reading its binary log, and that process is what determines whether it should copy, throttle, cut over, etc.

I believe the current recommendation is to avoid adding unnecessary stress to the writer, since writers are harder to scale.

> But not all of our systems have the proper configuration for gh-ost (log_bin, log_slave_updates, binlog_format).

It likely depends on your workload and your server capabilities, but you could always try it on the writer. Perhaps add some conservative throttling, and be prepared to pause the migration if things get out of hand.

However, I'm fairly sure the writer at least needs binary logging enabled for gh-ost to work; otherwise the gh-ost process can't attach as a replica. So you'll want to verify that.
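A minimal sketch of that verification, assuming Python 3 and that you have already fetched the server's settings (for example with `SHOW GLOBAL VARIABLES WHERE Variable_name IN ('log_bin', 'log_slave_updates', 'binlog_format')`). It checks only the three variables named in this issue, per mode. `missing_prerequisites` is a hypothetical helper, and this is not gh-ost's full list of requirements.

```python
from typing import List, Mapping


def _is_on(value: str) -> bool:
    # SHOW GLOBAL VARIABLES reports booleans as ON/OFF; accept 1/0 too
    return value.strip().upper() in ("ON", "1")


def missing_prerequisites(variables: Mapping[str, str], connect_to_replica: bool) -> List[str]:
    """Return human-readable problems; an empty list means these checks pass.

    `variables` maps Variable_name -> Value for the server gh-ost will
    connect to. Only the three variables discussed in this issue are checked.
    """
    problems: List[str] = []
    # gh-ost attaches as a replica and reads this server's binary log
    if not _is_on(variables.get("log_bin", "OFF")):
        problems.append("log_bin must be ON")
    # gh-ost needs row-based events to apply ongoing changes to the ghost table
    if variables.get("binlog_format", "").strip().upper() != "ROW":
        problems.append("binlog_format must be ROW")
    # on a replica, replicated writes only reach its own binlog with log_slave_updates
    if connect_to_replica and not _is_on(variables.get("log_slave_updates", "OFF")):
        problems.append("log_slave_updates must be ON on the replica")
    return problems


replica_vars = {"log_bin": "ON", "log_slave_updates": "OFF", "binlog_format": "ROW"}
print(missing_prerequisites(replica_vars, connect_to_replica=True))
# ['log_slave_updates must be ON on the replica']
print(missing_prerequisites(replica_vars, connect_to_replica=False))
# []
```

The same settings can pass for master mode and fail for replica mode, which is exactly the trade-off raised in the question.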

For the infrastructure I work with, that extra stress isn't very high, but I still prefer to run from a replica. I also generally run a replica-only test first, which makes sure gh-ost can connect to the writer but only executes the migration on that replica. So I effectively run the migration twice.

zmoazeni commented 6 years ago

Oh, one more thing I'll add. Running from a replica is also nice because the gh-ost process will inherently respect replication lag.

If you run directly against the writer, gh-ost may send writes that the writer can handle but that gradually cause replicas to lag. When running from a replica, gh-ost throttles itself as that replica begins to lag. That part in particular has been quite nice in my experience: I haven't needed to monitor lag as closely, because I intentionally run it from one of our weaker servers.

Edit: If gh-ost throttles itself on a weak server, I feel confident that my stronger replicas will be okay during the migration and the entire ecosystem will stay in a healthy state.
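A toy model of that behavior in Python, not gh-ost's actual code: on each tick the migration copies one chunk only if every inspected replica is under the lag threshold (1500 ms here, gh-ost's default for `--max-lag-millis`). The function names, replica names and lag samples are made up for illustration.

```python
from typing import Mapping, Sequence, Tuple

MAX_LAG_MILLIS = 1500  # gh-ost's default for --max-lag-millis


def should_throttle(lag_millis: Mapping[str, float], max_lag_millis: float = MAX_LAG_MILLIS) -> bool:
    # The most-lagged inspected replica decides: one slow server pauses the copy
    return any(lag > max_lag_millis for lag in lag_millis.values())


def copy_rows(
    total_rows: int,
    chunk_size: int,
    lag_samples: Sequence[Mapping[str, float]],
    max_lag_millis: float = MAX_LAG_MILLIS,
) -> Tuple[int, int]:
    """Toy row-copy loop: one lag sample per tick; copy a chunk unless throttled.

    Returns (rows_copied, ticks_spent_throttled).
    """
    copied = 0
    throttled = 0
    for sample in lag_samples:
        if copied >= total_rows:
            break
        if should_throttle(sample, max_lag_millis):
            throttled += 1  # wait for the replica to catch up instead of copying
            continue
        copied = min(total_rows, copied + chunk_size)
    return copied, throttled


samples = [{"weak": 200}, {"weak": 2500}, {"weak": 300}]
print(copy_rows(5000, 1000, samples))  # (2000, 1)
```

Using `any` rather than an average is the conservative choice: a single lagging replica is enough to pause the copy, which is why sampling the weakest server protects the stronger ones.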

tomkrouper commented 6 years ago

@shlomi-noach is on holiday, but if you need more details, let me know and I can dig in. @zmoazeni got it right, though: the idea is to cause less stress on the writer host.

shlomi-noach commented 6 years ago

Yes to all the above.

@zmoazeni, it's worth noting that even if gh-ost runs directly on the master, you can still explicitly ask it to respect lag on specific replicas via --throttle-control-replicas.
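A minimal sketch, assuming Python 3 and a `gh-ost` binary on PATH, of assembling a master-mode invocation: `--allow-on-master`, plus `--throttle-control-replicas` and `--max-lag-millis` so that lag on chosen replicas still throttles the migration, with `--max-load`/`--critical-load` and a smaller `--chunk-size` as the conservative throttling suggested earlier in the thread. `build_master_mode_args` is a hypothetical helper; the hostnames, schema, table and threshold values are illustrative only, so check `gh-ost --help` for your version.

```python
from typing import List, Mapping, Optional, Sequence


def _thresholds(pairs: Mapping[str, int]) -> str:
    # gh-ost takes comma-delimited status-name=threshold pairs
    return ",".join(f"{name}={value}" for name, value in pairs.items())


def build_master_mode_args(
    host: str,
    database: str,
    table: str,
    alter: str,
    control_replicas: Sequence[str],
    max_lag_millis: int = 1500,
    max_load: Optional[Mapping[str, int]] = None,
    critical_load: Optional[Mapping[str, int]] = None,
    chunk_size: int = 500,
    execute: bool = False,
) -> List[str]:
    """Assemble argv for a gh-ost run connected directly to the master."""
    args = [
        "gh-ost",
        f"--host={host}",  # the master itself, not a replica
        f"--database={database}",
        f"--table={table}",
        f"--alter={alter}",
        "--allow-on-master",  # opt in to the non-preferred mode
        f"--chunk-size={chunk_size}",  # smaller than the 1000 default: gentler on the writer
        f"--max-lag-millis={max_lag_millis}",
    ]
    if control_replicas:
        # replicas whose lag should still throttle the migration
        args.append("--throttle-control-replicas=" + ",".join(control_replicas))
    if max_load:
        args.append("--max-load=" + _thresholds(max_load))  # throttle above these
    if critical_load:
        args.append("--critical-load=" + _thresholds(critical_load))  # bail out above these
    if execute:
        args.append("--execute")  # without it, gh-ost only does a noop run
    return args


args = build_master_mode_args(
    host="writer.example.com",
    database="shop",
    table="orders",
    alter="ADD COLUMN note VARCHAR(255)",
    control_replicas=["replica1.example.com:3306", "replica2.example.com"],
    max_load={"Threads_running": 25},
    critical_load={"Threads_running": 200},
)
```

Without `execute=True` the list omits `--execute`, so gh-ost runs its checks and exits without migrating. Passing the list to `subprocess.run(args, check=True)` avoids any shell quoting of the `--alter` statement.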

@willfong thank you for asking! I'm happy to update the docs as needed to clarify. I see we mention the "preferred" method in a few places. Where do you think it would make the most sense to elaborate?

wcurrie commented 5 years ago

As a newbie, I had the same thought when I read https://github.com/github/gh-ost/blob/master/doc/cheatsheet.md#a-connect-to-replica-migrate-on-master: Why?

> This is the mode gh-ost expects by default.

I first looked in https://github.com/github/gh-ost/blob/master/doc/questions.md. Ended up here via https://github.com/github/gh-ost/issues?utf8=%E2%9C%93&q=is%3Aissue+why+connect+to+replica.

So perhaps add a link from cheatsheet.md to questions.md? Or to this issue 😄

shlomi-noach commented 5 years ago

I'd be grateful if people could suggest a documentation change (a PR is preferable).