Defer Index Creation [feature request]?

Roguelazer commented 8 years ago

I'm trying to speed up a gh-ost alter of a gigantic table, and I was wondering if you'd considered deferring index creation until after the main backfill? I imagine it'd go a lot faster if the initial table had no indexes, we did the bulk copy of all of the existing rows, then we paused and did a full ALTER TABLE to create the secondary indexes, then gh-ost replayed the binlog to finish the migration.

I could probably do something like this myself by specifying an alter to gh-ost which drops all indexes, then throttling gh-ost through the control socket, when it gets to 99%, running the ALTER TABLE on the _gho table to add the missing indexes, and un-throttling gh-ost once the ALTER finishes in replication, but that seems failure-prone.

Alternatively, if you have any tips for making rewrites go faster on terabyte-scale tables, I'm all ears. :-)

Roguelazer commented 8 years ago

FWIW, I'm currently testing the manual approach and gh-ost has made more progress on this large table in four hours than it did in two weeks trying to preserve all of the indexes. gh-ost was able to migrate about 10,000 rows per minute with the indexes and about 2,400,000 rows/minute without the indexes.

I figure that I'll just add the indexes separately on the whole cluster in parallel with sql_log_bin=0 with gh-ost throttled.

shlomi-noach commented 8 years ago

Funny you should ask, as I was just asked the same by @morgo. Also thank you very much for providing numbers. The difference is overwhelming! However, how much time did it then take to add the indexes?

The major drawback I see to this approach of creating key-less schema and then adding the indexes, is that it is not online. The alter table add index... statement will be streamlined through replication and will cause replication lags. I know alter ... add key will be faster than a normal alter, but still this is:

blocking replication
exhausting resources in an unknown amount
cannot be throttled
cannot be cancelled (or, rather, you pay a high price for cancelling).

For those reasons my thinking was to not support this feature.

Having said that, 4 hours vs. 2 weeks is overwhelmingly faster. My questions to you are:

How long do adding the indexes take?
How do you deal with the replication lags incurred by this method?

Roguelazer commented 8 years ago

I'm adding the index now to test. I'll let you know how long it takes when it finishes. :-)

Like I said, the easiest way to avoid replication lag is to apply it in parallel simultaneously at all levels of the replication stream with binlogging disabled.

shlomi-noach commented 8 years ago

I figure that I'll just add the indexes separately on the whole cluster in parallel with sql_log_bin=0 with gh-ost throttled.

Oh, right -- that's way beyond gh-ost's scope, it will not evaluate your entire topology (you can use orchestrator for that).

I have quite a few thoughts about this approach, with some experience. TL;DR is that I try to keep away from it; what happens if I restore a host from yesterday's backup? Now it won't have those keys and I don't have the tracking to tell me that. What if I'm just provisioning a new server? What if some server was down/offline/maintenance at that time? Etc. It's fine for a one-time gig, otherwise it requires a reliable management system. Which is totally possible, of course, but you need to have it.

I'm also curious to see what makes it run slow: is it the write capacity on the master? (is disk I/O saturated?) or is it replication lag throttling?

In the first case, did you consider innodb_flush_logs_at_trx_commit=2, sync_binlog=0 on the master? In the latter case, same, on replicas?

Roguelazer commented 8 years ago

In this case, the indexes are unfortunately complicated, wide, and not even slightly in PK order, so inserting into the indexes is super-expensive on the master. No replication delay issues. I tried futzing with various innodb I/O settings (including the two you mentioned and some more esoteric ones from the depths of mysql performance blog like innodb_lru_scan_depth), but nothing had anywhere near the impact of running the alter without the expensive indexes (which is, of course, by no means a new idea; e.g., [1] and [2]).

shlomi-noach commented 8 years ago

Thank you for the info. Curious how long adding the indexes takes. Holding on...

morgo commented 8 years ago

For timing context: is this 5.6 or 5.7? I would expect 5.7 to be much faster because index pages are built from the bottom up without redo

morgo commented 8 years ago

Just a comment w.r.t. cannot be throttled:

Yes this is true. But it is different from regular online DDL, because the incoming modification stream (i.e. the gh-ost portion) can be throttled. So the question is if the background thread which sweeps through the table is too heavy.

The replication issue of course, still stands.

Roguelazer commented 8 years ago

FWIW, I ended up having to kill it because fast index creation takes up too much space for the on-disk temp table it sorts into. Will be re-attempting with more disk hopefully tomorrow. Based on how far it got before I decided to kill it, I'd say that adding the indexes would take about 10 hours, which still compares very favorably to the multi-week single-pass gh-ost run.

Roguelazer commented 8 years ago

I finished running this. I ran gh-ost with all keys removed and paused it when it hit 99.7% (took 7 hours 23 minutes), then added the keys with sql_log_bin=0 in parallel on every host (took 14 hours 2 minutes), then let gh-ost catch up the binlog for those 14 hours and finish the migration (took 2 hours 15 minutes). All told, slightly less than 24 hours for something which had only gotten 70% of the way done after 4 weeks running the normal way.

I know that I could've just used wait-cutover and let the Online DDL change buffer handle the statements gh-ost was writing, but I decided to manually throttle it at 99.7% to ensure that there was no chance of the index addition getting rolled back due to change buffer exhaustion.

shlomi-noach commented 8 years ago

I'm wondering: how long does it take to run a straight online DDL that just alters the table?

Roguelazer commented 8 years ago

Ah, that I don't know. The underlying alter we were doing with gh-ost isn't online.

James Brown, currently mobile

On Oct 6, 2016, at 12:50, Shlomi Noach notifications@github.com wrote:

I'm wondering: how long does it take to run a straight online DDL that just alters the table?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

shlomi-noach commented 8 years ago

@Roguelazer I'm asking because you're already taking the bother to set sql_log_bin=0 then go and execute the alter on all replicas.

This is just the bother gh-ost solves in the first place. So if you do go and execute directly on each replica, why don't you just alter the entire table in the first place, in online DDL?

Roguelazer commented 8 years ago

We were changing the type of a column (INT to BIGINT), which isn't supported by native MySQL online DDL. It was either gh-ost or pt-online-schema-change.

shlomi-noach commented 8 years ago

Thank you for clarifying.

During PerconaLive conference we met with @grypyrg who suggested the following: https://github.com/github/gh-ost/issues/261

Which is also an insanely nice way of optimizing the alter. @Roguelazer I wanted to ask if you also tried migrating directly on replica (thus, not on the master), and if so, how long id it take.

On our side, a replica-only migration can take some 20%-30% of the time it takes on the master, especially when said replica does not take production traffic.

github / gh-ost

Defer Index Creation [feature request]? #250