github / gh-ost

GitHub's Online Schema-migration Tool for MySQL
MIT License
12.44k stars 1.26k forks source link

Gh-ost backlog 1000/1000 can't make any progress. #761

Open njfix6 opened 5 years ago

njfix6 commented 5 years ago

Hi I work at Lime and we use gh-ost and like it a lot, but when we are running a migration on a table with a lot of writes, it can not complete. We have ran gh-ost on various tables with success but when there are a high number of writes we can't get it to complete.

The backlog gets stuck at 1000 and nothing gets copied.

We are running gh-ost in a job in Kubernetes.

Running the following:

gh-ost
--max-load=Threads_running=3000,threads_connected=20000
--critical-load=Threads_running=3000
--chunk-size=1000
--max-lag-millis=1500
--allow-on-master
--verbose
--assume-rbr
--allow-master-master
--cut-over=default
--exact-rowcount
--concurrent-rowcount
--default-retries=20
--panic-flag-file=/tmp/ghost.panic.flag
--postpone-cut-over-flag-file=/tmp/ghost.postpone.flag
--user=<user>
--password=<password>
--host=<host>
--database=<database>
--table=bikes
--dml-batch-size=100
--alter='ALTER TABLE bikes ADD COLUMN location point;'
--execute

Output:

Copy: 0/714541 0.0%; Applied: 49550; Backlog: 1000/1000; Time: 1m20s(total), 1m19s(copy); streamer: mysql-bin-changelog.374920:125825751; State: migrating; ETA: N/A
[2019/07/03 19:18:26] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.374921, 4)
[2019/07/03 19:18:26] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.374921, 4)
Copy: 0/714541 0.0%; Applied: 52700; Backlog: 1000/1000; Time: 1m25s(total), 1m24s(copy); streamer: mysql-bin-changelog.374920:133856640; State: migrating; ETA: N/A
2019-07-03 19:18:27 INFO rotate to next log from mysql-bin-changelog.374921:134286986 to mysql-bin-changelog.374921
2019-07-03 19:18:27 INFO rotate to next log from mysql-bin-changelog.374921:0 to mysql-bin-changelog.374921
Copy: 0/714541 0.0%; Applied: 55890; Backlog: 1000/1000; Time: 1m30s(total), 1m29s(copy); streamer: mysql-bin-changelog.374921:7742880; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 59080; Backlog: 1000/1000; Time: 1m35s(total), 1m34s(copy); streamer: mysql-bin-changelog.374921:15603076; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 62270; Backlog: 1000/1000; Time: 1m40s(total), 1m39s(copy); streamer: mysql-bin-changelog.374921:23465989; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 65420; Backlog: 1000/1000; Time: 1m45s(total), 1m44s(copy); streamer: mysql-bin-changelog.374921:31669793; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 68570; Backlog: 1000/1000; Time: 1m50s(total), 1m49s(copy); streamer: mysql-bin-changelog.374921:41022300; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 71750; Backlog: 1000/1000; Time: 1m55s(total), 1m54s(copy); streamer: mysql-bin-changelog.374921:51120678; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 74920; Backlog: 1000/1000; Time: 2m0s(total), 1m59s(copy); streamer: mysql-bin-changelog.374921:58903179; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 78100; Backlog: 1000/1000; Time: 2m5s(total), 2m4s(copy); streamer: mysql-bin-changelog.374921:66963586; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 81310; Backlog: 1000/1000; Time: 2m10s(total), 2m9s(copy); streamer: mysql-bin-changelog.374921:74901698; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 84460; Backlog: 1000/1000; Time: 2m15s(total), 2m14s(copy); streamer: mysql-bin-changelog.374921:82858858; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 87590; Backlog: 1000/1000; Time: 2m20s(total), 2m19s(copy); streamer: mysql-bin-changelog.374921:90644007; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 90720; Backlog: 1000/1000; Time: 2m25s(total), 2m24s(copy); streamer: mysql-bin-changelog.374921:98630590; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 93850; Backlog: 1000/1000; Time: 2m30s(total), 2m29s(copy); streamer: mysql-bin-changelog.374921:106711984; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 96990; Backlog: 1000/1000; Time: 2m35s(total), 2m34s(copy); streamer: mysql-bin-changelog.374921:114554349; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 100070; Backlog: 1000/1000; Time: 2m40s(total), 2m39s(copy); streamer: mysql-bin-changelog.374921:122465210; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 103240; Backlog: 1000/1000; Time: 2m45s(total), 2m44s(copy); streamer: mysql-bin-changelog.374921:130295653; State: migrating; ETA: N/A
[2019/07/03 19:19:48] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.374922, 4)
[2019/07/03 19:19:48] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.374922, 4)
2019-07-03 19:19:49 INFO rotate to next log from mysql-bin-changelog.374922:134270583 to mysql-bin-changelog.374922
2019-07-03 19:19:49 INFO rotate to next log from mysql-bin-changelog.374922:0 to mysql-bin-changelog.374922
Copy: 0/714541 0.0%; Applied: 106380; Backlog: 1000/1000; Time: 2m50s(total), 2m49s(copy); streamer: mysql-bin-changelog.374922:4030170; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 109480; Backlog: 1000/1000; Time: 2m55s(total), 2m54s(copy); streamer: mysql-bin-changelog.374922:11866933; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 112620; Backlog: 1000/1000; Time: 3m0s(total), 2m59s(copy); streamer: mysql-bin-changelog.374922:19908039; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 131450; Backlog: 1000/1000; Time: 3m30s(total), 3m29s(copy); streamer: mysql-bin-changelog.374922:68951154; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 150120; Backlog: 1000/1000; Time: 4m0s(total), 3m59s(copy); streamer: mysql-bin-changelog.374922:127018597; State: migrating; ETA: N/A
[2019/07/03 19:21:05] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.374923, 4)
[2019/07/03 19:21:05] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.374923, 4)
2019-07-03 19:21:06 INFO rotate to next log from mysql-bin-changelog.374923:134291901 to mysql-bin-changelog.374923
2019-07-03 19:21:06 INFO rotate to next log from mysql-bin-changelog.374923:0 to mysql-bin-changelog.374923
Copy: 0/714541 0.0%; Applied: 169190; Backlog: 993/1000; Time: 4m30s(total), 4m29s(copy); streamer: mysql-bin-changelog.374923:53001257; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 188340; Backlog: 993/1000; Time: 5m0s(total), 4m59s(copy); streamer: mysql-bin-changelog.374923:115111365; State: migrating; ETA: N/A
[2019/07/03 19:22:13] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.374924, 4)
[2019/07/03 19:22:13] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.374924, 4)
2019-07-03 19:22:14 INFO rotate to next log from mysql-bin-changelog.374924:134307761 to mysql-bin-changelog.374924
2019-07-03 19:22:14 INFO rotate to next log from mysql-bin-changelog.374924:0 to mysql-bin-changelog.374924
Copy: 0/714541 0.0%; Applied: 207380; Backlog: 1000/1000; Time: 5m30s(total), 5m29s(copy); streamer: mysql-bin-changelog.374924:42299534; State: migrating; ETA: N/A
[2019/07/03 19:22:44] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.374925, 4)
[2019/07/03 19:22:44] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.374925, 4)
2019-07-03 19:22:45 INFO rotate to next log from mysql-bin-changelog.374925:64729630 to mysql-bin-changelog.374925
2019-07-03 19:22:45 INFO rotate to next log from mysql-bin-changelog.374925:0 to mysql-bin-changelog.374925
Copy: 0/714541 0.0%; Applied: 226520; Backlog: 1000/1000; Time: 6m0s(total), 5m59s(copy); streamer: mysql-bin-changelog.374925:27583532; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 245510; Backlog: 1000/1000; Time: 6m30s(total), 6m29s(copy); streamer: mysql-bin-changelog.374925:78625914; State: migrating; ETA: N/A
Copy: 0/714541 0.0%; Applied: 264500; Backlog: 1000/1000; Time: 7m0s(total), 6m59s(copy); streamer: mysql-bin-changelog.374925:130436948; State: migrating; ETA: N/A
[2019/07/03 19:24:03] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.374926, 4)
[2019/07/03 19:24:03] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.374926, 4)

Gh-ost version: 1.0.48

SQL: ALTER TABLE bikes ALTER TABLE bikes ADD COLUMN location point;

njfix6 commented 5 years ago

Another note. I tried to run it again and it gained some progress but then started to slow again and stop

This output started to show up much more frequently:

[2019/07/03 21:03:32] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375086, 4)
[2019/07/03 21:03:32] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375086, 4)
2019-07-03 21:03:32 INFO rotate to next log from mysql-bin-changelog.375086:134332143 to mysql-bin-changelog.375086
2019-07-03 21:03:32 INFO rotate to next log from mysql-bin-changelog.375086:0 to mysql-bin-changelog.375086

full logs:

Copy: 0/667509 0.0%; Applied: 0; Backlog: 36/1000; Time: 0s(total), 0s(copy); streamer: mysql-bin-changelog.375085:25876303; State: migrating; ETA: N/A
2019-07-03 21:03:05 INFO Exact number of rows via COUNT: 714541
Copy: 2000/714541 0.3%; Applied: 1225; Backlog: 42/1000; Time: 1s(total), 1s(copy); streamer: mysql-bin-changelog.375085:28951849; State: migrating; ETA: N/A
Copy: 10000/714541 1.4%; Applied: 2494; Backlog: 97/1000; Time: 2s(total), 2s(copy); streamer: mysql-bin-changelog.375085:33042841; State: migrating; ETA: 2m21s
Copy: 19000/714541 2.7%; Applied: 3781; Backlog: 98/1000; Time: 3s(total), 3s(copy); streamer: mysql-bin-changelog.375085:37242262; State: migrating; ETA: 1m50s
Copy: 27000/714541 3.8%; Applied: 5080; Backlog: 33/1000; Time: 4s(total), 4s(copy); streamer: mysql-bin-changelog.375085:41188592; State: migrating; ETA: 1m42s
Copy: 35000/714541 4.9%; Applied: 6295; Backlog: 13/1000; Time: 5s(total), 5s(copy); streamer: mysql-bin-changelog.375085:45090711; State: migrating; ETA: 1m37s
Copy: 42000/714541 5.9%; Applied: 7407; Backlog: 56/1000; Time: 6s(total), 6s(copy); streamer: mysql-bin-changelog.375085:48714049; State: migrating; ETA: 1m36s
Copy: 49000/714541 6.9%; Applied: 8622; Backlog: 89/1000; Time: 7s(total), 7s(copy); streamer: mysql-bin-changelog.375085:52532373; State: migrating; ETA: 1m35s
Copy: 58000/714541 8.1%; Applied: 9929; Backlog: 41/1000; Time: 8s(total), 8s(copy); streamer: mysql-bin-changelog.375085:56798973; State: migrating; ETA: 1m30s
Copy: 65000/714541 9.1%; Applied: 11158; Backlog: 48/1000; Time: 9s(total), 9s(copy); streamer: mysql-bin-changelog.375085:60594575; State: migrating; ETA: 1m30s
Copy: 73000/714541 10.2%; Applied: 12395; Backlog: 110/1000; Time: 10s(total), 10s(copy); streamer: mysql-bin-changelog.375085:64614710; State: migrating; ETA: 1m27s
Copy: 81000/714541 11.3%; Applied: 13833; Backlog: 19/1000; Time: 11s(total), 11s(copy); streamer: mysql-bin-changelog.375085:68645257; State: migrating; ETA: 1m26s
Copy: 88000/714541 12.3%; Applied: 15241; Backlog: 12/1000; Time: 12s(total), 12s(copy); streamer: mysql-bin-changelog.375085:72651769; State: migrating; ETA: 1m25s
Copy: 94000/714541 13.2%; Applied: 16543; Backlog: 16/1000; Time: 13s(total), 13s(copy); streamer: mysql-bin-changelog.375085:76343752; State: migrating; ETA: 1m25s
Copy: 101000/714541 14.1%; Applied: 17929; Backlog: 83/1000; Time: 14s(total), 14s(copy); streamer: mysql-bin-changelog.375085:80523058; State: migrating; ETA: 1m25s
Copy: 107000/714541 15.0%; Applied: 19332; Backlog: 71/1000; Time: 15s(total), 15s(copy); streamer: mysql-bin-changelog.375085:84513950; State: migrating; ETA: 1m25s
Copy: 113000/714541 15.8%; Applied: 20653; Backlog: 66/1000; Time: 16s(total), 16s(copy); streamer: mysql-bin-changelog.375085:88412281; State: migrating; ETA: 1m25s
Copy: 119000/714541 16.7%; Applied: 21999; Backlog: 109/1000; Time: 17s(total), 17s(copy); streamer: mysql-bin-changelog.375085:92513059; State: migrating; ETA: 1m25s
Copy: 125000/714541 17.5%; Applied: 23335; Backlog: 106/1000; Time: 18s(total), 18s(copy); streamer: mysql-bin-changelog.375085:96537961; State: migrating; ETA: 1m24s
Copy: 131000/714541 18.3%; Applied: 24752; Backlog: 37/1000; Time: 19s(total), 19s(copy); streamer: mysql-bin-changelog.375085:100453243; State: migrating; ETA: 1m24s
Copy: 137000/714541 19.2%; Applied: 26082; Backlog: 64/1000; Time: 20s(total), 20s(copy); streamer: mysql-bin-changelog.375085:104439958; State: migrating; ETA: 1m24s
Copy: 143000/714541 20.0%; Applied: 27423; Backlog: 43/1000; Time: 21s(total), 21s(copy); streamer: mysql-bin-changelog.375085:108480702; State: migrating; ETA: 1m23s
Copy: 148000/714541 20.7%; Applied: 28709; Backlog: 71/1000; Time: 22s(total), 22s(copy); streamer: mysql-bin-changelog.375085:112226512; State: migrating; ETA: 1m24s
Copy: 153000/714541 21.4%; Applied: 30007; Backlog: 79/1000; Time: 23s(total), 23s(copy); streamer: mysql-bin-changelog.375085:116048503; State: migrating; ETA: 1m24s
Copy: 157000/714541 22.0%; Applied: 31159; Backlog: 113/1000; Time: 24s(total), 24s(copy); streamer: mysql-bin-changelog.375085:119374733; State: migrating; ETA: 1m25s
Copy: 162000/714541 22.7%; Applied: 32603; Backlog: 77/1000; Time: 25s(total), 25s(copy); streamer: mysql-bin-changelog.375085:123339704; State: migrating; ETA: 1m25s
Copy: 168000/714541 23.5%; Applied: 33931; Backlog: 94/1000; Time: 26s(total), 26s(copy); streamer: mysql-bin-changelog.375085:127359465; State: migrating; ETA: 1m24s
Copy: 172000/714541 24.1%; Applied: 35312; Backlog: 37/1000; Time: 27s(total), 27s(copy); streamer: mysql-bin-changelog.375085:131030115; State: migrating; ETA: 1m25s
[2019/07/03 21:03:32] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375086, 4)
[2019/07/03 21:03:32] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375086, 4)
2019-07-03 21:03:32 INFO rotate to next log from mysql-bin-changelog.375086:134332143 to mysql-bin-changelog.375086
2019-07-03 21:03:32 INFO rotate to next log from mysql-bin-changelog.375086:0 to mysql-bin-changelog.375086
Copy: 176000/714541 24.6%; Applied: 36665; Backlog: 3/1000; Time: 28s(total), 28s(copy); streamer: mysql-bin-changelog.375086:259494; State: migrating; ETA: 1m25s
Copy: 179000/714541 25.1%; Applied: 37968; Backlog: 90/1000; Time: 29s(total), 29s(copy); streamer: mysql-bin-changelog.375086:3754180; State: migrating; ETA: 1m26s
Copy: 182000/714541 25.5%; Applied: 39365; Backlog: 14/1000; Time: 30s(total), 30s(copy); streamer: mysql-bin-changelog.375086:7180175; State: migrating; ETA: 1m27s
Copy: 184000/714541 25.8%; Applied: 40787; Backlog: 46/1000; Time: 31s(total), 31s(copy); streamer: mysql-bin-changelog.375086:10665474; State: migrating; ETA: 1m29s
Copy: 188000/714541 26.3%; Applied: 42141; Backlog: 121/1000; Time: 32s(total), 32s(copy); streamer: mysql-bin-changelog.375086:14459404; State: migrating; ETA: 1m29s
Copy: 192000/714541 26.9%; Applied: 43631; Backlog: 42/1000; Time: 33s(total), 33s(copy); streamer: mysql-bin-changelog.375086:18122453; State: migrating; ETA: 1m29s
Copy: 195000/714541 27.3%; Applied: 44954; Backlog: 119/1000; Time: 34s(total), 34s(copy); streamer: mysql-bin-changelog.375086:21649026; State: migrating; ETA: 1m30s
Copy: 196000/714541 27.4%; Applied: 46244; Backlog: 127/1000; Time: 35s(total), 35s(copy); streamer: mysql-bin-changelog.375086:24811380; State: migrating; ETA: 1m32s
Copy: 199000/714541 27.9%; Applied: 47638; Backlog: 38/1000; Time: 36s(total), 36s(copy); streamer: mysql-bin-changelog.375086:28135765; State: migrating; ETA: 1m33s
Copy: 201000/714541 28.1%; Applied: 49037; Backlog: 72/1000; Time: 37s(total), 37s(copy); streamer: mysql-bin-changelog.375086:31712622; State: migrating; ETA: 1m34s
Copy: 203000/714541 28.4%; Applied: 50464; Backlog: 11/1000; Time: 38s(total), 38s(copy); streamer: mysql-bin-changelog.375086:35261620; State: migrating; ETA: 1m35s
Copy: 204000/714541 28.5%; Applied: 51734; Backlog: 49/1000; Time: 39s(total), 39s(copy); streamer: mysql-bin-changelog.375086:38417438; State: migrating; ETA: 1m37s
Copy: 205000/714541 28.7%; Applied: 52972; Backlog: 170/1000; Time: 40s(total), 40s(copy); streamer: mysql-bin-changelog.375086:41700191; State: migrating; ETA: 1m39s
Copy: 205000/714541 28.7%; Applied: 54392; Backlog: 10/1000; Time: 41s(total), 41s(copy); streamer: mysql-bin-changelog.375086:44712513; State: migrating; ETA: 1m41s
Copy: 207000/714541 29.0%; Applied: 55702; Backlog: 11/1000; Time: 42s(total), 42s(copy); streamer: mysql-bin-changelog.375086:48008711; State: migrating; ETA: 1m43s
Copy: 208000/714541 29.1%; Applied: 57063; Backlog: 29/1000; Time: 43s(total), 43s(copy); streamer: mysql-bin-changelog.375086:51336711; State: migrating; ETA: 1m44s
Copy: 210000/714541 29.4%; Applied: 58398; Backlog: 51/1000; Time: 44s(total), 44s(copy); streamer: mysql-bin-changelog.375086:54713486; State: migrating; ETA: 1m45s
Copy: 211000/714541 29.5%; Applied: 59695; Backlog: 175/1000; Time: 45s(total), 45s(copy); streamer: mysql-bin-changelog.375086:58018273; State: migrating; ETA: 1m47s
Copy: 212000/714541 29.7%; Applied: 61009; Backlog: 119/1000; Time: 46s(total), 46s(copy); streamer: mysql-bin-changelog.375086:61061761; State: migrating; ETA: 1m49s
Copy: 214000/714541 29.9%; Applied: 62422; Backlog: 32/1000; Time: 47s(total), 47s(copy); streamer: mysql-bin-changelog.375086:64396606; State: migrating; ETA: 1m49s
Copy: 215000/714541 30.1%; Applied: 63620; Backlog: 91/1000; Time: 48s(total), 48s(copy); streamer: mysql-bin-changelog.375086:67396915; State: migrating; ETA: 1m51s
Copy: 217000/714541 30.4%; Applied: 64925; Backlog: 86/1000; Time: 49s(total), 49s(copy); streamer: mysql-bin-changelog.375086:70623169; State: migrating; ETA: 1m52s
Copy: 218000/714541 30.5%; Applied: 66061; Backlog: 192/1000; Time: 50s(total), 50s(copy); streamer: mysql-bin-changelog.375086:73659412; State: migrating; ETA: 1m53s
Copy: 218000/714541 30.5%; Applied: 67611; Backlog: 18/1000; Time: 51s(total), 51s(copy); streamer: mysql-bin-changelog.375086:76812134; State: migrating; ETA: 1m56s
Copy: 219000/714541 30.6%; Applied: 68673; Backlog: 260/1000; Time: 52s(total), 52s(copy); streamer: mysql-bin-changelog.375086:79936656; State: migrating; ETA: 1m57s
Copy: 219000/714541 30.6%; Applied: 70033; Backlog: 190/1000; Time: 53s(total), 53s(copy); streamer: mysql-bin-changelog.375086:83171392; State: migrating; ETA: 1m59s
Copy: 221000/714541 30.9%; Applied: 71444; Backlog: 73/1000; Time: 54s(total), 54s(copy); streamer: mysql-bin-changelog.375086:86483117; State: migrating; ETA: 2m0s
Copy: 222000/714541 31.1%; Applied: 72675; Backlog: 127/1000; Time: 55s(total), 55s(copy); streamer: mysql-bin-changelog.375086:89643362; State: migrating; ETA: 2m2s
Copy: 223000/714541 31.2%; Applied: 74009; Backlog: 103/1000; Time: 56s(total), 56s(copy); streamer: mysql-bin-changelog.375086:93022218; State: migrating; ETA: 2m3s
Copy: 223000/714541 31.2%; Applied: 75369; Backlog: 53/1000; Time: 57s(total), 57s(copy); streamer: mysql-bin-changelog.375086:96231448; State: migrating; ETA: 2m5s
Copy: 225000/714541 31.5%; Applied: 76596; Backlog: 98/1000; Time: 58s(total), 58s(copy); streamer: mysql-bin-changelog.375086:99631480; State: migrating; ETA: 2m6s
Copy: 226000/714541 31.6%; Applied: 77872; Backlog: 63/1000; Time: 59s(total), 59s(copy); streamer: mysql-bin-changelog.375086:102673096; State: migrating; ETA: 2m7s
Copy: 227000/714541 31.8%; Applied: 79056; Backlog: 95/1000; Time: 1m0s(total), 1m0s(copy); streamer: mysql-bin-changelog.375086:105723930; State: migrating; ETA: 2m8s
Copy: 227000/714541 31.8%; Applied: 84506; Backlog: 801/1000; Time: 1m5s(total), 1m5s(copy); streamer: mysql-bin-changelog.375086:120460881; State: migrating; ETA: 2m19s
[2019/07/03 21:04:14] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375087, 4)
[2019/07/03 21:04:14] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375087, 4)
2019-07-03 21:04:14 INFO rotate to next log from mysql-bin-changelog.375087:134310268 to mysql-bin-changelog.375087
2019-07-03 21:04:14 INFO rotate to next log from mysql-bin-changelog.375087:0 to mysql-bin-changelog.375087
Copy: 227000/714541 31.8%; Applied: 90376; Backlog: 1000/1000; Time: 1m10s(total), 1m10s(copy); streamer: mysql-bin-changelog.375087:1210687; State: migrating; ETA: 2m30s
Copy: 227000/714541 31.8%; Applied: 94366; Backlog: 1000/1000; Time: 1m15s(total), 1m15s(copy); streamer: mysql-bin-changelog.375087:10979492; State: migrating; ETA: 2m41s
Copy: 227000/714541 31.8%; Applied: 100646; Backlog: 1000/1000; Time: 1m20s(total), 1m20s(copy); streamer: mysql-bin-changelog.375087:31451998; State: migrating; ETA: 2m51s
Copy: 227000/714541 31.8%; Applied: 107096; Backlog: 1000/1000; Time: 1m25s(total), 1m25s(copy); streamer: mysql-bin-changelog.375087:49614542; State: migrating; ETA: 3m2s
Copy: 227000/714541 31.8%; Applied: 113616; Backlog: 1000/1000; Time: 1m30s(total), 1m30s(copy); streamer: mysql-bin-changelog.375087:70019573; State: migrating; ETA: 3m13s
Copy: 227000/714541 31.8%; Applied: 119966; Backlog: 1000/1000; Time: 1m35s(total), 1m35s(copy); streamer: mysql-bin-changelog.375087:88695103; State: migrating; ETA: 3m24s
Copy: 227000/714541 31.8%; Applied: 126206; Backlog: 1000/1000; Time: 1m40s(total), 1m40s(copy); streamer: mysql-bin-changelog.375087:106009471; State: migrating; ETA: 3m34s
Copy: 227000/714541 31.8%; Applied: 132236; Backlog: 1000/1000; Time: 1m45s(total), 1m45s(copy); streamer: mysql-bin-changelog.375087:120428401; State: migrating; ETA: 3m45s
[2019/07/03 21:04:51] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375088, 4)
[2019/07/03 21:04:51] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375088, 4)
2019-07-03 21:04:51 INFO rotate to next log from mysql-bin-changelog.375088:134266653 to mysql-bin-changelog.375088
2019-07-03 21:04:51 INFO rotate to next log from mysql-bin-changelog.375088:0 to mysql-bin-changelog.375088
Copy: 227000/714541 31.8%; Applied: 138606; Backlog: 1000/1000; Time: 1m50s(total), 1m50s(copy); streamer: mysql-bin-changelog.375088:9657218; State: migrating; ETA: 3m56s
Copy: 227000/714541 31.8%; Applied: 144996; Backlog: 998/1000; Time: 1m55s(total), 1m55s(copy); streamer: mysql-bin-changelog.375088:27579728; State: migrating; ETA: 4m7s
[2019/07/03 21:05:03] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375089, 4)
[2019/07/03 21:05:03] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375089, 4)
2019-07-03 21:05:03 INFO rotate to next log from mysql-bin-changelog.375089:42000112 to mysql-bin-changelog.375089
2019-07-03 21:05:03 INFO rotate to next log from mysql-bin-changelog.375089:0 to mysql-bin-changelog.375089
Copy: 227000/714541 31.8%; Applied: 151356; Backlog: 998/1000; Time: 2m0s(total), 2m0s(copy); streamer: mysql-bin-changelog.375089:3595157; State: migrating; ETA: 4m17s
Copy: 227000/714541 31.8%; Applied: 157776; Backlog: 1000/1000; Time: 2m5s(total), 2m5s(copy); streamer: mysql-bin-changelog.375089:31554674; State: migrating; ETA: 4m28s
Copy: 227000/714541 31.8%; Applied: 164226; Backlog: 1000/1000; Time: 2m10s(total), 2m10s(copy); streamer: mysql-bin-changelog.375089:48072153; State: migrating; ETA: 4m39s
Copy: 227000/714541 31.8%; Applied: 170166; Backlog: 1000/1000; Time: 2m15s(total), 2m15s(copy); streamer: mysql-bin-changelog.375089:62161460; State: migrating; ETA: 4m49s
Copy: 227000/714541 31.8%; Applied: 176246; Backlog: 1000/1000; Time: 2m20s(total), 2m20s(copy); streamer: mysql-bin-changelog.375089:80007257; State: migrating; ETA: 5m0s
Copy: 227000/714541 31.8%; Applied: 182386; Backlog: 1000/1000; Time: 2m25s(total), 2m25s(copy); streamer: mysql-bin-changelog.375089:96285305; State: migrating; ETA: 5m11s
Copy: 227000/714541 31.8%; Applied: 189386; Backlog: 1000/1000; Time: 2m30s(total), 2m30s(copy); streamer: mysql-bin-changelog.375089:113232449; State: migrating; ETA: 5m22s
Copy: 227000/714541 31.8%; Applied: 195916; Backlog: 1000/1000; Time: 2m35s(total), 2m35s(copy); streamer: mysql-bin-changelog.375089:131137524; State: migrating; ETA: 5m32s
[2019/07/03 21:05:40] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375090, 4)
[2019/07/03 21:05:40] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375090, 4)
2019-07-03 21:05:41 INFO rotate to next log from mysql-bin-changelog.375090:134572844 to mysql-bin-changelog.375090
2019-07-03 21:05:41 INFO rotate to next log from mysql-bin-changelog.375090:0 to mysql-bin-changelog.375090
Copy: 227000/714541 31.8%; Applied: 202166; Backlog: 1000/1000; Time: 2m40s(total), 2m40s(copy); streamer: mysql-bin-changelog.375090:12911947; State: migrating; ETA: 5m43s
Copy: 227000/714541 31.8%; Applied: 208236; Backlog: 1000/1000; Time: 2m45s(total), 2m45s(copy); streamer: mysql-bin-changelog.375090:28148498; State: migrating; ETA: 5m54s
Copy: 227000/714541 31.8%; Applied: 215206; Backlog: 1000/1000; Time: 2m50s(total), 2m50s(copy); streamer: mysql-bin-changelog.375090:58105246; State: migrating; ETA: 6m5s
Copy: 227000/714541 31.8%; Applied: 221616; Backlog: 1000/1000; Time: 2m55s(total), 2m55s(copy); streamer: mysql-bin-changelog.375090:73460457; State: migrating; ETA: 6m15s
Copy: 227000/714541 31.8%; Applied: 227716; Backlog: 1000/1000; Time: 3m0s(total), 3m0s(copy); streamer: mysql-bin-changelog.375090:91581528; State: migrating; ETA: 6m26s
[2019/07/03 21:06:17] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375091, 4)
[2019/07/03 21:06:17] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375091, 4)
2019-07-03 21:06:18 INFO rotate to next log from mysql-bin-changelog.375091:134305551 to mysql-bin-changelog.375091
2019-07-03 21:06:18 INFO rotate to next log from mysql-bin-changelog.375091:0 to mysql-bin-changelog.375091
Copy: 227000/714541 31.8%; Applied: 264406; Backlog: 1000/1000; Time: 3m30s(total), 3m30s(copy); streamer: mysql-bin-changelog.375091:49260919; State: migrating; ETA: 7m31s
[2019/07/03 21:07:00] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375092, 4)
[2019/07/03 21:07:00] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375092, 4)
2019-07-03 21:07:01 INFO rotate to next log from mysql-bin-changelog.375092:134261693 to mysql-bin-changelog.375092
2019-07-03 21:07:01 INFO rotate to next log from mysql-bin-changelog.375092:0 to mysql-bin-changelog.375092
Copy: 227000/714541 31.8%; Applied: 301546; Backlog: 1000/1000; Time: 4m0s(total), 4m0s(copy); streamer: mysql-bin-changelog.375092:12047186; State: migrating; ETA: 8m35s
# Migrating `ebdb`.`bikes`; Ghost table is `ebdb`.`_bikes_gho`
# Migrating ip-10-3-2-76:3306; inspecting ip-10-3-2-76:3306; executing on gh-ost-migration-bpd9d
# Migration started at Wed Jul 03 21:03:04 +0000 2019
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=500,threads_connected=15000; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# throttle-control-replicas count: 5
# postpone-cut-over-flag-file: /tmp/ghost.postpone.flag [set]
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/gh-ost.ebdb.bikes.sock
Copy: 227000/714541 31.8%; Applied: 335836; Backlog: 1000/1000; Time: 4m28s(total), 4m27s(copy); streamer: mysql-bin-changelog.375092:100347963; State: migrating; ETA: 9m34s
Copy: 227000/714541 31.8%; Applied: 338536; Backlog: 1000/1000; Time: 4m30s(total), 4m30s(copy); streamer: mysql-bin-changelog.375092:107395090; State: migrating; ETA: 9m39s
[2019/07/03 21:07:42] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375093, 4)
[2019/07/03 21:07:42] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375093, 4)
2019-07-03 21:07:43 INFO rotate to next log from mysql-bin-changelog.375093:134253588 to mysql-bin-changelog.375093
2019-07-03 21:07:43 INFO rotate to next log from mysql-bin-changelog.375093:0 to mysql-bin-changelog.375093
Copy: 227000/714541 31.8%; Applied: 375976; Backlog: 1000/1000; Time: 5m0s(total), 5m0s(copy); streamer: mysql-bin-changelog.375093:69400110; State: migrating; ETA: 10m44s
[2019/07/03 21:08:27] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375094, 4)
[2019/07/03 21:08:27] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375094, 4)
2019-07-03 21:08:28 INFO rotate to next log from mysql-bin-changelog.375094:134280669 to mysql-bin-changelog.375094
2019-07-03 21:08:28 INFO rotate to next log from mysql-bin-changelog.375094:0 to mysql-bin-changelog.375094
Copy: 227000/714541 31.8%; Applied: 408966; Backlog: 1000/1000; Time: 5m30s(total), 5m30s(copy); streamer: mysql-bin-changelog.375094:10649198; State: migrating; ETA: 11m48s
Copy: 227000/714541 31.8%; Applied: 443936; Backlog: 1000/1000; Time: 6m0s(total), 6m0s(copy); streamer: mysql-bin-changelog.375094:82296850; State: migrating; ETA: 12m53s
[2019/07/03 21:09:20] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375095, 4)
[2019/07/03 21:09:20] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375095, 4)
2019-07-03 21:09:20 INFO rotate to next log from mysql-bin-changelog.375095:134301839 to mysql-bin-changelog.375095
2019-07-03 21:09:20 INFO rotate to next log from mysql-bin-changelog.375095:0 to mysql-bin-changelog.375095
Copy: 227000/714541 31.8%; Applied: 482446; Backlog: 994/1000; Time: 6m30s(total), 6m30s(copy); streamer: mysql-bin-changelog.375095:43583722; State: migrating; ETA: 13m57s
[2019/07/03 21:10:04] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375096, 4)
[2019/07/03 21:10:04] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375096, 4)
Copy: 227000/714541 31.8%; Applied: 519536; Backlog: 991/1000; Time: 7m0s(total), 7m0s(copy); streamer: mysql-bin-changelog.375095:133858797; State: migrating; ETA: 15m2s
2019-07-03 21:10:05 INFO rotate to next log from mysql-bin-changelog.375096:134250644 to mysql-bin-changelog.375096
2019-07-03 21:10:05 INFO rotate to next log from mysql-bin-changelog.375096:0 to mysql-bin-changelog.375096
[2019/07/03 21:10:27] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375097, 4)
[2019/07/03 21:10:27] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375097, 4)
2019-07-03 21:10:27 INFO rotate to next log from mysql-bin-changelog.375097:64842663 to mysql-bin-changelog.375097
2019-07-03 21:10:27 INFO rotate to next log from mysql-bin-changelog.375097:0 to mysql-bin-changelog.375097
Copy: 227000/714541 31.8%; Applied: 557666; Backlog: 991/1000; Time: 7m30s(total), 7m30s(copy); streamer: mysql-bin-changelog.375097:21931102; State: migrating; ETA: 16m6s
Copy: 227000/714541 31.8%; Applied: 596056; Backlog: 1000/1000; Time: 8m0s(total), 8m0s(copy); streamer: mysql-bin-changelog.375097:111538088; State: migrating; ETA: 17m10s
[2019/07/03 21:11:13] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375098, 4)
[2019/07/03 21:11:13] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375098, 4)
2019-07-03 21:11:13 INFO rotate to next log from mysql-bin-changelog.375098:134270111 to mysql-bin-changelog.375098
2019-07-03 21:11:13 INFO rotate to next log from mysql-bin-changelog.375098:0 to mysql-bin-changelog.375098
Copy: 227000/714541 31.8%; Applied: 632306; Backlog: 999/1000; Time: 8m30s(total), 8m30s(copy); streamer: mysql-bin-changelog.375098:58184433; State: migrating; ETA: 18m15s
[2019/07/03 21:11:57] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375099, 4)
[2019/07/03 21:11:57] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375099, 4)
2019-07-03 21:11:58 INFO rotate to next log from mysql-bin-changelog.375099:134267224 to mysql-bin-changelog.375099
2019-07-03 21:11:58 INFO rotate to next log from mysql-bin-changelog.375099:0 to mysql-bin-changelog.375099
Copy: 227000/714541 31.8%; Applied: 668936; Backlog: 1000/1000; Time: 9m0s(total), 9m0s(copy); streamer: mysql-bin-changelog.375099:20111737; State: migrating; ETA: 19m19s
Copy: 227000/714541 31.8%; Applied: 706296; Backlog: 1000/1000; Time: 9m30s(total), 9m30s(copy); streamer: mysql-bin-changelog.375099:103713960; State: migrating; ETA: 20m24s
[2019/07/03 21:12:45] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375100, 4)
[2019/07/03 21:12:45] [info] binlogsyncer.go:723 rotate to (mysql-bin-changelog.375100, 4)
2019-07-03 21:12:45 INFO rotate to next log from mysql-bin-changelog.375100:134289335 to mysql-bin-changelog.375100
2019-07-03 21:12:45 INFO rotate to next log from mysql-bin-changelog.375100:0 to mysql-bin-changelog.375100
# Migrating `ebdb`.`bikes`; Ghost table is `ebdb`.`_bikes_gho`
shlomi-noach commented 5 years ago

@njfix6 since the table is so busy, my advice is to get gh-ost to run closer to the hosts. I see you run with --allow-on-master -- are you connecting gh-ost directly to the master? If so, can you run gh-ost closer to the master? (same DC, same network) ; running gh-ost close to where binary logs are read from makes a major impact.

njfix6 commented 5 years ago

Hi, @shlomi-noach thanks for the response. Yes, we are connecting directly to master and monitoring the replica lag of 5 replicas. I believe this is option B of this document: https://github.com/github/gh-ost/blob/master/doc/cheatsheet.md. Do you recommend removing the --allow-on-master flag and run as option A or would it not make much of a difference? I think we would rather run as option B unless you recommend the other.

We are also running gh-ost in a different VPC than our production database. It would probably be a good idea to move this to the same VPC as our production database to help gh-ost keep up.

njfix6 commented 5 years ago

Also, could you help me understand the difference between --chunk-size and --dml-batch-size? I haven't seen a lot of documentation on it and would love to know when I should change these values and what they do. I'm not sure if these values would help with my current situation.

shlomi-noach commented 5 years ago

Do you recommend removing the --allow-on-master flag and run as option A or would it not make much of a difference? I think we would rather run as option B unless you recommend the other.

In terms of your performance issue I don't think A/B make a difference. Your bottleneck is the amount/rate of binary logs, and those will be identical in both scenarios.

could you help me understand the difference between --chunk-size and --dml-batch-size? I

By now you must have tried running gh-ost closer to the master -- any improvement?

njfix6 commented 5 years ago

Cool thanks for the info! I actually ran it again on a Monday instead of a Friday (where we have less traffic) and it could succeed. If I run into this issue again I will try to move gh-ost closer to master. The table we were running it on has probably our most written to the table so it was a great test for gh-ost.

njfix6 commented 5 years ago

@shlomi-noach If you don't mind me asking, what is the arguments that GitHub uses when running the migrations? I understand if you want to keep that private but I think would really help other production use cases like myself.

shlomi-noach commented 5 years ago
  gh-ost \
  --conf=/etc/mysql/gh-ost.cnf \
  --host=$ghost_replica \
  --database="$database_name" \
  --table="$table_name" \
  --alter="$ddl" \
  --max-load=Threads_running=100 \
  --critical-load=Threads_running=1000 \
  --critical-load-interval-millis=3000 \
  --critical-load-hibernate-seconds=600 \
  --chunk-size=$chunk_size \
  --throttle-http="http://freno.XXX:8111/${freno_path}" \
  --max-lag-millis="$max_lag_millis" \
  --heartbeat-interval-millis=100 \
  --replication-lag-query="select unix_timestamp(now(6)) - unix_timestamp(ts) as ghost_lag_check from github_meta.heartbeat order by ts desc limit 1" \
  --switch-to-rbr \
  --allow-master-master \
  --cut-over=default \
  --timestamp-old-table \
  --exact-rowcount \
  --concurrent-rowcount \
  --default-retries=120 \
  --panic-flag-file=/tmp/ghost.panic.flag \
  --postpone-cut-over-flag-file=/etc/github/ghost.postpone.flag \
  --serve-socket-file="$socket_file" \
  --hooks-path=/path/to/gh-ost-hooks/prod/ \
  --hooks-hint="${hooks_hint_user}" \
  --ssl \
  --ssl-allow-insecure \
  --verbose 
njfix6 commented 5 years ago

@shlomi-noach We tried running another migration and getting 1000/1000 backlog. I am running on in the same VPC this time but it still can't keep up. Don't get me wrong the table has a LOT of writes. Any other thoughts on what I could change? Or maybe I just have to figure out a way to reduce the writes. Thanks for the help again.

shlomi-noach commented 5 years ago

@njfix6 only remaining thoughts are the usual "improve performance and reduce durability" mysql settings, i.e. set global sync_binlog=0, innodb_flush_logs_at_trx_commit=2, which might buy you some breathing space.

ggunson commented 5 years ago

While I can't guarantee that this is the situation you hit @njfix6 I have seen situations where SSL-encrypted replication has been throttled on the io_thread, meaning that a replica would never catch up to the master because the IO thread was copying the direct master's binlogs very slowly.

The closest bug report I could see of this is https://bugs.mysql.com/bug.php?id=90919 but that's just from similar symptoms.

I saw one instance of running gh-ost where I believe it hit this bug (since it behaves as a replica) which is why I'm bringing this up.

For MySQL replicas my workaround was to set slave_compressed_protocol to ON on the replica; I don't know if this is available via the go mysql driver gh-ost uses.

If you can test with SSL off I'd suggest trying that out; or if testing with the values given, check the "streamer: mysql-bin-changelog.xxxxxxxx" value output over time and see if it stays similarly behind where the host's show master status binlog is.

UPDATE: On further reading I suppose that's not the problem you're encountering, though non-SSL replication might be faster.