github / gh-ost

GitHub's Online Schema-migration Tool for MySQL
MIT License
12.12k stars 1.23k forks source link

gh-ost aborts: FATAL 2 successive failures in streamer reconnect at coordinates binlog.N:4 #317

Closed mizioumt closed 7 years ago

mizioumt commented 7 years ago

quite reproducible. I'll attach the log. It has:

2016-11-28 15:34:12 DEBUG Done streaming
2016-11-28 15:34:12 FATAL 2 successive failures in streamer reconnect at coordinates binlog.000504:4

that :4 is an invariant, on every run it fails at binlog.N:4

shlomi-noach commented 7 years ago

To explain a bit on what happened here:

The streamer is the part that connects to your MySQL server and impersonates as a replica. It seems to have been disconnected. Network problem? Timeout?

When it reconnects, it always reconnects at 4. This is because it want to make no assumptions on the state of the last executed binlog position. It then skips to the last known position.

I would suggest increasing the --default-retries count as you have had 2 successive failures in reconnecting.

mizioumt commented 7 years ago

to which MySQL server does it connect? In this case I had --test-on-replica, will it try to connect to the master or to the replica? If it's the primary master it is trying to connect to that could explain it.

shlomi-noach commented 7 years ago

The streamer connects to your replica, unless you specified --allow-on-master

mizioumt commented 7 years ago

60 --default-reties would not help. Maybe it is a sick server, I'll try another one for the same table.

shlomi-noach commented 7 years ago

I know a doctor.

mizioumt commented 7 years ago

same results for another one

2016-11-30 09:57:12 FATAL 61 successive failures in streamer reconnect at coordinates binlog.000001:4

and this:

$ grep econnect *out
2016-11-30 09:52:07 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:52:12 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:52:17 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:52:22 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:52:27 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:52:32 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:52:37 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:52:42 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:52:47 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:52:52 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:52:57 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:53:02 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:53:07 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:53:12 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:53:17 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:53:22 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:53:27 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:53:32 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:53:37 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:53:42 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:53:47 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:53:52 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:53:57 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:54:02 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:54:07 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:54:12 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:54:17 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:54:22 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:54:27 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:54:32 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:54:37 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:54:42 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:54:47 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:54:52 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:54:57 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:55:02 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:55:07 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:55:12 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:55:17 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:55:22 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:55:27 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:55:32 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:55:37 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:55:42 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:55:47 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:55:52 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:55:57 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:56:02 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:56:07 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:56:12 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:56:17 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:56:22 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:56:27 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:56:32 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:56:37 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:56:42 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:56:47 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:56:52 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:56:57 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:57:02 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:57:07 INFO Reconnecting... Will resume at binlog.000001:36934908
2016-11-30 09:57:12 FATAL 61 successive failures in streamer reconnect at coordinates binlog.000001:4
$ grep econnect *out|wc -l
62

and this:

$ grep slice *out|tail -1l
2016-11-30 09:57:07 INFO StreamEvents encountered unexpected error: Err: runtime error: slice bounds out of range
$ grep slice *out| wc -l
62

no I don't think it's the network. Maybe gh-ost sees an error when there is none. It would appear it reaches the end of the last available binlog, goes back and somehow does not appreciate it.

mizioumt commented 7 years ago

reproduced with another table. In addition there is an issue with locking:

T11                   ...                  T12      streamer is complaining and panicking which takes some minutes because by default gh-ost retries 60 times.
            T21       ...         T22               this is a subset of the previous interval. In the beginning another [not streamer] gh-ost thread declares some kind of success, locks tables
                                                          and 'waits for event processing to reach the lock', something to this point. This then dissipates at T22 and the locks are released. 
                                                          Lock hold time is a few minutes. At T12 gh-ost aborts so the table is left as is.

T22-T21 is 2-4 minutes and less than T12-T11 which is maybe 6 minutes. T21 to me looks unmarked in the log. T21 is marked like this:

      [OSC] Copy: 195691/195691 100.0%; Applied: 0; Backlog: 0/100; Time: 17s(total), 16s(copy); streamer: binlog.000026:67216409; State: migrating; ETA: due
Row copy complete
Grabbing voluntary lock: gh-ost.52392006.lock
Setting LOCK timeout as 6 seconds
Looking for magic cut-over table
      [OSC] Copy: 195691/195691 100.0%; Applied: 0; Backlog: 0/100; Time: 17s(total), 16s(copy); streamer: binlog.000026:67216409; State: migrating; ETA: due
Creating magic cut-over table __DB__.__TABLE___del`
Magic cut-over table created
Locking __DB__.__TABLE__, __DB__.__TABLE___del`
Tables locked
Session locking original & magic tables is 52392006
Writing changelog state: AllEventsUpToLockProcessed
Waiting for events up to lock
Reconnecting... Will resume at binlog.000026:67216086

it may be too optimistic as to when to lock and in any case I wonder for how much time this lock would be held under normal circumstances.

shlomi-noach commented 7 years ago

@mizioumt which version have you used in your latest comment? https://github.com/github/gh-ost/issues/317#issuecomment-276694761 I've recently fixed throttling-related issues at time of cut-over.

mizioumt commented 7 years ago

v1.0.28

shlomi-noach commented 7 years ago

@mizioumt please update to latest. This just rings too similar to something that was fixed. No promises of course ;)

mizioumt commented 7 years ago

with the latest the core issue is still there. I mean the streamer reconnection and successful failure. The locking is sort of improved, now it locks for 1-3 seconds, 43 times in a row, complaining about reconnection all the while. I think this has something to do with the 'slice out of bounds' error somehow. The log looks like this:

...
2017-02-02 14:26:04 X StreamEvents encountered unexpected error: Err: runtime error: slice bounds out of range
2017-02-02 14:26:04 X       [OSC] Stack: goroutine 715 [running]:
2017-02-02 14:26:04 X       [OSC] github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/mysql.Pstack(0xc42015d6f8, 0x67b420)
2017-02-02 14:26:04 X       [OSC]       /Users/shlomi-noach/dev/go/src/github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/mysql/util.go:18 +0x74
2017-02-02 14:26:04 X       [OSC] github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication.(*BinlogSyncer).onStream.func1(0xc4203a6120, 0xc4203abe00)
2017-02-02 14:26:04 X       [OSC]       /Users/shlomi-noach/dev/go/src/github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication/binlogsyncer.go:488 +0x84
2017-02-02 14:26:04 X       [OSC] panic(0x67b420, 0xc420010100)
2017-02-02 14:26:04 X       [OSC]       /usr/local/Cellar/go/1.7.3/libexec/src/runtime/panic.go:458 +0x243
2017-02-02 14:26:05 X       [OSC] github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication.(*RowsEvent).decodeValue(0xc42039dea0, 0xc4207c8192, 0x1d28, 0x3c6e, 0x2fd000f, 0x52, 0x0, 0x1, 0x0, 0x0)
2017-02-02 14:26:05 X       [OSC]       /Users/shlomi-noach/dev/go/src/github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication/row_event.go:467 +0xe60
2017-02-02 14:26:05 X       [OSC] Copy: 77000/193002 39.9%; Applied: 0; Backlog: 0/100; Time: 50s(total), 47s(copy); streamer: binlog.000706:340885; State: throttled, lag=1.869027s; ETA: 1m10s
2017-02-02 14:26:05 X       [OSC] github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication.(*RowsEvent).decodeRows(0xc42
2017-02-02 14:26:06 X       [OSC] Copy: 77000/193002 39.9%; Applied: 0; Backlog: 0/100; Time: 51s(total), 48s(copy); streamer: binlog.000706:340885; State: throttled, lag=2.068969s; ETA: 1m12s
2017-02-02 14:26:07 X       [OSC] Copy: 84000/193002 43.5%; Applied: 0; Backlog: 0/100; Time: 52s(total), 49s(copy); streamer: binlog.000706:340885; State: throttled, lag=1.669095s; ETA: 1m3s
2017-02-02 14:26:08 X       [OSC] Copy: 84000/193002 43.5%; Applied: 0; Backlog: 0/100; Time: 53s(total), 50s(copy); streamer: binlog.000706:340885; State: throttled, lag=2.669242s; ETA: 1m4s
2017-02-02 14:26:09 X Reconnecting... Will resume at binlog.000706:66943242
2017-02-02 14:26:09 X Registering replica at M:3306
2017-02-02 14:26:09 X Connecting binlog streamer at binlog.000706:4
2017-02-02 14:26:09 X       [OSC] Copy: 84000/193002 43.5%; Applied: 0; Backlog: 0/100; Time: 54s(total), 51s(copy); streamer: binlog.000706:340885; State: throttled, appdb-1037.ams4.prod.booking.com:3306 replica-lag=2.968916s; ETA: 1m6s
2017-02-02 14:26:09 X rotate to next log name: binlog.000706
2017-02-02 14:26:09 X StreamEvents encountered unexpected error: Err: runtime error: slice bounds out of range
...
[mizioumtchen@X ~]$ grep econnect s*0|wc -l
62
[mizioumtchen@X ~]$ grep 'panic(' s*0| wc -l
62
[mizioumtchen@X ~]$ grep slice s*0|wc -l
62
mizioumt commented 7 years ago

in ./vendor/github.com/siddontang/go-mysql/mysql/util.go the size of 1024 in func Pstack() is too small. I increased it to 65000 and got a complete stack or at least a longer stack. This is built based on source about a month ago.

2017-03-03 14:40:45 INFO StreamEvents encountered unexpected error: Err: runtime error: slice bounds out of range
 Stack: goroutine 14 [running]:
github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/mysql.Pstack(0xc4200676f8, 0x681fe0)
        /home/mizioumtchen/gh-ost/src/github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/mysql/util.go:18 +0x74
github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication.(*BinlogSyncer).onStream.func1(0xc42012e5c0, 0xc42015a540)
        /home/mizioumtchen/gh-ost/src/github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication/binlogsyncer.go:488 +0x84
panic(0x681fe0, 0xc4200120f0)
        /usr/lib/golang/src/runtime/panic.go:458 +0x243
github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication.(*RowsEvent).decodeValue(0xc4200c00a0, 0xc4201e00dc, 0x1ec8, 0x3d24, 0x300fc, 0x66de60, 0xc42045d750, 0x4, 0x0, 0x0)
        /home/mizioumtchen/gh-ost/src/github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication/row_event.go:455 +0x1807
github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication.(*RowsEvent).decodeRows(0xc4200c00a0, 0xc4201e0027, 0x1f7d, 0x3dd9, 0xc4200c0000, 0xc4201e001f, 0x8, 0x3de1, 0xc420067c68, 0x4121a8, ...)
        /home/mizioumtchen/gh-ost/src/github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication/row_event.go:310 +0x25b
github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication.(*RowsEvent).Decode(0xc4200c00a0, 0xc4201e0014, 0x1f90, 0x3dec, 0x5ad4a2, 0xc4205568a0)
        /home/mizioumtchen/gh-ost/src/github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication/row_event.go:267 +0x38d
github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication.(*BinlogParser).parseEvent(0xc420189900, 0xc4205568a0, 0xc4201e0014, 0x1f90, 0x3dec, 0x0, 0x0, 0xc420556880, 0x0)
        /home/mizioumtchen/gh-ost/src/github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication/parser.go:188 +0x12f
github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication.(*BinlogParser).parse(0xc420189900, 0xc4201e0001, 0x1fa7, 0x3dff, 0xc42027a1c0, 0x802880, 0xc4201a6540)
        /home/mizioumtchen/gh-ost/src/github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication/parser.go:223 +0x245
github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication.(*BinlogSyncer).parseEvent(0xc42015a540, 0xc42012e5c0, 0xc4201e0000, 0x1fa8, 0x3e00, 0x0, 0x0)
        /home/mizioumtchen/gh-ost/src/github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication/binlogsyncer.go:528 +0xfa
github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication.(*BinlogSyncer).onStream(0xc42015a540, 0xc42012e5c0)
        /home/mizioumtchen/gh-ost/src/github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication/binlogsyncer.go:502 +0x10f
created by github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication.(*BinlogSyncer).startDumpStream
        /home/mizioumtchen/gh-ost/src/github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication/binlogsyncer.go:255 +0x138
mizioumt commented 7 years ago

this part of MYSQL_TYPE_BLOB processing in row_event.go is suspicious. It is rewritten in the current master but the logic is the same. length is extracted from the first 4 bytes. This overwrites with the data area. I would use data[0:2] for length which I tried and it didn't work. Maybe FixedLengthInt is not the right tool for extracting the length here.

453 >···>···case 3:
454 >···>···>···length = int(FixedLengthInt(data[0:3]))
455 >···>···>···v = data[3 : 3+length]
456 >···>···>···n = length + 3
mizioumt commented 7 years ago

the trouble indeed comes from the piece of code in question. And it looks like it should be length = int(FixedLengthInt(data[0:2])). However it's still failing. When I added the printout below I got

#debug length 13339 data 13342 slen %!s(int=13339)

then it panicked because of

#debug length 2110769 data 7880 slen %!s(int=13617)

so the 4 byte length is not good. The 3 byte length looks reasonable for this kind of data . Yet the data is too short. I don't think we have a corrupted MEDIUMBLOB everywhere, more probably something in the caller that sends the wrong data array.

>···>···case 3:
>···>···>···length = int(FixedLengthInt(data[0:3]))
>···>···>···var slength int = int(FixedLengthInt(data[0:2]))
>···>···>···fmt.Printf("#debug length %d data %d slen %d\n",length,len(data),slength);
>···>···>···v = data[3 : 3+length]
>···>···>···n = length + 3
shlomi-noach commented 7 years ago

cc @siddontang (author of go-mysql) -- please see last 4 comments by @mizioumt 🙇

siddontang commented 7 years ago

Hi @mizioumt

can you show me your table schema?

It is appreciated that you can give me some example data. :-)

mizioumt commented 7 years ago

the table I'm trying to ALTER is inactive. The overall schema has many tables. What if I showed you some information from the binlog over which it is failing?

siddontang commented 7 years ago

HI @mizioumt

You can add a print here https://github.com/siddontang/go-mysql/blob/master/replication/row_event.go#L57

fmt.Printf("%s-%s", e.Schema, e.Table), 

So when the panic happens, we can know the real database and table, then you can show the table schema to me.

You should also print the raw hex data when decode TableMapEvent and RowsEvent here if you can't know the data:

https://github.com/siddontang/go-mysql/blob/master/replication/row_event.go#L36 https://github.com/siddontang/go-mysql/blob/master/replication/row_event.go#L228

of course, maybe the source line is different in gh-ost vendor, but it is still easy for you to do it.

Thank you very much.

mizioumt commented 7 years ago

I started with the printout of table and schema names. I'll attach the file with table structure. It's a bit puzzling that in this case the panic happened without going through my other debugging output around what I assumed MEDIUMBLOB processing. So it just said:

#debug s-t dbname-Table_name
2017-03-06 16:34:36 INFO StreamEvents encountered unexpected error: Err: runtime error: slice bounds out of range
...
panic(0x681fe0, 0xc4200120f0)
>···/usr/lib/golang/src/runtime/panic.go:458 +0x243
github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication.(*RowsEvent).decodeValue(0xc42023a320, 0xc42061c7be, 0x174, 0x442, 0x2fd000f, 0x52, 0x0, 0x1, 0x0, 0x0)
>···/home/mizioumtchen/gh-ost/src/github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/replication/row_event.go:470 +0xe5e

line 470 in my build is this:

>···case MYSQL_TYPE_VARCHAR,
>···>···MYSQL_TYPE_VAR_STRING:
>···>···length = int(meta)
>···>···v, n = decodeString(data, length)

when it hits the same place the second time because of --retry-attempts=1 it says this:

#debug s-t dbname-Table_name
2017-03-06 16:34:43 DEBUG Skipping handled query at binlog.000022:28141425
2017-03-06 16:34:43 DEBUG Skipping handled query at binlog.000022:28141696
2017-03-06 16:34:43 DEBUG Skipping handled query at binlog.000022:28142059
2017-03-06 16:34:43 INFO StreamEvents encountered unexpected error: Err: runtime error: slice bounds out of range
 Stack: goroutine 1971 [running]:
github.com/github/gh-ost/vendor/github.com/siddontang/go-mysql/mysql.Pstack(0xc4204256b0, 0x681fe0)

does it mean that the panic is in the processing of the event following 28142059?

mizioumt commented 7 years ago

here's the structure:

*************************** 1. row ***************************
       Table: Table_name
Create Table: CREATE TABLE Table_name (
  ITEM1 int(10) unsigned NOT NULL,
  ITEM2 varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  ITEM3 varchar(255) CHARACTER SET utf8 NOT NULL DEFAULT '',
  ITEM4 varchar(255) CHARACTER SET utf8 NOT NULL DEFAULT '',
  ITEM5 char(2) NOT NULL DEFAULT '',
  ITEM6 int(11) NOT NULL,
  ITEM7 int(11) DEFAULT NULL,
  ITEM8 decimal(2,1) NOT NULL DEFAULT '0.0',
  ITEM9 tinyint(4) NOT NULL DEFAULT '0',
  ITEM10 varchar(3) NOT NULL,
  ITEM11 double NOT NULL DEFAULT '0',
  ITEM12 double NOT NULL DEFAULT '0',
  ITEM13 varchar(20) CHARACTER SET utf8 NOT NULL DEFAULT '',
  ITEM14 varchar(255) DEFAULT NULL,
  ITEM15 float NOT NULL DEFAULT '0',
  ITEM16 tinyint(4) NOT NULL DEFAULT '0',
  ITEM17 mediumint(9) DEFAULT NULL,
  ITEM18 float NOT NULL DEFAULT '10',
  ITEM19 float DEFAULT NULL,
  ITEM20 float DEFAULT NULL,
  ITEM21 float DEFAULT NULL,
  ITEM22 float DEFAULT NULL,
  ITEM23 float DEFAULT NULL,
  ITEM24 float DEFAULT NULL,
  ITEM25 float DEFAULT NULL,
  ITEM26 double DEFAULT NULL,
  ITEM27 double DEFAULT NULL,
  ITEM28 varchar(5) NOT NULL DEFAULT '',
  ITEM29 varchar(5) NOT NULL DEFAULT '',
  ITEM30 varchar(5) NOT NULL DEFAULT '',
  ITEM31 varchar(5) NOT NULL DEFAULT '',
  ITEM32 timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  ITEM33 varchar(255) DEFAULT NULL,
  ITEM34 timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  ITEM35 char(2) NOT NULL DEFAULT 'en',
  ITEM36 int(11) DEFAULT NULL,
  ITEM37 int(11) DEFAULT NULL,
  ITEM38 tinyint(3) unsigned NOT NULL DEFAULT '0',
  ITEM39 tinyint(4) NOT NULL DEFAULT '1',
  ITEM40 char(2) DEFAULT NULL,
  ITEM41 varchar(255) DEFAULT NULL,
  ITEM42 int(10) unsigned NOT NULL DEFAULT '0',
  ITEM43 int(11) NOT NULL DEFAULT '0',
  ITEM44 int(10) unsigned DEFAULT NULL,
  ITEM45 varchar(255) DEFAULT NULL,
  ITEM46 timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  ITEM47 varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  ITEM48 varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  ITEM49 varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  ITEM50 enum(_VALUES_) NOT NULL DEFAULT '_DEFAULT_',
  ITEM51 int(10) unsigned NOT NULL DEFAULT '0',
  ITEM52 char(1) NOT NULL DEFAULT 'P',
  ITEM53 tinyint(3) unsigned NOT NULL DEFAULT '0',
  ITEM54 smallint(5) unsigned NOT NULL DEFAULT '24',
  ITEM55 tinyint(4) NOT NULL DEFAULT '0',
  ITEM56 int(10) unsigned DEFAULT NULL,
  ITEM57 int(10) unsigned DEFAULT NULL,
  ITEM58 char(1) DEFAULT 'I',
  ITEM59 char(1) DEFAULT 'A',
  ITEM60 char(1) DEFAULT 'I',
  ITEM61 smallint(5) unsigned DEFAULT '1',
  ITEM62 timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  ITEM63 varchar(20) DEFAULT NULL,
  ITEM64 tinyint(3) unsigned NOT NULL DEFAULT '0',
  ITEM65 tinyint(3) unsigned DEFAULT NULL,
  ITEM66 tinyint(3) unsigned DEFAULT NULL,
  ITEM67 tinyint(3) unsigned NOT NULL DEFAULT '0',
  ITEM68 tinyint(3) unsigned NOT NULL DEFAULT '0',
  ITEM69 int(10) unsigned DEFAULT NULL,
  ITEM70 tinyint(3) unsigned DEFAULT '0',
  ITEM71 set(_VALUES_) DEFAULT NULL,
  ITEM72 tinyint(3) unsigned NOT NULL DEFAULT '0',
  ITEM73 tinyint(4) NOT NULL DEFAULT '0',
  ITEM74 tinyint(3) unsigned NOT NULL DEFAULT '100',
  ITEM75 tinyint(1) unsigned NOT NULL DEFAULT '1',
  ITEM76 date DEFAULT NULL COMMENT "_comment_",
  ITEM77 decimal(2,1) NOT NULL DEFAULT '0.0',
  ITEM78 tinyint(3) unsigned DEFAULT NULL,
  ITEM79 tinyint(3) unsigned NOT NULL DEFAULT '0',
  ITEM80 tinyint(3) unsigned NOT NULL DEFAULT '0',
  ITEM81 tinyint(3) unsigned DEFAULT '1',
  ITEM82 tinyint(3) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (ITEM1),
  UNIQUE KEY ITEM84 (ITEM5,ITEM14),
  KEY ITEM6 (ITEM6),
  KEY ITEM5 (ITEM5,ITEM16),
  KEY ITEM43 (ITEM43),
  KEY ITEM38 (ITEM38,ITEM6,ITEM5),
  KEY ITEM89 (ITEM16,ITEM42),
  KEY ITEM56 (ITEM56),
  KEY ITEM57 (ITEM57),
  KEY ITEM69 (ITEM69),
  KEY ITEM93 (ITEM38,ITEM42),
  KEY ITEM37 (ITEM37),
  KEY ITEM36 (ITEM36),
  KEY ITEM26 (ITEM26,ITEM27,ITEM42,ITEM6),
  KEY ITEM32 (ITEM32,ITEM1,ITEM38),
  KEY ITEM98 (ITEM38,ITEM26,ITEM27,ITEM42),
  KEY ITEM99 (ITEM5,ITEM6,ITEM2),
  KEY ITEM8 (ITEM8),
  KEY ITEM101 (ITEM5,ITEM38,ITEM10),
  KEY ITEM29 (ITEM29),
  KEY ITEM13 (ITEM13),
  KEY ITEM4 (ITEM4),
  KEY ITEM14 (ITEM14),
  KEY ITEM2 (ITEM2),
  KEY ITEM10 (ITEM10,ITEM38),
  KEY ITEM108 (ITEM44,ITEM6),
  KEY ITEM109 (ITEM82,ITEM6,ITEM5),
  KEY ITEM110 (ITEM82,ITEM26,ITEM27)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
shlomi-noach commented 7 years ago

This table must go to the Hall of Fame.

siddontang commented 7 years ago

Hi @mizioumt

what's the raw data when decode TableMapEvent and RowsEvent?

siddontang commented 7 years ago

Hi @mizioumt

I can't create the table with error in MySQL 5.6 and 5.7

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '_VALUES_) NOT NULL DEFAULT '_DEFAULT_',

Can you use show create table to show me the table schema?

mizioumt commented 7 years ago

Hi, @siddontang, no, the structure has to be obfuscated. Same presents challenges for binlog dumps. Here's something that works. You need to disable strict sql mode because of the unfortunate zero defaults for timestamps. In this case the number of enum and set values is different from the original. If this is a problem I could improve that.

CREATE TABLE Table_name (
  ITEM1 int(10) unsigned NOT NULL,
  ITEM2 varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  ITEM3 varchar(255) CHARACTER SET utf8 NOT NULL DEFAULT '',
  ITEM4 varchar(255) CHARACTER SET utf8 NOT NULL DEFAULT '',
  ITEM5 char(2) NOT NULL DEFAULT '',
  ITEM6 int(11) NOT NULL,
  ITEM7 int(11) DEFAULT NULL,
  ITEM8 decimal(2,1) NOT NULL DEFAULT '0.0',
  ITEM9 tinyint(4) NOT NULL DEFAULT '0',
  ITEM10 varchar(3) NOT NULL,
  ITEM11 double NOT NULL DEFAULT '0',
  ITEM12 double NOT NULL DEFAULT '0',
  ITEM13 varchar(20) CHARACTER SET utf8 NOT NULL DEFAULT '',
  ITEM14 varchar(255) DEFAULT NULL,
  ITEM15 float NOT NULL DEFAULT '0',
  ITEM16 tinyint(4) NOT NULL DEFAULT '0',
  ITEM17 mediumint(9) DEFAULT NULL,
  ITEM18 float NOT NULL DEFAULT '10',
  ITEM19 float DEFAULT NULL,
  ITEM20 float DEFAULT NULL,
  ITEM21 float DEFAULT NULL,
  ITEM22 float DEFAULT NULL,
  ITEM23 float DEFAULT NULL,
  ITEM24 float DEFAULT NULL,
  ITEM25 float DEFAULT NULL,
  ITEM26 double DEFAULT NULL,
  ITEM27 double DEFAULT NULL,
  ITEM28 varchar(5) NOT NULL DEFAULT '',
  ITEM29 varchar(5) NOT NULL DEFAULT '',
  ITEM30 varchar(5) NOT NULL DEFAULT '',
  ITEM31 varchar(5) NOT NULL DEFAULT '',
  ITEM32 timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  ITEM33 varchar(255) DEFAULT NULL,
  ITEM34 timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  ITEM35 char(2) NOT NULL DEFAULT 'en',
  ITEM36 int(11) DEFAULT NULL,
  ITEM37 int(11) DEFAULT NULL,
  ITEM38 tinyint(3) unsigned NOT NULL DEFAULT '0',
  ITEM39 tinyint(4) NOT NULL DEFAULT '1',
  ITEM40 char(2) DEFAULT NULL,
  ITEM41 varchar(255) DEFAULT NULL,
  ITEM42 int(10) unsigned NOT NULL DEFAULT '0',
  ITEM43 int(11) NOT NULL DEFAULT '0',
  ITEM44 int(10) unsigned DEFAULT NULL,
  ITEM45 varchar(255) DEFAULT NULL,
  ITEM46 timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  ITEM47 varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  ITEM48 varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  ITEM49 varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  ITEM50 enum("_VALUES_","_DEFAULT_") NOT NULL DEFAULT '_DEFAULT_',
  ITEM51 int(10) unsigned NOT NULL DEFAULT '0',
  ITEM52 char(1) NOT NULL DEFAULT 'P',
  ITEM53 tinyint(3) unsigned NOT NULL DEFAULT '0',
  ITEM54 smallint(5) unsigned NOT NULL DEFAULT '24',
  ITEM55 tinyint(4) NOT NULL DEFAULT '0',
  ITEM56 int(10) unsigned DEFAULT NULL,
  ITEM57 int(10) unsigned DEFAULT NULL,
  ITEM58 char(1) DEFAULT 'I',
  ITEM59 char(1) DEFAULT 'A',
  ITEM60 char(1) DEFAULT 'I',
  ITEM61 smallint(5) unsigned DEFAULT '1',
  ITEM62 timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  ITEM63 varchar(20) DEFAULT NULL,
  ITEM64 tinyint(3) unsigned NOT NULL DEFAULT '0',
  ITEM65 tinyint(3) unsigned DEFAULT NULL,
  ITEM66 tinyint(3) unsigned DEFAULT NULL,
  ITEM67 tinyint(3) unsigned NOT NULL DEFAULT '0',
  ITEM68 tinyint(3) unsigned NOT NULL DEFAULT '0',
  ITEM69 int(10) unsigned DEFAULT NULL,
  ITEM70 tinyint(3) unsigned DEFAULT '0',
  ITEM71 set("_VALUES_","_DEFAULT_") DEFAULT NULL,
  ITEM72 tinyint(3) unsigned NOT NULL DEFAULT '0',
  ITEM73 tinyint(4) NOT NULL DEFAULT '0',
  ITEM74 tinyint(3) unsigned NOT NULL DEFAULT '100',
  ITEM75 tinyint(1) unsigned NOT NULL DEFAULT '1',
  ITEM76 date DEFAULT NULL COMMENT "_comment_",
  ITEM77 decimal(2,1) NOT NULL DEFAULT '0.0',
  ITEM78 tinyint(3) unsigned DEFAULT NULL,
  ITEM79 tinyint(3) unsigned NOT NULL DEFAULT '0',
  ITEM80 tinyint(3) unsigned NOT NULL DEFAULT '0',
  ITEM81 tinyint(3) unsigned DEFAULT '1',
  ITEM82 tinyint(3) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (ITEM1),
  UNIQUE KEY ITEM84 (ITEM5,ITEM14),
  KEY ITEM6 (ITEM6),
  KEY ITEM5 (ITEM5,ITEM16),
  KEY ITEM43 (ITEM43),
  KEY ITEM38 (ITEM38,ITEM6,ITEM5),
  KEY ITEM89 (ITEM16,ITEM42),
  KEY ITEM56 (ITEM56),
  KEY ITEM57 (ITEM57),
  KEY ITEM69 (ITEM69),
  KEY ITEM93 (ITEM38,ITEM42),
  KEY ITEM37 (ITEM37),
  KEY ITEM36 (ITEM36),
  KEY ITEM26 (ITEM26,ITEM27,ITEM42,ITEM6),
  KEY ITEM32 (ITEM32,ITEM1,ITEM38),
  KEY ITEM98 (ITEM38,ITEM26,ITEM27,ITEM42),
  KEY ITEM99 (ITEM5,ITEM6,ITEM2),
  KEY ITEM8 (ITEM8),
  KEY ITEM101 (ITEM5,ITEM38,ITEM10),
  KEY ITEM29 (ITEM29),
  KEY ITEM13 (ITEM13),
  KEY ITEM4 (ITEM4),
  KEY ITEM14 (ITEM14),
  KEY ITEM2 (ITEM2),
  KEY ITEM10 (ITEM10,ITEM38),
  KEY ITEM108 (ITEM44,ITEM6),
  KEY ITEM109 (ITEM82,ITEM6,ITEM5),
  KEY ITEM110 (ITEM82,ITEM26,ITEM27)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
mizioumt commented 7 years ago

@siddontang, I'll try to get the data dumps a little later, maybe tomorrow. What would help if you suggested debugging output that would look like what mysqlbinlog shows: schema.table INSERT|UPDATE|DELETE and column numbers with data. Or maybe you already have a mysqlbinlog like utility based entirely on your code?

siddontang commented 7 years ago

yes, my go-mysql has a similar too named go-mysqlbinlog, you can build and use it directly.

siddontang commented 7 years ago

@mizioumt

I think you should give me some example data, I can't reproduce it.

siddontang commented 7 years ago

Any update @mizioumt ?

mizioumt commented 7 years ago

what is the proper way to build gh-ost and go-mysql? For gh-ost I run GOPATH=pwd ./build.sh but I had to set up some symbolic links to get rid of 'package not found' messages. Is there a simpler way? You could safely assume I know nothing about anything golang related.

siddontang commented 7 years ago

Hi @mizioumt

You can add some messages at:

Then re-run the gh-ost again until meet panic, then you can show me some the new added message s. If you care the data security, you can send me with email siddontang@gmail.com directly.

shlomi-noach commented 7 years ago

@mizioumt gh-ost has no external dependencies. Try building via script/build for a completely isolated build process (this is what we use on CI)

mizioumt commented 7 years ago

script/build works flawlessly. Another news is the I cloned gh-ost github repo and with this build this issue would no longer reproduce. So I guess the current go-mysql works better. Is this build exactly the same as v1.0.35?

One small thing I noticed is messages with non-ascii characters. Some of them appear even without --debug --verbose

2017/03/15 08:50:45 binlogsyncer.go:102: ^[[0;37m[info] syncer is closing... ^[[0m
2017/03/15 08:50:45 binlogsyncer.go:490: ^[[0;31m[error] connection was bad ^[[0m
2017/03/15 08:50:45 binlogstreamer.go:47: ^[[0;31m[error] close sync with err: Sync was closed^[[0m
2017-03-15 08:50:45 INFO StreamEvents encountered unexpected error: Sync was closed
mizioumt commented 7 years ago

so this can be closed. I'm still not confident about streamer's treatment of MEDIUMBLOB but I'll look at it separately.

shlomi-noach commented 7 years ago

@mizioumt

Is this build exactly the same

It was built on OS/X where you built it on Linux, so that's a difference -- but I don't think the difference would manifest in the report on this Issue.

One small thing I noticed is messages with non-ascii characters. Some of them appear even without --debug --verbose

Weird. Haven't seen this.

siddontang commented 7 years ago

Hi @mizioumt

One small thing I noticed is messages with non-ascii characters. Some of them appear even without --debug --verbose

The log highlight is caused by go-mysql, I will disable it by default.