Random exception by reading wrong position

pacuum commented 7 years ago

When I read many records (10,000 rows with 20 fields), crystal-mysql randomly raise exceptions when I call rs.read(). Stacktrace also changes randomly from the following 2 patterns in my case.

Invalid time (ArgumentError)
0x505cc7: *CallStack::unwind:Array(Pointer(Void)) at ??
0x5cfec4: initialize at /opt/crystal/src/time.cr 133:7
0x5d05f5: initialize at /opt/crystal/src/time.cr 125:3
0x5d0559: *Time::new<Int32, Int32, Int32, Int32, Int32, Int32>:Time at /opt/crystal/src/time.cr 500:5
0x64b4b2: read at /vagrant/workspace/mycode/lib/mysql/src/mysql/types.cr 178:14
0x648fb4: read at /vagrant/workspace/mycode/lib/mysql/src/mysql/result_set.cr 83:13
0x64a5aa: read at /vagrant/workspace/mycode/lib/db/src/db/result_set.cr 79:15

Invalid Int32:  (ArgumentError)
0x505cc7: *CallStack::unwind:Array(Pointer(Void)) at ??
0x525a4b: to_i32 at /opt/crystal/src/string.cr 418:5
0x52593c: to_i at /opt/crystal/src/string.cr 319:5

I tried to find the reason and it seems that sometimes crystal-mysql forgets to read some bytes from IO and reads bytes from wrong place. For example, after fetching a record with id = 221, it then try to fetch the content of the row of id = 222. But then rs.read(Int32) returns 56832 which equals 222 256 which indicates one byte shift of reading position. In another case there was two bytes shift (1975 expected but got 129433683 = 1975 65536 + 83). If this happens it soon ends up with the exceptions above.

It starts to happen at somewhat random place of the response stream. Sometimes it happens at 222th row, sometimes at 1975th row. But it's not totally at random. If I run my code 10 times, 5 times at 222th row, 2 times at 1975th row, ...

I suspected GC but even if I GC.disable, the problem still occurs.

$ crystal -v
Crystal 0.23.1 [e2a1389] (2017-07-13) LLVM 3.8.1
$ uname -a 
Linux ubuntu-xenial 4.4.0-87-generic #110-Ubuntu SMP Tue Jul 18 12:55:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
$ mysql --version
mysql  Ver 14.14 Distrib 5.7.16, for linux-glibc2.5 (x86_64) using  EditLine wrapper

Any help? I continue to find the reason but if anybody suggest me anything I'm willing to do (printf debug in io.cr, for example). Maybe it is related to #39 .

If I limit the number of rows to 10, it does not happen in my environment. So my workaround is to divide 10,000 records into 1,000 sql to read.

pacuum commented 7 years ago

It seems that the last few bytes of null bitmap is "reused".

According to tcpdump, I received the following bytes for the failed row.

3500 0093 00a0 5300 b707

But my print debug in IO#read_fully?(slice : Bytes) tells me that

It reads 4 bytes 35 00 00 93 as the beginning of the new row
Then read 3 bytes a0 53 00 for null bitmap.
Then read 4 bytes 53 00 b7 07 for id field, which is expected to be b7 07 00 00 = 1975 So the 2 bytes "53 00" repeated.

Seems a crystal's bug.

pacuum commented 7 years ago

I found the root cause. It only happens when a read request to mysql TCP socket is issued across buffer boundary. For example, when there are only 2 bytes remaining in the buffer, if 4 bytes are requested, then IO::Buffered read only the first 2 bytes and do nothing for the last 2 bytes, returning 2 as number of bytes read. crystal-mysql still can read 4 bytes from the slice but the content of the last 2 bytes are unpredictable.

One solution is to check the number of bytes read and call @io.read() again if necessary.

But I feel the behavior of IO::Buffered is not intuitive. I think it should work transparently.

pacuum commented 7 years ago

https://github.com/crystal-lang/crystal/issues/4796 So it's not crystal's bug.

crystal-lang / crystal-mysql

Random exception by reading wrong position #44