Thriftpy / thriftpy2

Pure python approach of Apache Thrift.
MIT License
572 stars 91 forks source link

Use readall() in TAsyncSocket to ensure all data is received #99

Closed aiudirog closed 4 years ago

aiudirog commented 4 years ago

While accessing HBase via asyncio, I found that my test cases would randomly fail. It turned out that not all of the data was being consumed by TAsyncSocket.read(), so I refactored it to use the readall() that TAsyncBufferedTransport uses.

I moved readall() into socket.py because it made sense to put it with the lower level implementation.

aiudirog commented 4 years ago

I spent longer than I'm willing to admit trying to fix that bug thinking the readall() caused it, when in fact it only forced it to show itself:D.

codecov[bot] commented 4 years ago

Codecov Report

Merging #99 into master will decrease coverage by 0.01%. The diff coverage is 93.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #99      +/-   ##
==========================================
- Coverage   80.87%   80.86%   -0.02%     
==========================================
  Files          35       35              
  Lines        3529     3532       +3     
==========================================
+ Hits         2854     2856       +2     
- Misses        675      676       +1
Impacted Files Coverage Δ
thriftpy2/contrib/aio/transport/buffered.py 92.1% <100%> (-1.52%) :arrow_down:
thriftpy2/contrib/aio/socket.py 78.35% <91.66%> (+0.87%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 9b23e29...5ccbd0a. Read the comment docs.

ethe commented 4 years ago

Could you please provide more information about the reason for randomly failing? And which exception you got in the failing?

aiudirog commented 4 years ago

The error that actually bubbled up is struct.error emitted on one of the unpack lines like this. The specific line that it errored on changed depending on the test case (the data being sent back I assume), but the common thread was that not enough data was getting read from the socket so, for example, only 3 bytes were passed into unpack_int32().

Upon debugging, I found I could read more data from TAsyncSocket.read() if I called it again, so it wasn't that HBase was dropping data. Then I noticed that unlike all of the other transports I had played with, TAsyncSocket.read() didn't always return the length I had asked for, it just returns whatever the lower level socket returned. By adding the readall() wrapper, like the other transports, this issue was alleviated.

However, upon doing that, another error cropped up. All of a sudden, the socket was timing out in the readall() loop trying to read all of the data. That was when I discovered the bug in TAsyncBufferedTransport.read(). Every call to TAsyncSocket.read() on that line was always getting 4096 (buf_size), instead of sz because max was used instead of min. Since I had fixed the previous readall() bug, now it was actively trying to read 4096 bytes every time, when it should have been reading like 10.

I have a theory that the second bug is actually the reason the first one wasn't discovered sooner. Since TAsyncBufferedTransport was always asking the socket for 4096 bytes, all data was always consumed in the test cases because they are smaller than that size. My tests of bulk scanning HBase yielded much closer to or above the buffer size limit. This meant not all of it was being returned on the first call to TAsyncSocket.read(), therefore causing it to not consume that data.

Edit: I guess I forgot to address the randomness. I assume it was due to variations in delays between HBase sending data and my code reading it. Sometimes the test cases would pass, especially in the debugger (which made it very fun to track down). So, if you pause right before the read, the socket has time to buffer all of the data and the single read call might be able to consume it all.

ethe commented 4 years ago

Great stuff! I think you are right, it looks good to me, thank you.

aiudirog commented 4 years ago

Wow that was fast, thanks!

ethe commented 4 years ago

Hi @aiudirog, I think there is a simpler way to fix it, use readexactly rather than read, see https://docs.python.org/3/library/asyncio-stream.html#asyncio.StreamReader.readexactly. So sorry I will try to revert this pull request and test it.

aiudirog commented 4 years ago

I had no idea that method existed. I'm testing it out now with my happybase code.

aiudirog commented 4 years ago

A couple of test cases fail where it is asked to read 0 bytes (raises an EOF error). I'll make the changes and update this PR.

ethe commented 4 years ago

Thank you. I will await your response.

aiudirog commented 4 years ago

Okay, I've got it, this passed all thriftpy2 and aiohappybase test cases: https://github.com/aiudirog/thriftpy2/commit/16bbf52f3988c2766bb0162df25fad4298ea8d4c

Fun how simple it is with that method, haha.

aiudirog commented 4 years ago

Maybe force reset and I'll open a new PR?

ethe commented 4 years ago

I have reverted it.

ethe commented 4 years ago

And please process IncompleteReadError when call readexactly, and re-raise TTransportException.END_OF_FILE.