apoelstra / rust-jsonrpc

Rust JSONRPC library
Creative Commons Zero v1.0 Universal
158 stars 62 forks source link

simple_http: fix detecting broken pipe on request #86

Closed philipr-za closed 1 year ago

philipr-za commented 1 year ago

While upgrading from v0.11 to v0.14 I noticed that I started getting the following error when the server had disconnected the socket between requests, in our case due to the server idle timeout:

Err(Transport(HttpResponseTooShort { actual: 0, needed: 12 }))

The same issue as is reported in https://github.com/apoelstra/rust-jsonrpc/issues/79 I believe.

I noticed https://github.com/apoelstra/rust-jsonrpc/pull/84 was an attempt to solve the problem and I think I found out why it didn't work as expected.

The client is using a BufWriter to write the TcpStream and unless you flush the buffer you will not see the broken pipe error in time in order to request a fresh connection.

This PR adds in the flushes and provides a unit test demonstrating that it fixes the issue. If you comment out the flushes you will see the symptom reported in https://github.com/apoelstra/rust-jsonrpc/issues/79.

Revision

The first approach of flushing the write buffer twice during the POST angered HTTP servers that expected the request to be contained in a single message.

The revised approach uses the method that is advocated in the unix docs to detect a broken TCP stream which is that if the blocking read operation returns 0 bytes then the socket is closed. So this PR creates a buffer to hold the request, attempts to send it and then reads the response. If the response was length 0 it will attempt to get a fresh socket and send the request a second time.

philipr-za commented 1 year ago

Marked as draft because it seems my submission was a little hasty. After I tried this version in our application I saw a new HTTP error pop up so just need to investigate that.

apoelstra commented 1 year ago

Aw, damn :) good luck!

philipr-za commented 1 year ago

Ok, second time lucky. I added a description of the revised approach in the PR description.

apoelstra commented 1 year ago

The unit test fails if the proxy feature is enabled. You may want to disable the test in that case.

apoelstra commented 1 year ago

Looks good to me, other than the failing unit test. I tested that I can sync the chain with no significant slowdown vs master.

philipr-za commented 1 year ago

Looks good to me, other than the failing unit test. I tested that I can sync the chain with no significant slowdown vs master.

As suggested I just disabled the test when the proxy feature is enabled

apoelstra commented 1 year ago

The 1.41 failure is unrelated to this PR and we can ignore it. I have tested locally with 1.41.

But where I am seeing failures is when I try to run this test multiple times in parallel. Different tests wind up connecting to each others' sockets and then the wrong values get read.

Let me try testing with nix, which provides network isolation so might be okay... but I'm not thrilled about this situation.