bitcoin / bitcoin

Bitcoin Core integration/staging tree
https://bitcoincore.org/en/download
MIT License
78.49k stars 36.19k forks source link

[qa] Tests occasionally raise http.client.RemoteDisconnected or BadStatusLine #11777

Closed maflcko closed 5 years ago

maflcko commented 6 years ago

Unrelated to any specific test, randomly a BadStatusLine might raise when reading the response.

This only happens on travis, so assigning low priority.

Recent examples:

ryanofsky commented 6 years ago

Saw this with importprunedfunds in https://travis-ci.org/bitcoin/bitcoin/jobs/327693845. Here's the stack trace in case it it might help people find this issue:

stderr:
Traceback (most recent call last):
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/authproxy.py", line 107, in _request
    return self._get_response()
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/authproxy.py", line 153, in _get_response
    http_response = self.__conn.getresponse()
  File "/usr/lib/python3.4/http/client.py", line 1208, in getresponse
    response.begin()
  File "/usr/lib/python3.4/http/client.py", line 380, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.4/http/client.py", line 350, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: ''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/importprunedfunds.py", line 114, in <module>
    ImportPrunedFundsTest().main()
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/test_framework.py", line 141, in main
    self.stop_nodes()
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/test_framework.py", line 258, in stop_nodes
    node.stop_node()
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/test_node.py", line 124, in stop_node
    self.stop()
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/coverage.py", line 47, in __call__
    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/authproxy.py", line 136, in __call__
    response = self._request('POST', self.__url.path, postdata.encode('utf-8'))
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/authproxy.py", line 111, in _request
    self.__conn.request(method, path, postdata, headers)
  File "/usr/lib/python3.4/http/client.py", line 1125, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.4/http/client.py", line 1163, in _send_request
    self.endheaders(body)
  File "/usr/lib/python3.4/http/client.py", line 1121, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python3.4/http/client.py", line 951, in _send_output
    self.send(msg)
  File "/usr/lib/python3.4/http/client.py", line 886, in send
    self.connect()
  File "/usr/lib/python3.4/http/client.py", line 863, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python3.4/socket.py", line 512, in create_connection
    raise err
  File "/usr/lib/python3.4/socket.py", line 503, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
achow101 commented 6 years ago

I'm also seeing this on my BIP 174 test here: https://travis-ci.org/bitcoin/bitcoin/jobs/327867319. It appears to be happening consistently though, so I'm not sure if that's a test problem or a Travis problem.

TheBlueMatt commented 6 years ago

This is a bug in the python-bitcoinrpc library - I get it all the time locally, something to do with the underlying HTTP request getting closed and not automatically reopened. In my own scripts I generally wrap every call and retry if it falls.

On January 12, 2018 12:45:15 AM UTC, Andrew Chow notifications@github.com wrote:

I'm also seeing this on my BIP 174 test here: https://travis-ci.org/bitcoin/bitcoin/jobs/327867319. It appears to be happening consistently though, so I'm not sure if that's a test problem or a Travis problem.

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/bitcoin/bitcoin/issues/11777#issuecomment-357109147

jnewbery commented 6 years ago

@achow101 - I think your error is different. bitcoind is OOM'ing: https://travis-ci.org/bitcoin/bitcoin/jobs/327867319#L6571

All the other examples are when the stop RPC is called. authproxy.py has handling for if the RPC call raises a http.client.BadStatusLine because the connection was closed:

https://github.com/bitcoin/bitcoin/blob/44080a90a29292df96e92f22242785c5040000a1/test/functional/test_framework/authproxy.py#L109

It tries to reconnect and send the RPC again. I guess if the stop RPC is successful but causes the connection to be closed before the response is received, then authproxy won't be able to reconnect and we'll get the subsequent ConnectionRefusedError.

Shutdown processing was changed here: https://github.com/bitcoin/bitcoin/pull/11006. I wonder if that's made this worse. I guess we see it on travis because it's a timing thing.

@TheBlueMatt - you're able to reproduce this locally. Can you check whether reverting #11006 improves this for you?

We could fix this by special casing calls to stop, and wrapping them in a try-except (which would check that the process had indeed stopped and then continue).

maflcko commented 6 years ago

We could fix this by special casing calls to stop, and wrapping them in a try-except (which would check that the process had indeed stopped and then continue).

Imo, we should properly send the response before cutting the connection and going silent.

TheBlueMatt commented 6 years ago

Err, no, I've never seen the issue when running our test suite, only when using python-bitcoinrpc locally against a local bitcoind for other stuff.

On January 15, 2018 3:59:49 PM UTC, John Newbery notifications@github.com wrote:

@achow101 - I think your error is different. bitcoind is OOM'ing: https://travis-ci.org/bitcoin/bitcoin/jobs/327867319#L6571

All the other examples are when the stop RPC is called. authproxy.py has handling for if the RPC call raises a http.client.BadStatusLine because the connection was closed:

https://github.com/bitcoin/bitcoin/blob/44080a90a29292df96e92f22242785c5040000a1/test/functional/test_framework/authproxy.py#L109

It tries to reconnect and send the RPC again. I guess if the stop RPC is successful but causes the connection to be closed before the response is received, then authproxy won't be able to reconnect and we'll get the subsequent ConnectionRefusedError.

Shutdown processing was changed here: https://github.com/bitcoin/bitcoin/pull/11006. I wonder if that's made this worse. I guess we see it on travis because it's a timing thing.

@TheBlueMatt - you're able to reproduce this locally. Can you check whether reverting #11006 improves this for you?

We could fix this by special casing calls to stop, and wrapping them in a try-except (which would check that the process had indeed stopped and then continue).

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/bitcoin/bitcoin/issues/11777#issuecomment-357722916

maflcko commented 6 years ago

@TheBlueMatt I assume you are using python-bitcoinrpc:1.0 and python:3.4, which will incorrectly throw BadStatusLine due to a bug in python (c.f. https://bugs.python.org/issue3566).

In case my assumption is wrong, do you see this exception also for rpcs other than stop?

maflcko commented 6 years ago

So I tried running with the depends build (i.e. libevent 2.1.8-stable) and python:3.4.3 from trusty. So all of that should be identical to what travis is running, but I couldn't reproduce the issue.

Since we print the consolidated logs on failure on travis, someone should copy-paste them here before wiping the travis log. I suspect they might help debug further.

jnewbery commented 6 years ago

@MarcoFalke , here's the log from https://travis-ci.org/bitcoin/bitcoin/jobs/379941758, failing in mempool_persist.py:

test  2018-05-16T22:34:40.352000Z TestFramework (DEBUG): Stop-start node0 with -persistmempool=0. Verify that it doesn't load its mempool.dat file. 
 test  2018-05-16T22:34:40.352000Z TestFramework.node0 (DEBUG): Stopping node 
 node0 2018-05-16T22:34:40.352576Z Received a POST request for / from 127.0.0.1:32826 
 node0 2018-05-16T22:34:40.352707Z ThreadRPCServer method=stop user=__cookie__ 
 node0 2018-05-16T22:34:40.352776Z Interrupting HTTP server 
 node0 2018-05-16T22:34:40.352818Z Interrupting HTTP RPC server 
 node0 2018-05-16T22:34:40.352828Z Interrupting RPC 
 node0 2018-05-16T22:34:40.352845Z Shutdown: In progress... 
 node0 2018-05-16T22:34:40.352861Z opencon thread exit 
 node0 2018-05-16T22:34:40.352867Z Stopping HTTP RPC server 
 node0 2018-05-16T22:34:40.352880Z Unregistering HTTP handler for / (exactmatch 1) 
 node0 2018-05-16T22:34:40.352894Z addcon thread exit 
 node0 2018-05-16T22:34:40.352903Z Unregistering HTTP handler for /wallet/ (exactmatch 0) 
 node0 2018-05-16T22:34:40.352917Z Stopping RPC 
 test  2018-05-16T22:34:40.353000Z TestFramework (ERROR): Unexpected exception caught during testing
Traceback (most recent call last):
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/authproxy.py", line 107, in _request
    return self._get_response()
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/authproxy.py", line 153, in _get_response
    http_response = self.__conn.getresponse()
  File "/usr/lib/python3.4/http/client.py", line 1208, in getresponse
    response.begin()
  File "/usr/lib/python3.4/http/client.py", line 380, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.4/http/client.py", line 350, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: ''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/test_framework.py", line 152, in main
    self.run_test()
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/mempool_persist.py", line 85, in run_test
    self.stop_nodes()
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/test_framework.py", line 301, in stop_nodes
    node.stop_node()
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/test_node.py", line 196, in stop_node
    self.stop()
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/coverage.py", line 47, in __call__
    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/authproxy.py", line 136, in __call__
    response = self._request('POST', self.__url.path, postdata.encode('utf-8'))
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/authproxy.py", line 111, in _request
    self.__conn.request(method, path, postdata, headers)
  File "/usr/lib/python3.4/http/client.py", line 1125, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.4/http/client.py", line 1163, in _send_request
    self.endheaders(body)
  File "/usr/lib/python3.4/http/client.py", line 1121, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python3.4/http/client.py", line 951, in _send_output
    self.send(msg)
  File "/usr/lib/python3.4/http/client.py", line 886, in send
    self.connect()
  File "/usr/lib/python3.4/http/client.py", line 863, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python3.4/socket.py", line 512, in create_connection
    raise err
  File "/usr/lib/python3.4/socket.py", line 503, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused 
 node0 2018-05-16T22:34:40.353006Z RPC stopped. 
 node0 2018-05-16T22:34:40.353022Z Stopping HTTP server 
 node0 2018-05-16T22:34:40.353078Z Waiting for HTTP worker threads to exit 
 node0 2018-05-16T22:34:40.353136Z msghand thread exit 
 node0 2018-05-16T22:34:40.353169Z Waiting for HTTP event thread to exit 
 node0 2018-05-16T22:34:40.353213Z Exited http event loop 
 node0 2018-05-16T22:34:40.353323Z Stopped HTTP server 
 node0 2018-05-16T22:34:40.353351Z BerkeleyEnvironment::Flush: Flush(false) 
 node0 2018-05-16T22:34:40.353368Z BerkeleyEnvironment::Flush: Flushing wallet.dat (refcount = 0)... 
 node0 2018-05-16T22:34:40.353469Z BerkeleyEnvironment::Flush: wallet.dat checkpoint 
 node0 2018-05-16T22:34:40.353714Z net thread exit 
 node0 2018-05-16T22:34:40.354198Z BerkeleyEnvironment::Flush: wallet.dat detach 
 node0 2018-05-16T22:34:40.355314Z BerkeleyEnvironment::Flush: wallet.dat closed 
 node0 2018-05-16T22:34:40.355334Z BerkeleyEnvironment::Flush: Flush(false) took               2ms 
 node0 2018-05-16T22:34:40.358379Z Flushed 0 addresses to peers.dat  3ms 
 node0 2018-05-16T22:34:40.358545Z scheduler thread interrupt 
 node0 2018-05-16T22:34:40.361186Z Dumped mempool: 1.8e-05s to copy, 0.002403s to dump 
 node0 2018-05-16T22:34:40.361215Z Recorded 5 unconfirmed txs from mempool in 1.5e-05s 
 test  2018-05-16T22:34:40.362000Z TestFramework (INFO): Stopping nodes 
 test  2018-05-16T22:34:40.362000Z TestFramework.node0 (DEBUG): Stopping node 
 test  2018-05-16T22:34:40.363000Z TestFramework.node0 (ERROR): Unable to stop node.
Traceback (most recent call last):
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/test_node.py", line 196, in stop_node
    self.stop()
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/coverage.py", line 47, in __call__
    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/authproxy.py", line 136, in __call__
    response = self._request('POST', self.__url.path, postdata.encode('utf-8'))
  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/authproxy.py", line 106, in _request
    self.__conn.request(method, path, postdata, headers)
  File "/usr/lib/python3.4/http/client.py", line 1125, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.4/http/client.py", line 1153, in _send_request
    self.putrequest(method, url, **skips)
  File "/usr/lib/python3.4/http/client.py", line 993, in putrequest
    raise CannotSendRequest(self.__state)
http.client.CannotSendRequest: Request-sent 
 test  2018-05-16T22:34:40.363000Z TestFramework.node1 (DEBUG): Stopping node 
 test  2018-05-16T22:34:40.364000Z TestFramework.node2 (DEBUG): Stopping node 
maflcko commented 6 years ago

@jnewbery I'd suggest to revisit this issue report after #13215, which runs the functional tests through docker exec on a bionic container. Bionic uses python3.6 and tests still failing with this error would rule out a bug that is only present in python3.4

maflcko commented 6 years ago

I believe the exception on python 3.6 is http.client.RemoteDisconnected: Remote end closed connection without response. So this is likely not an issue with travis or python 3.4.

laanwj commented 6 years ago

I think this is caused by a race during shutdown of the http server, causing it to not succesfully return the JSON response to stop to the caller. See also #13211, which more reliably reproduces the issue.

ken2812221 commented 6 years ago

@laanwj So the expected action is waiting for RPC client to get response from RPC server? Even if the client does tend to not get any response. I think the server should close ASAP even when the client does not want to get response.

maflcko commented 6 years ago

RPC client to get response from RPC server

Yes, all responses should be sent out unless you sent the corresponding request after the stop

jnewbery commented 6 years ago

Do people consider it a bug that bitcoind won't always respond to a stop call (and presumably an encryptwallet call) before shutting down?

If so, we should fix that bug in bitcoind. If not, we can fix the test failures by adding special casing for calls to stop and encryptwallet so that the test framework handles not receiving a response.

laanwj commented 6 years ago

Do people consider it a bug that bitcoind won't always respond to a stop call (and presumably an encryptwallet call) before shutting down?

It was always considered a bug, at least. So not regarding it as a bug is a change of behavior. I'm also not sure how bitcoin-cli copes with this.

maflcko commented 6 years ago

If this is not a bug, the documentation on help stop should clearly state that the remote might disconnect without sending a reply.