Closed maflcko closed 5 years ago
Saw this with importprunedfunds
in https://travis-ci.org/bitcoin/bitcoin/jobs/327693845. Here's the stack trace in case it it might help people find this issue:
stderr:
Traceback (most recent call last):
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/authproxy.py", line 107, in _request
return self._get_response()
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/authproxy.py", line 153, in _get_response
http_response = self.__conn.getresponse()
File "/usr/lib/python3.4/http/client.py", line 1208, in getresponse
response.begin()
File "/usr/lib/python3.4/http/client.py", line 380, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.4/http/client.py", line 350, in _read_status
raise BadStatusLine(line)
http.client.BadStatusLine: ''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/importprunedfunds.py", line 114, in <module>
ImportPrunedFundsTest().main()
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/test_framework.py", line 141, in main
self.stop_nodes()
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/test_framework.py", line 258, in stop_nodes
node.stop_node()
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/test_node.py", line 124, in stop_node
self.stop()
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/coverage.py", line 47, in __call__
return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/authproxy.py", line 136, in __call__
response = self._request('POST', self.__url.path, postdata.encode('utf-8'))
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/authproxy.py", line 111, in _request
self.__conn.request(method, path, postdata, headers)
File "/usr/lib/python3.4/http/client.py", line 1125, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python3.4/http/client.py", line 1163, in _send_request
self.endheaders(body)
File "/usr/lib/python3.4/http/client.py", line 1121, in endheaders
self._send_output(message_body)
File "/usr/lib/python3.4/http/client.py", line 951, in _send_output
self.send(msg)
File "/usr/lib/python3.4/http/client.py", line 886, in send
self.connect()
File "/usr/lib/python3.4/http/client.py", line 863, in connect
self.timeout, self.source_address)
File "/usr/lib/python3.4/socket.py", line 512, in create_connection
raise err
File "/usr/lib/python3.4/socket.py", line 503, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
I'm also seeing this on my BIP 174 test here: https://travis-ci.org/bitcoin/bitcoin/jobs/327867319. It appears to be happening consistently though, so I'm not sure if that's a test problem or a Travis problem.
This is a bug in the python-bitcoinrpc library - I get it all the time locally, something to do with the underlying HTTP request getting closed and not automatically reopened. In my own scripts I generally wrap every call and retry if it falls.
On January 12, 2018 12:45:15 AM UTC, Andrew Chow notifications@github.com wrote:
I'm also seeing this on my BIP 174 test here: https://travis-ci.org/bitcoin/bitcoin/jobs/327867319. It appears to be happening consistently though, so I'm not sure if that's a test problem or a Travis problem.
-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/bitcoin/bitcoin/issues/11777#issuecomment-357109147
@achow101 - I think your error is different. bitcoind is OOM'ing: https://travis-ci.org/bitcoin/bitcoin/jobs/327867319#L6571
All the other examples are when the stop
RPC is called. authproxy.py
has handling for if the RPC call raises a http.client.BadStatusLine
because the connection was closed:
It tries to reconnect and send the RPC again. I guess if the stop
RPC is successful but causes the connection to be closed before the response is received, then authproxy won't be able to reconnect and we'll get the subsequent ConnectionRefusedError
.
Shutdown processing was changed here: https://github.com/bitcoin/bitcoin/pull/11006. I wonder if that's made this worse. I guess we see it on travis because it's a timing thing.
@TheBlueMatt - you're able to reproduce this locally. Can you check whether reverting #11006 improves this for you?
We could fix this by special casing calls to stop
, and wrapping them in a try-except (which would check that the process had indeed stopped and then continue).
We could fix this by special casing calls to stop, and wrapping them in a try-except (which would check that the process had indeed stopped and then continue).
Imo, we should properly send the response before cutting the connection and going silent.
Err, no, I've never seen the issue when running our test suite, only when using python-bitcoinrpc locally against a local bitcoind for other stuff.
On January 15, 2018 3:59:49 PM UTC, John Newbery notifications@github.com wrote:
@achow101 - I think your error is different. bitcoind is OOM'ing: https://travis-ci.org/bitcoin/bitcoin/jobs/327867319#L6571
All the other examples are when the
stop
RPC is called.authproxy.py
has handling for if the RPC call raises ahttp.client.BadStatusLine
because the connection was closed:It tries to reconnect and send the RPC again. I guess if the
stop
RPC is successful but causes the connection to be closed before the response is received, then authproxy won't be able to reconnect and we'll get the subsequentConnectionRefusedError
.Shutdown processing was changed here: https://github.com/bitcoin/bitcoin/pull/11006. I wonder if that's made this worse. I guess we see it on travis because it's a timing thing.
@TheBlueMatt - you're able to reproduce this locally. Can you check whether reverting #11006 improves this for you?
We could fix this by special casing calls to
stop
, and wrapping them in a try-except (which would check that the process had indeed stopped and then continue).-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/bitcoin/bitcoin/issues/11777#issuecomment-357722916
@TheBlueMatt I assume you are using python-bitcoinrpc:1.0 and python:3.4, which will incorrectly throw BadStatusLine
due to a bug in python (c.f. https://bugs.python.org/issue3566).
In case my assumption is wrong, do you see this exception also for rpcs other than stop
?
So I tried running with the depends build (i.e. libevent 2.1.8-stable) and python:3.4.3 from trusty. So all of that should be identical to what travis is running, but I couldn't reproduce the issue.
Since we print the consolidated logs on failure on travis, someone should copy-paste them here before wiping the travis log. I suspect they might help debug further.
@MarcoFalke , here's the log from https://travis-ci.org/bitcoin/bitcoin/jobs/379941758, failing in mempool_persist.py:
test 2018-05-16T22:34:40.352000Z TestFramework (DEBUG): Stop-start node0 with -persistmempool=0. Verify that it doesn't load its mempool.dat file.
test 2018-05-16T22:34:40.352000Z TestFramework.node0 (DEBUG): Stopping node
node0 2018-05-16T22:34:40.352576Z Received a POST request for / from 127.0.0.1:32826
node0 2018-05-16T22:34:40.352707Z ThreadRPCServer method=stop user=__cookie__
node0 2018-05-16T22:34:40.352776Z Interrupting HTTP server
node0 2018-05-16T22:34:40.352818Z Interrupting HTTP RPC server
node0 2018-05-16T22:34:40.352828Z Interrupting RPC
node0 2018-05-16T22:34:40.352845Z Shutdown: In progress...
node0 2018-05-16T22:34:40.352861Z opencon thread exit
node0 2018-05-16T22:34:40.352867Z Stopping HTTP RPC server
node0 2018-05-16T22:34:40.352880Z Unregistering HTTP handler for / (exactmatch 1)
node0 2018-05-16T22:34:40.352894Z addcon thread exit
node0 2018-05-16T22:34:40.352903Z Unregistering HTTP handler for /wallet/ (exactmatch 0)
node0 2018-05-16T22:34:40.352917Z Stopping RPC
test 2018-05-16T22:34:40.353000Z TestFramework (ERROR): Unexpected exception caught during testing
Traceback (most recent call last):
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/authproxy.py", line 107, in _request
return self._get_response()
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/authproxy.py", line 153, in _get_response
http_response = self.__conn.getresponse()
File "/usr/lib/python3.4/http/client.py", line 1208, in getresponse
response.begin()
File "/usr/lib/python3.4/http/client.py", line 380, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.4/http/client.py", line 350, in _read_status
raise BadStatusLine(line)
http.client.BadStatusLine: ''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/test_framework.py", line 152, in main
self.run_test()
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/mempool_persist.py", line 85, in run_test
self.stop_nodes()
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/test_framework.py", line 301, in stop_nodes
node.stop_node()
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/test_node.py", line 196, in stop_node
self.stop()
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/coverage.py", line 47, in __call__
return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/authproxy.py", line 136, in __call__
response = self._request('POST', self.__url.path, postdata.encode('utf-8'))
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/authproxy.py", line 111, in _request
self.__conn.request(method, path, postdata, headers)
File "/usr/lib/python3.4/http/client.py", line 1125, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python3.4/http/client.py", line 1163, in _send_request
self.endheaders(body)
File "/usr/lib/python3.4/http/client.py", line 1121, in endheaders
self._send_output(message_body)
File "/usr/lib/python3.4/http/client.py", line 951, in _send_output
self.send(msg)
File "/usr/lib/python3.4/http/client.py", line 886, in send
self.connect()
File "/usr/lib/python3.4/http/client.py", line 863, in connect
self.timeout, self.source_address)
File "/usr/lib/python3.4/socket.py", line 512, in create_connection
raise err
File "/usr/lib/python3.4/socket.py", line 503, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
node0 2018-05-16T22:34:40.353006Z RPC stopped.
node0 2018-05-16T22:34:40.353022Z Stopping HTTP server
node0 2018-05-16T22:34:40.353078Z Waiting for HTTP worker threads to exit
node0 2018-05-16T22:34:40.353136Z msghand thread exit
node0 2018-05-16T22:34:40.353169Z Waiting for HTTP event thread to exit
node0 2018-05-16T22:34:40.353213Z Exited http event loop
node0 2018-05-16T22:34:40.353323Z Stopped HTTP server
node0 2018-05-16T22:34:40.353351Z BerkeleyEnvironment::Flush: Flush(false)
node0 2018-05-16T22:34:40.353368Z BerkeleyEnvironment::Flush: Flushing wallet.dat (refcount = 0)...
node0 2018-05-16T22:34:40.353469Z BerkeleyEnvironment::Flush: wallet.dat checkpoint
node0 2018-05-16T22:34:40.353714Z net thread exit
node0 2018-05-16T22:34:40.354198Z BerkeleyEnvironment::Flush: wallet.dat detach
node0 2018-05-16T22:34:40.355314Z BerkeleyEnvironment::Flush: wallet.dat closed
node0 2018-05-16T22:34:40.355334Z BerkeleyEnvironment::Flush: Flush(false) took 2ms
node0 2018-05-16T22:34:40.358379Z Flushed 0 addresses to peers.dat 3ms
node0 2018-05-16T22:34:40.358545Z scheduler thread interrupt
node0 2018-05-16T22:34:40.361186Z Dumped mempool: 1.8e-05s to copy, 0.002403s to dump
node0 2018-05-16T22:34:40.361215Z Recorded 5 unconfirmed txs from mempool in 1.5e-05s
test 2018-05-16T22:34:40.362000Z TestFramework (INFO): Stopping nodes
test 2018-05-16T22:34:40.362000Z TestFramework.node0 (DEBUG): Stopping node
test 2018-05-16T22:34:40.363000Z TestFramework.node0 (ERROR): Unable to stop node.
Traceback (most recent call last):
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/test_node.py", line 196, in stop_node
self.stop()
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/coverage.py", line 47, in __call__
return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/authproxy.py", line 136, in __call__
response = self._request('POST', self.__url.path, postdata.encode('utf-8'))
File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/authproxy.py", line 106, in _request
self.__conn.request(method, path, postdata, headers)
File "/usr/lib/python3.4/http/client.py", line 1125, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python3.4/http/client.py", line 1153, in _send_request
self.putrequest(method, url, **skips)
File "/usr/lib/python3.4/http/client.py", line 993, in putrequest
raise CannotSendRequest(self.__state)
http.client.CannotSendRequest: Request-sent
test 2018-05-16T22:34:40.363000Z TestFramework.node1 (DEBUG): Stopping node
test 2018-05-16T22:34:40.364000Z TestFramework.node2 (DEBUG): Stopping node
@jnewbery I'd suggest to revisit this issue report after #13215, which runs the functional tests through docker exec on a bionic container. Bionic uses python3.6 and tests still failing with this error would rule out a bug that is only present in python3.4
I believe the exception on python 3.6 is http.client.RemoteDisconnected: Remote end closed connection without response
. So this is likely not an issue with travis or python 3.4.
I think this is caused by a race during shutdown of the http server, causing it to not succesfully return the JSON response to stop
to the caller.
See also #13211, which more reliably reproduces the issue.
@laanwj So the expected action is waiting for RPC client to get response from RPC server? Even if the client does tend to not get any response. I think the server should close ASAP even when the client does not want to get response.
RPC client to get response from RPC server
Yes, all responses should be sent out unless you sent the corresponding request after the stop
Do people consider it a bug that bitcoind won't always respond to a stop
call (and presumably an encryptwallet
call) before shutting down?
If so, we should fix that bug in bitcoind.
If not, we can fix the test failures by adding special casing for calls to stop
and encryptwallet
so that the test framework handles not receiving a response.
Do people consider it a bug that bitcoind won't always respond to a stop call (and presumably an encryptwallet call) before shutting down?
It was always considered a bug, at least.
So not regarding it as a bug is a change of behavior.
I'm also not sure how bitcoin-cli
copes with this.
If this is not a bug, the documentation on help stop
should clearly state that the remote might disconnect without sending a reply.
Unrelated to any specific test, randomly a BadStatusLine might raise when reading the response.
This only happens on travis, so assigning low priority.
Recent examples: