Open evanlinjin opened 2 years ago
I have been witnessing few occasional failures in blockchain tests in my local too.. Seems like some kind of resource blocking thingy happening somewhere?? I was seeing "resource unavailable" errors few days ago when running blockchian tests all together.. Which I can't seem to reproduce anymore.. Could be something related??
From the error message it seems something got timed out while waiting for some stuffs..
I have been witnessing few occasional failures in blockchain tests in my local too.. Seems like some kind of resource blocking thingy happening somewhere?? I was seeing "resource unavailable" errors few days ago when running blockchian tests all together.. Which I can't seem to reproduce anymore.. Could be something related??
From the error message it seems something got timed out while waiting for some stuffs..
Same here. Would it make sense to implement some retry logic/exponential backoff?
We can keep track of the errors and where they occur here.
Same here. Would it make sense to implement some retry logic/exponential backoff?
We do have an exponential pole at electrum calls https://github.com/bitcoindevkit/bdk/blob/063d51fd7587815c733f24365a4de02fed072903/src/testutils/blockchain_tests.rs#L345-L359
To me it seems for some reason the electrum calls isn't returning and some resources are not being free??
The error seems to disappear in single test runs.. Only when I run all the blockchian tests together..
This looks like a bug in electrsd:
[2022-07-02T18:30:55Z ERROR electrsd] early exit with: ExitStatus(unix_wait_status(134))
Maybe it doesn't handle too much load well. I'll see if I can somehow attach a debugger and reproduce.
This is a one-off failure, not sure if it is related.
failures:
---- blockchain::esplora::bdk_blockchain_tests::test_double_spend stdout ----
thread 'blockchain::esplora::bdk_blockchain_tests::test_double_spend' panicked at 'called `Result::unwrap()` on an `Err` value: JSON(Error("expected value", line: 1, column: 1))', src/testutils/blockchain_tests.rs:65:56
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
blockchain::esplora::bdk_blockchain_tests::test_double_spend
Maybe it's related to this? https://github.com/bitcoindevkit/rust-electrum-client/issues/67
failures:
---- blockchain::esplora::bdk_blockchain_tests::test_sync_address_index_should_not_decrement stdout ----
thread 'blockchain::esplora::bdk_blockchain_tests::test_sync_address_index_should_not_decrement' panicked at 'called `Result::unwrap()` on an `Err` value: EarlyExit(ExitStatus(unix_wait_status(134)))', src/testutils/blockchain_tests.rs:36:75
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
blockchain::esplora::bdk_blockchain_tests::test_sync_address_index_should_not_decrement
test result: FAILED. 256 passed; 1 failed; 5 ignored; 0 measured; 0 filtered out; finished in 6.41s
failures:
error: test failed, to rerun pass '--lib'
---- blockchain::esplora::bdk_blockchain_tests::test_tx_chain stdout ----
thread 'blockchain::esplora::bdk_blockchain_tests::test_tx_chain' panicked at 'called `Result::unwrap()` on an `Err` value: EarlyExit(ExitStatus(unix_wait_status(134)))', src/testutils/blockchain_tests.rs:36:75
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
blockchain::esplora::bdk_blockchain_tests::test_tx_chain
---- blockchain::electrum::test::bdk_blockchain_tests::test_sync_bump_fee_basic stdout ----
thread 'blockchain::electrum::test::bdk_blockchain_tests::test_sync_bump_fee_basic' panicked at 'called `Result::unwrap()` on an `Err` value: IOError(Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" })', src/testutils/blockchain_tests.rs:65:56
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
This one is the craziest yet: https://gist.github.com/evanlinjin/3079251115e64edc355f46d0d63edb1e
Yes.. I have been seeing this one in my local..
Yes.. I have been seeing this one in my local..
I mean if the error states "Resource TEMPORARILY unavailable"
, I think we can safely wrap the calls with exponential back-off logic. I did see a function in the tests for exponential back-off, so maybe we can reuse that.
My guess: the tcp send buffer gets filled up because core can't process all those requests at once. Increasing the number of threads would probably help.
Source https://linux.die.net/man/2/send:
When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in nonblocking I/O mode. In nonblocking mode it would fail with the error EAGAIN or EWOULDBLOCK in this case
I ran into this issue testing bitcoindevkit/bdk-cli#102 with the new regtest-bitcoin
feature which starts up a local bitcoind and connects to it via the RPC client, but the error only seemed to happen on my MacOS system, the CI tests all pass on Ubuntu Linux. While debugging I noticed in the bitcoind logs a list of RPC requests to "Update existing descriptor: raw(...)" seems to be the default 100 address look ahead, so I think the requests are all getting successfully delivered but the RPC client connection disconnects always after 5 seconds and throws the SocketError "Resource temporarily unavailable". I'll try making the bitcoind server responded faster (as @afilini suggested with more threads), and separately see if I can make the RPC client timeout longer and report back.
I re-tried my test with more rpcthreads set on bitcoind with conf set to via "-rpcthread=
Probably best to disregard my above RPC problems since the title of this issue is really about esplora. I'll open a new issue for my RPC macos problem.
Could this issue be related to the default number for file descriptors running out? This is mentioned for core here: JSON-RPC-interface limitations.
As per https://docs.oracle.com/cd/E19476-01/821-0505/file-descriptor-requirements.html, the default limit for file descriptors per process is 1024. I am not fully sure, but I think we are not hitting that many RPC requests.
Another pointer from LDK team was, it should just redo the call again if it hits the error. https://github.com/lightningdevkit/rust-lightning/pull/2033#issuecomment-1432018010
I will try with this on the new RPC example and report back..
Describe the bug
CI:
blockchain::esplora::bdk_blockchain_tests::test_sync_stop_gap_20
occationally fails.The command that fails
Occurences