bitcoindevkit / rust-electrum-client

Bitcoin Electrum client library. Supports plaintext, TLS and Onion servers.
MIT License
77 stars 63 forks source link

Transactions being returned out of order #75

Open 0xarvin opened 2 years ago

0xarvin commented 2 years ago

Via BDK, we're pulling transactions for a Wallet through a signet electrum instance. Recently, we've noticed that the order that we call batch_transaction_get_raw isn't being respected by electrum and results in the transactions being returned by the client to also be out of order as the reader takes the top result if it's unable to grab a hold of the lock on the reader

This doesn't affect production calls in BDK but only in tests as there's a debug assert here. Was wondering if there's anything that I'm missing or if this is a known issue.

Wallet Descriptor for example: wsh(sortedmulti(2,[e5cd860e/84'/1'/0']tpubDFVZtuMvX7n6DCWbNAgq361KiJdVFBy4D1sZmdafF1zDNW6dv79MwbK6JBRxGV42BS4STayPsG7Prn3QT7LQwzfaBhE8YhgCs4VzhkYVWTg/0/*,[73ee041b/84'/1'/0']tpubDEPCQp4KHW3sS5o18QbknuC6dtQJPFgWtS6e8wm6ewnc2B1vucCnKf9uf9gCddHeJoK5d5hgJv5ji5Q8VnpT4h9ydsi9FE4t8F2GoPrFPYa/0/*,[08a73043/84'/1'/0']tpubDEZ565bfePfqSVkSfrLaR7g5nruLDWxJFKkJvKWUpMEQBsS9p2f5um5uTwgmKWuHbyoiythmEvDRSgdEoYBgLXfosBsjMBTp1jev9wfi2q9/0/*

Network: Signet

afilini commented 2 years ago

I don't think this was a known issue, and I'm guessing this may be related to the specific server your are running because I've never seen this happen with electrs.

Looking at the code it's also very weird that this is happening: we do have code to re-order the replies from the server for batch requests:

https://github.com/bitcoindevkit/rust-electrum-client/blob/ca486285396bd3bf6ca426a127381699973c65a0/src/raw_client.rs#L706-L723

Either there's an issue there, or the server is replying with the wrong request id. Can you try running your tests with trace log level for this crate and then attaching the log here? From that log we should be able to see if the server replies are correct.

0xarvin commented 2 years ago

That's what I'm starting to think, but here's the trace. I replaced the electrum server but let me know if I should include that.

https://gist.github.com/0xarvin/9baee245a45e6e409d9be65e4a447101

notmandatory commented 1 year ago

Which electrum server software are you using where you see this issue?

0xarvin commented 1 year ago

@notmandatory We spun up a set of signet nodes:

url: ssl://electrum.nodes.wallet.build:51002 electrum server: fulcrum

conduition commented 9 months ago

+1 for this issue. Happens for many servers, not just one special guy. Most servers I tested exhibit this bug. Including:

ssl://fulcrum.grey.pw:51002
ssl://electrum.jochen-hoenicke.de:50099
tcp://v7gtzf7nua6hdmb2wtqaqioqmesdb4xrlly4zwr7bvayxv2bpg665pqd.onion:50001
tcp://venmrle3xuwkgkd42wg7f735l6cghst3sdfa3w3ryib2rochfhld6lid.onion:50001
ssl://blackie.c3-soft.com:57002
ssl://e.keff.org:50002
tcp://udfpzbte2hommnvag5f3qlouqkhvp3xybhlus2yvfeqdwlhjroe4bbyd.onion:60001
tcp://kittycp2gatrqhlwpmbczk5rblw62enrpo2rzwtkfrrr27hq435d4vid.onion:50001
ssl://horsey.cryptocowboys.net:50002

I can confirm the cause does NOT originate within this library.

The culprit electrum servers are not respecting JSON-RPC request IDs correctly. If i ask for a series of transactions with a given set of request IDs, the server will return responses for those request IDs but the actual response data might not be mapped to the correct response IDs.

E.g. request with ID 9 asks for TX abcd..., but the response with ID 9 instead contains TX beef... which was requested elsewhere in the batch under a different request ID.

I imagine this is a bug somewhere in the upstream servers' async request handling code. We could easily fix this inside this library by sorting the response TXs by their true TXID. However, if the servers aren't respecting JSON-RPC request IDs, it's hard to know what other properties of JSON-RPC they might be breaking. I didn't test any other batch methods to see if they suffered from the same problem as batch_transaction_get. I think more testing is required here.

The fact that so many servers are apparently broken in this respect is concerning and suggests that maybe a client-side hack is needed, otherwise callers will inevitably reopen issues like this in the future.

conduition commented 9 months ago

I can confirm that this problem is not limited to batched transaction fetching. It also affects batch_script_get_history. If I had to guess, i suspect afflicted servers just don't respect JSON-RPC request IDs for any methods.

johnzweng commented 1 week ago

Hi!

I can provide new data to this issue 🙂

I also stumbled about this problem and after some tests I think I can confirm that this is no server issue but must happen somewhere on the Rust side of things. Unfortunately my knowledge of Rust is still too limited to spot the error. 🙈

Here is what I tested:

I wrote a simple test program, to reproduce this error. It runs in a loop and every now and then the results are out of order:

extern crate electrum_client;
use bitcoin::block::Header;
use electrum_client::{Client, ElectrumApi};

fn main() {
    loop {
        let client = Client::new("tcp://bitcoin.aranguren.org:50001").unwrap();

        // random blockheights
        let heights: Vec<u32> = vec![1, 4, 8, 12, 222, 6666];
        let headers: Vec<Header> = client.batch_block_header(heights).unwrap();

        let mut err_counter = 0;
        if headers.get(0).unwrap().time != 1231469665 { // time of block 1
            err_counter += 1;
            println!("error 0");
        }
        if headers.get(1).unwrap().time != 1231470988 { // time of block 4
            err_counter += 1;
            println!("error 1");
        }
        if headers.get(2).unwrap().time != 1231472743 { // time of block 8
            err_counter += 1;
            println!("error 2");
        }
        if headers.get(3).unwrap().time != 1231474888 { // time of block 12
            err_counter += 1;
            println!("error 3");
        }
        if headers.get(4).unwrap().time != 1231770653 { // time of block 222
            err_counter += 1;
            println!("error 4");
        }
        if headers.get(5).unwrap().time != 1236456633 { // time of block 6666
            err_counter += 1;
            println!("error 5");
        }
        println!("Result: {} headers were not at expected index in result vector", err_counter);
        println!("==========================================")
    }
}

I also added some debug printlnhere in this block: https://github.com/bitcoindevkit/rust-electrum-client/blob/0b97659b383554357fa55df1e9770e61b1d84ba4/src/raw_client.rs#L66-L68

        for x in resp {
            println!("************************");
            println!("{:#?}", x);
            answer.push(serde_json::from_value(x)?);
        }

Additonally I started wireshark to monitor the traffic to the Electrum server (that's why I chose a clear text server).

Finally I set the breakpoint at the end of my loop, so that I could step through my loop and identify the corresponding TCP connection in my wireshark window. As soon a saw the error in the console output I looked into wireshark and displayed the last TCP connection with "Follow TCP stream".

And here is the result:

electrumX responses vs rust-client

I cannot explain why, but we definitely can see that the server returned the responses with their correct corresponding JSON rpc id's but here in this line in resp it ended up in a different order: https://github.com/bitcoindevkit/rust-electrum-client/blob/0b97659b383554357fa55df1e9770e61b1d84ba4/src/raw_client.rs#L63

Any ideas @afilini ? 🙂 (my rust knowledge is still too limited to spot the issue here)

johnzweng commented 1 week ago

As a short sidenote: This bug also impacts the BDK in some way and leads to incorrect (or at least inconsistent) data being synced into the wallet database.

In detail: During sync the .zip(...) call here in this line also assumes that the results are in the same order as the requested block heights.

If they are not this will lead to a situation where the result of wallet.list_transactions(...) will contain transactions where the BlockTime struct confirmation_time will contain inconsistent data, where the height does not match the timestamp (containing timestamp of a different block).

pub struct BlockTime {
    /// confirmation block height
    pub height: u32,
    /// confirmation block timestamp
    pub timestamp: u64,
}
notmandatory commented 1 week ago

@johnzweng can you try to reproduce this with the latest 1.0.0-beta.2 release? The syncing has seen major changes and if still broken we'll likely fix for the 1.0 version first.