bitcoindevkit / bdk-ffi

Please consider this project *experimental*. But we hope to have an official release out soon.
Other
85 stars 37 forks source link

A few concurrency-related issues Block has faced while using bdk-ffi #205

Closed cgarrett-square closed 3 months ago

cgarrett-square commented 1 year ago

At Block, We're using the BDK via the bdk-ffi bindings in our iOS and Android Bitcoin Wallet. It's been incredibly helpful so far! We surfaced a few issues related to synchronicity in the BDK #mobile Discord channel and are creating this issue to track them. As discussed in the thread, I'm starting with a single issue with multiple related topics. But I'm happy to break this into three separate GitHub issues if the team prefers! Also, I'm acknowledging I'm not using the issue template until we've decided if this should be a single issue or three. But I'm happy to provide the additional detail requested in the template when helpful.

Issue 1: Concurrent Calls to Wallet in BDK

If we make calls concurrently to the BDK wallet, we sometimes experience the main thread hanging on a mutex lock. The calls involved in one example were sync and get_balance. Below is a sample stack trace from an app hang when we were calling into the BDK from multiple threads: get_balance on Thread 1 (main) and sync on Thread 6 (background):

block-mutex-wait

We changed our implementation to ensure all calls made to the BDK wallet are serialized, which fixes the issue. But we worry about performance implications with this solution. We call sync often, and have also observed that call hanging and not returning (separate issue below). So, if our calls are forced to be serialized, we could end up in a situation where we have to wait on a call to sync (that may never return) to finish before executing some other work.

Issue 2: Long-running / Non-cancellable Calls

In our development, we have encountered calls to sync that never terminated or returned. Examining the long-running process usually shows work within the BDK related to Electrum. There is not a way to cancel an in-flight call in BDK, which causes issues when / if this happens. On Android we tried to wrap blocking BDK calls into runInterruptible to make them cancellable, which seems to do the trick. We additionally tried to wrap the calls with withTimeout to prevent them from hanging for too long but that doesn’t work. We also have concerns about resources/memory leaks when wrapping blocking calls into cancellable coroutines without being able to properly clean up those calls.

// TODO: Timeout doesn't seem to be able to properly cancel the coroutine with blocking BDK call.
withTimeout(timeout = 5.seconds) {
  // Allows us to cancel these calls by killing coroutine scope.
  // TODO: This potentially can cause memory/resource leaks. Ideally BDK calls would be wrapped into Closeable.
  runInterruptible {
    bdkWallet.sync(
      blockchain = blockchain,
      progress = null,
    )
  }
}

Issue 3: Synchronous API

All methods in the BDK API are synchronous, but many are expensive and often take between hundreds of milliseconds to multiple seconds. We are currently manually wrapping the BDK API with our own asynchronous API. It would be nice if the BDK API could adopt an async or callback model to reduce the overhead of using it.

thunderbiscuit commented 1 year ago

Just linking some uniffi-rs threads related to asynchronicity here:

notmandatory commented 1 year ago

I did a little research on issue 1 and unfortunately in our current API access to Wallet needs to be serialized with a Mutex because 1) uniffi-rs doesn't currently support async calls, and 2) blockchain syncing and database access (ie for checking the balance) can't be shared between threads. This is one of the original issues that triggered the bdk_core API changes, which was to decouple the syncing process from the wallet database, Lloyd does a good job explaining it in his blog post.

Short term (Q4 2022) I think there's a couple things we can do, first we need to figure out why your electrum sync is taking so long, is the wallet you're syncing against very large in terms of addresses and/or transactions? Next it'd be good to know how you're syncing (normal IP or Tor?), and what kind of electrum server you're syncing against (default electrs?). And finally what does your ElectrumBlockchainConfig look like? It might help to lower the timeout and/or retry values but first we need to figure out why it's slow or not returning.

Long term (Q1 2023) we should have the bdk_core 1.0 API work merged and can update the bdk_ffi API to allow incremental syncing and also be able to do syncing and accessing the wallet DB in different threads.

In the longer term there is an open issue on uniffi-rs to expose async functions directly in the target languages. But it's a tricky issue since they support multiple languages that all work a bit differently. But for now once we're able to reproduce this issue I think we'll be able to find a solution with all the threading happening in Kotlin (or Swift).

allisonmoyer commented 1 year ago

what does your ElectrumBlockchainConfig look like?

Here's the iOS implementation right now:

BlockchainConfig.electrum(
        config: .init(
            url: "ssl://electrum.nodes.wallet.build:51002",
            socks5: nil,
            retry: 5,
            timeout: nil,
            stopGap: 10
        )
    )

@quad @0xarvin could you help answer some of the above questions WRT our electrum setup?

Next it'd be good to know how you're syncing (normal IP or Tor?)

what kind of electrum server you're syncing against (default electrs?)

I haven't tried to reproduce in a while, but the wallets we were seeing issues on didn't have particularly large numbers of addresses or transactions. We are also polling for sync every 5 seconds, but only if there's not a current sync already in progress.

quad commented 1 year ago

is the wallet you're syncing against very large in terms of addresses and/or transactions?

No. They're newly created wallet with only a few (less than 10) transactions.

Next it'd be good to know how you're syncing (normal IP or Tor?)

Normal IP.

and what kind of electrum server you're syncing against (default electrs?)

Fulcrum.

notmandatory commented 1 year ago

Has anyone compared performance of Fulcrum vs the Blockstream Esplora version of electrs? It drives some pretty big public servers so must be pretty efficient.

notmandatory commented 1 year ago

Here's some very interesting performance analysis of ElectrumX vs Fulcrum vs Esplora: "Electrum server performance report (2022)" by @jlopp

allisonmoyer commented 1 year ago

Thanks! Will circulate these back with the team.

Somewhat related to these issues is the issue of the sync progress never being called. Is that something you'd heard before? Or should I create a separate issue for it?

notmandatory commented 1 year ago

I have an open issue for the sync progress problem but we put the fix on hold since the 1.0 sync will fix the problem (in a different way). But a simple fix is possible for the pre-1.0 syncing, so if it's something you need sooner please add a comment on this issue: bitcoindevkit/bdk#798

allisonmoyer commented 1 year ago

Thanks for following up @notmandatory! Do you know the timeline for the v1 release?

notmandatory commented 1 year ago

Hi, the original goal for bdk_core and a 1.0 release was end of 2022 😅 .. the scope as grown a bit but I'm hoping it can be ready for release in Q1. I need to check with @LLFourn and @evanlinjin and see when they think it will be ready to merge their current staging branch (https://github.com/LLFourn/bdk_core_staging/) into the main bdk master branch. Then we can put all focus on testing and released candidates until it's ready both for rust and bdk-ffi users.

allisonmoyer commented 1 year ago

Got it, thanks for the context 👍

LLFourn commented 1 year ago

We've got a timeline we're going to try and stick to: https://github.com/LLFourn/bdk_core_staging#development-and-release-plan with the first release of bdk_chain on Friday.

bdk_chain v0.1.0 will be an early view of the API we plan to solve all the problems you've identified here. bdk v1.0.0-alpha.0 will wrap it up in a more easy to use form with automatic persistence (as current BDK provides).

I'll leave to @notmandatory as to what gets FFI bindings and when :)

thunderbiscuit commented 3 months ago

I'm coming back to this and see that most issues here will be solved by using the 1.0 API, or require new/updated issues to discuss them further.

Bitkey team feel free to reopen if you feel like you'd rather continue the discussion here! The 2 final things I think the bindings need before being ready for testing/prod is (a) simple sync/full_scan, and (b) SQLite store. But both of those are partially solved right now in the alpha releases (flat file storage and slightly more complex sync), so they're ready to start experimenting with!