Dajiban / digibytewallet-core

SPV bitcoin C library
MIT License
3 stars 6 forks source link

DigiByte Bug Bounty #1: Synchronizing with the DigiByte blockchain stalls periodically. #7

Closed gto90 closed 6 years ago

gto90 commented 6 years ago

Impact

Stalled blockchain synchronization prevents the sync progress indicator from progressing and new blocks and transactions from being added to the local wallet.

Components

The primary affected component is the digiwallet-core SPV library. Secondary affected components are the digibytewallet-android and digibytewallet-ios mobile clients.

Details

Generally, the blockchain synchronization behavior of the iOS or Android wallets is unremarkable. However, periodically the sync progress view will cease to make further progress causing the application to fail to retrieve new blocks and transactions. This incorrect, periodic behavior usually manifests itself after the initial wallet sync has completed, but not in all cases. It usually stalls after several days of not having opened the wallet or rescanning the blockchain from the application menu.

Reproduction Environments

There are two environments in which you can reproduce this issue:

iOS

In order to be able to reproduce this issue via the digibytewallet-ios client, you will need a suitable Apple computer capable of running the latest version of the Xcode IDE. Once you have your Xcode environment set up, you need to clone the following these steps:

  1. Clone the following GitHub repository: https://github.com/Racecraft/digibytewallet-ios
  2. Use cocoapods and run the pod install command.
  3. Open the xcworkspace file named breadwallet.xcworkspace
  4. Build and run the DigiWallet for iOS application on any iOS simulator you have configured.

Android

In order to be able to reproduce this issue via the digibytewallet-android client, you will need a computer with the latest version of Google’s Android Studio installed and configured. Once you have your Android Studio environment set up, you need to clone the following these steps:

  1. Clone the following GitHub repository: https://github.com/Racecraft/digibytewallet-android
  2. Open the project in Android Studio and allow Gradle to sync
  3. You can now build and run the DigiByte for Android application on any AVD you have configured using the Debug build variant.

Steps to Reproduce

To reproduce the issue please attempt all of the following:

  1. Create a new wallet
  2. Restore a wallet from a seed phrase
  3. Rescan the blockchain from the menu

Expected Behavior

Whenever a synchronization with the blockchain is happening, the user will be presented with a sync progress indicator view. This view should show steady and consistent progress with the date on the view increasing and the progress bar filling up.

Observed Behavior

Periodically, the sync progress indicator will stop updating and remain stalled on a given date. When this happens, the user's only recourse is to force quit the application and restart where the sync progress view will usually continue without issue. In some rare cases, sync will get into a state that requires the user to delete the application and reinstall forcing a fresh sync from either the checkpoint closest to the wallet creation date or the earliestKeyTime.

When the sync process stalls, one of two patterns can be observed in the respective environment's console logs. These patterns will loop indefinitely until the user force quits the application. No other method is known to recover once this application state is entered:

iOS

76.184.81.189:12024 connect error: Operation timed out
76.184.81.189:12024 disconnected
96.58.224.14:12024 connecting
84.26.82.96:12024 Connection reset by peer
84.26.82.96:12024 disconnected
84.26.82.96:12024 mempool request failed
94.130.69.23:12022 connecting

or

52.173.197.109:12024 got inv with 1 item(s)
104.209.37.175:12024 got inv with 1 item(s)
52.173.197.109:12024 got inv with 1 item(s)
104.209.37.175:12024 got inv with 1 item(s)
52.173.197.109:12024 got inv with 1 item(s)
104.209.37.175:12024 got inv with 1 item(s)
52.173.197.109:12024 got inv with 1 item(s)
104.209.37.175:12024 got inv with 1 item(s)

Bounty Award

The solution accepted by the DigiByte Core development team will be rewarded with 100,000 DGB.

Bounty Payout Process

SmartArray commented 6 years ago

Issue fixed on iOS: https://github.com/Racecraft/digibytewallet-ios/pull/129/commits/c03518db418c747d98ab98ed8ec12b3f82f6b9c6

I spent 8 hours (!!) figuring out what was going on. It was a deadlock in swift-code.

How to reproduce

Set PEER_MAX_CONNECTIONS to 1 (in file BRPeerManager.h). Synching works smoothly. As soon as PEER_MAX_CONNECTIONS is set to >= 2

obj_sync_enter()

becomes a point-of-no-return after a peer disconnected by accident. Since we only read from blockRefs, we can safely neglect synching the access. For those who wonder: SQLITE3 is set to multithreading-mode in this project. It handles concurrency.

Refer to https://stackoverflow.com/a/35131246 why it is happening.

dispatchsync*()

should not be called directly, even if you see it in several projects. Refer to swift guidelines.

https://straypixels.net/swift-dictionary-locking/ also gives an in-depth analysis, why it is critical to use

dispatchsync*()

especially because each dispatched Swift-thread has its own blockRefs-object. They all have different memory addresses, IMHO.

let blockRefs = [BRBlockRef?](UnsafeBufferPointer(start: blocks, count: blocksCount))

dispatchsync*()

is just useless in that case, if I am correct.

Debugging

I tested this solution a couple of times and synching finishes flawlessly. My android phone does not stall using the android app.

In case of approval

D9FTckJZ2wKHC3BvVxDccguRTy9UwLVZDw

Thank you very much for that wonderful project.

gto90 commented 6 years ago

Good evening @SmartArray ,

My most humble thanks for all of your hard work on this. Indeed it appears you have resolved a deadlock issue, but I am still able to reproduce the stalling sync. Can you try this for me:

Observed behavior

They sync progress view is presented with a starting date of Feb 20, 2018 at 3:37 PM and does not progress any further. In some cases, you can force quit the app and retry the rescan successfully. However, in most cases rescan will stall. Finally, force quitting and unsuccessfully running the rescan function can at times get the application into an unrecoverable state where no further syncing with the blockchain will succeed.

Expected behavior

Selecting the rescan function should immediately begin rescanning the blockchain from the checkpoint closest to the wallet creation date and the sync progress view should updated normally.

ChillingSilence commented 6 years ago

This issue manifests itself on Android if you were to restore using: loan patient inquiry alter define genuine hobby chicken cat praise general trade

You'll find initially that you can restore roughly 6hours per-second with this rapidly decreasing to around 1 hour per-second by the time it reaches December 15th 2017, and from there on out it slows until it eventually stalls completely.

Are you able to reproduce the issue on Android using the above restore code?

Liking the fix thus far for Android. If you would like to do some testing with us in realtime, please feel free to jump on Telegram: https://t.me/DigiByteDevelopers

SmartArray commented 6 years ago

@ChillingSilence Thanks for giving me an insight on this. I am going to debug the android application later. I've an idea what this is about.

First I am going to investigate on the iOS issue. Right at the moment, I can see the same (similar) behavior in the iOS app. I have decent 6 minutes per second after Dec, 2017.

ploenne commented 6 years ago

Hi everybody:

issue fixed here:

https://github.com/Racecraft/digibytewallet-ios/pull/130

and includes a fix/update on the digibytewallet-core:

github.com/Racecraft/digibytewallet-core/pull/8

i think the issue is a thread lock issue, while multiple peers sending the blocks. Due different network threads in can be happen two incoming 'inv' message accesing the database at the same time and led to not unlock the function.

So making persisting the incoming block thread save fix the issue.

Commented code and and not reseted display variable ('lastBlockHeight') causing ReScan issues.

Both issue fixed with this PR's.

The incoming message 'got inv with 1 item(s)' is just an anoucement of a node about a new Block ... For every node a message will display and anoucement a new block, so this is no bug.

Improved logging:

introducing a better logging with:

log_u256_hex_decode() instead of u256_hex_decode()

On Xcode / Mac / Litte Endian machines you now can simple copy the hash in the console and search for them in the block explorer. Easy to check whats the current block/transaction/blockheight is.

Debugging:

I wrote some tests and let them run often. I will clean up them and commite them as well.

I think we could improve the speed, to often multiple peers are requested for the same headers/block.

In case of approval:

DLgq9e1w4VURYvgcfHJTj8fS73h4NTwGbA

gto90 commented 6 years ago

@SmartArray and @ploenne , Between the two of you it seems like sync is so far not stalling at all for me on iOS. We will need to push out a TestFlight build to put it through the paces so if both of you could PM me in telegram an email I can use for TestFlight I'd like to get a test build in your hands.

Also, we'll still need to work to see what Android needs to also resolve it's sync issue but iOS is looking strong right now.

@ChillingSilence could you please also work to get @SmartArray added to the bounty #1 telegram channel. I'd like to chat in there about how we can perhaps share the bounty between both engineers as both contained fixes that helped get iOS going. Of course Android was a part of this bounty, so we'll need to discuss that as well.

ChillingSilence commented 6 years ago

Have sent an invite to @SmartArray to join the Telegram channel so we can discuss further.

ploenne commented 6 years ago

Update to integrate the latest digibytewallet-core version (GTO90/digibyte_update) in iOS:

core wallet updates :

https://github.com/Racecraft/digibytewallet-core/pull/9

and iOS updates:

https://github.com/Racecraft/digibytewallet-ios/pull/134

SmartArray commented 6 years ago

After spending several evenings to find the synchronization slowdown issue, I finally have found the issue by writing my own debugging tools. IntelliJ was not capable for debugging the app. I really hate that IDE.

Nevertheless, by fixing that code, my tests have shown a speedup of about 10x for iOS and 5x for Android, as the bug was in core code.. The synching process does not slow down anymore. That's what is the most important part about my bug fixes. These fixes unleashed a rocket in those apps.

Pull request: https://github.com/Racecraft/digibytewallet-core/pull/10

Technical view: Due to erroneous mathematical checks of the block height, the block list never got cleared and became gigantic over time. Unfortunately, the core code was iterating over that list, so that it slowed down after time, naturally!

This code must be checked against bugs, that could have been emitted by my fixes. First tests show steady and fast synching on my iPhone SE and my android phone. Unfortunately I had to remove the previous version of the app or clear the internal data, as the sync stalled in the beginning. This specific error never came back, on my side.

Please outline your experiences.

gto90 commented 6 years ago

@SmartArray ,

I checked out your branch and attempted to run the core with the iOS master branch and was not successful. I see the date swing from the first checkpoint all the way to 1960, then to 2080, and all kinds of dates until the sync fails and the progress view disappears.

While this happens, the console is filled with relayed orphan block logs. Not sure if that helps but can you check out the latest master branch for iOS, merge core's digibyte-updates branch with yours and try to run it again?