Closed gto90 closed 6 years ago
Issue fixed on iOS: https://github.com/Racecraft/digibytewallet-ios/pull/129/commits/c03518db418c747d98ab98ed8ec12b3f82f6b9c6
I spent 8 hours (!!) figuring out what was going on. It was a deadlock in swift-code.
Set PEER_MAX_CONNECTIONS to 1 (in file BRPeerManager.h). Synching works smoothly. As soon as PEER_MAX_CONNECTIONS is set to >= 2
obj_sync_enter()
becomes a point-of-no-return after a peer disconnected by accident. Since we only read from blockRefs, we can safely neglect synching the access. For those who wonder: SQLITE3 is set to multithreading-mode in this project. It handles concurrency.
Refer to https://stackoverflow.com/a/35131246 why it is happening.
dispatchsync*()
should not be called directly, even if you see it in several projects. Refer to swift guidelines.
https://straypixels.net/swift-dictionary-locking/ also gives an in-depth analysis, why it is critical to use
dispatchsync*()
especially because each dispatched Swift-thread has its own blockRefs-object. They all have different memory addresses, IMHO.
let blockRefs = [BRBlockRef?](UnsafeBufferPointer(start: blocks, count: blocksCount))
dispatchsync*()
is just useless in that case, if I am correct.
I tested this solution a couple of times and synching finishes flawlessly. My android phone does not stall using the android app.
D9FTckJZ2wKHC3BvVxDccguRTy9UwLVZDw
Thank you very much for that wonderful project.
Good evening @SmartArray ,
My most humble thanks for all of your hard work on this. Indeed it appears you have resolved a deadlock issue, but I am still able to reproduce the stalling sync. Can you try this for me:
Menu->Settings->Sync Blockchain
They sync progress view is presented with a starting date of Feb 20, 2018 at 3:37 PM and does not progress any further. In some cases, you can force quit the app and retry the rescan successfully. However, in most cases rescan will stall. Finally, force quitting and unsuccessfully running the rescan function can at times get the application into an unrecoverable state where no further syncing with the blockchain will succeed.
Selecting the rescan function should immediately begin rescanning the blockchain from the checkpoint closest to the wallet creation date and the sync progress view should updated normally.
This issue manifests itself on Android if you were to restore using:
loan patient inquiry alter define genuine hobby chicken cat praise general trade
You'll find initially that you can restore roughly 6hours per-second with this rapidly decreasing to around 1 hour per-second by the time it reaches December 15th 2017, and from there on out it slows until it eventually stalls completely.
Are you able to reproduce the issue on Android using the above restore code?
Liking the fix thus far for Android. If you would like to do some testing with us in realtime, please feel free to jump on Telegram: https://t.me/DigiByteDevelopers
@ChillingSilence Thanks for giving me an insight on this. I am going to debug the android application later. I've an idea what this is about.
First I am going to investigate on the iOS issue. Right at the moment, I can see the same (similar) behavior in the iOS app. I have decent 6 minutes per second after Dec, 2017.
Hi everybody:
issue fixed here:
https://github.com/Racecraft/digibytewallet-ios/pull/130
and includes a fix/update on the digibytewallet-core:
github.com/Racecraft/digibytewallet-core/pull/8
i think the issue is a thread lock issue, while multiple peers sending the blocks. Due different network threads in can be happen two incoming 'inv' message accesing the database at the same time and led to not unlock the function.
So making persisting the incoming block thread save fix the issue.
Commented code and and not reseted display variable ('lastBlockHeight') causing ReScan issues.
Both issue fixed with this PR's.
The incoming message 'got inv with 1 item(s)' is just an anoucement of a node about a new Block ... For every node a message will display and anoucement a new block, so this is no bug.
Improved logging:
introducing a better logging with:
log_u256_hex_decode() instead of u256_hex_decode()
On Xcode / Mac / Litte Endian machines you now can simple copy the hash in the console and search for them in the block explorer. Easy to check whats the current block/transaction/blockheight is.
Debugging:
I wrote some tests and let them run often. I will clean up them and commite them as well.
I think we could improve the speed, to often multiple peers are requested for the same headers/block.
In case of approval:
DLgq9e1w4VURYvgcfHJTj8fS73h4NTwGbA
@SmartArray and @ploenne , Between the two of you it seems like sync is so far not stalling at all for me on iOS. We will need to push out a TestFlight build to put it through the paces so if both of you could PM me in telegram an email I can use for TestFlight I'd like to get a test build in your hands.
Also, we'll still need to work to see what Android needs to also resolve it's sync issue but iOS is looking strong right now.
@ChillingSilence could you please also work to get @SmartArray added to the bounty #1 telegram channel. I'd like to chat in there about how we can perhaps share the bounty between both engineers as both contained fixes that helped get iOS going. Of course Android was a part of this bounty, so we'll need to discuss that as well.
Have sent an invite to @SmartArray to join the Telegram channel so we can discuss further.
Update to integrate the latest digibytewallet-core version (GTO90/digibyte_update) in iOS:
core wallet updates :
https://github.com/Racecraft/digibytewallet-core/pull/9
and iOS updates:
After spending several evenings to find the synchronization slowdown issue, I finally have found the issue by writing my own debugging tools. IntelliJ was not capable for debugging the app. I really hate that IDE.
Nevertheless, by fixing that code, my tests have shown a speedup of about 10x for iOS and 5x for Android, as the bug was in core code.. The synching process does not slow down anymore. That's what is the most important part about my bug fixes. These fixes unleashed a rocket in those apps.
Pull request: https://github.com/Racecraft/digibytewallet-core/pull/10
Technical view: Due to erroneous mathematical checks of the block height, the block list never got cleared and became gigantic over time. Unfortunately, the core code was iterating over that list, so that it slowed down after time, naturally!
This code must be checked against bugs, that could have been emitted by my fixes. First tests show steady and fast synching on my iPhone SE and my android phone. Unfortunately I had to remove the previous version of the app or clear the internal data, as the sync stalled in the beginning. This specific error never came back, on my side.
Please outline your experiences.
@SmartArray ,
I checked out your branch and attempted to run the core with the iOS master branch and was not successful. I see the date swing from the first checkpoint all the way to 1960, then to 2080, and all kinds of dates until the sync fails and the progress view disappears.
While this happens, the console is filled with relayed orphan block
logs. Not sure if that helps but can you check out the latest master branch for iOS, merge core's digibyte-updates branch with yours and try to run it again?
Impact
Stalled blockchain synchronization prevents the sync progress indicator from progressing and new blocks and transactions from being added to the local wallet.
Components
The primary affected component is the digiwallet-core SPV library. Secondary affected components are the digibytewallet-android and digibytewallet-ios mobile clients.
Details
Generally, the blockchain synchronization behavior of the iOS or Android wallets is unremarkable. However, periodically the sync progress view will cease to make further progress causing the application to fail to retrieve new blocks and transactions. This incorrect, periodic behavior usually manifests itself after the initial wallet sync has completed, but not in all cases. It usually stalls after several days of not having opened the wallet or rescanning the blockchain from the application menu.
Reproduction Environments
There are two environments in which you can reproduce this issue:
iOS
In order to be able to reproduce this issue via the digibytewallet-ios client, you will need a suitable Apple computer capable of running the latest version of the Xcode IDE. Once you have your Xcode environment set up, you need to clone the following these steps:
pod install
command.breadwallet.xcworkspace
Android
In order to be able to reproduce this issue via the digibytewallet-android client, you will need a computer with the latest version of Google’s Android Studio installed and configured. Once you have your Android Studio environment set up, you need to clone the following these steps:
Steps to Reproduce
To reproduce the issue please attempt all of the following:
Expected Behavior
Whenever a synchronization with the blockchain is happening, the user will be presented with a sync progress indicator view. This view should show steady and consistent progress with the date on the view increasing and the progress bar filling up.
Observed Behavior
Periodically, the sync progress indicator will stop updating and remain stalled on a given date. When this happens, the user's only recourse is to force quit the application and restart where the sync progress view will usually continue without issue. In some rare cases, sync will get into a state that requires the user to delete the application and reinstall forcing a fresh sync from either the checkpoint closest to the wallet creation date or the earliestKeyTime.
When the sync process stalls, one of two patterns can be observed in the respective environment's console logs. These patterns will loop indefinitely until the user force quits the application. No other method is known to recover once this application state is entered:
iOS
or
Bounty Award
The solution accepted by the DigiByte Core development team will be rewarded with 100,000 DGB.
Bounty Payout Process