Electric-Coin-Company / zcash-android-wallet-sdk

Native Android SDK for Zcash
MIT License
5 stars 10 forks source link

Repeated downloading phase #785

Closed HonzaR closed 1 year ago

HonzaR commented 2 years ago

Describe the issue

It seems that we suffer with a new problem in the Download phase of syncing with latest blocks.

Can you reliably reproduce the issue?

If so, please list the steps to reproduce below:

  1. Run the demo-app on the latest commit of the Main branch
  2. Prepare a new seed phrase and input it (update: or use the predefined one)
  3. Open the Get Balance screen and wait for the Download phase finish
  4. You can see it's repeating the Download phase infinitely

Expected behaviour

Download phase done -> Sync phase done -> Synced

Actual behavior + errors

@TWIG 11/17/22 12:05:45.398000000[BaseContinuationImpl:33](#downloading)    downloaded BlockHeight(value=2116743)..BlockHeight(value=2116752) (batch 675 of 676) [BlockHeight(value=2116743)..BlockHeight(value=2116752)] | 197ms
@TWIG 11/17/22 12:05:45.400000000[BaseContinuationImpl:33](#downloading)    downloaded 10 blocks!
@TWIG 11/17/22 12:05:45.582000000[BaseContinuationImpl:33](#downloading)    downloaded BlockHeight(value=2116753)..BlockHeight(value=2116761) (batch 676 of 676) [BlockHeight(value=2116753)..BlockHeight(value=2116761)] | 172ms
@TWIG 11/17/22 12:05:45.582000000[BaseContinuationImpl:33](#downloading)    downloaded 9 blocks!
@TWIG 11/17/22 12:05:45.588000000[CompactBlockProcessor:78](#validating)    validating blocks in range BlockHeight(value=280001)..BlockHeight(value=2116761) in db: /data/user/0/cash.z.ecc.android.sdk.demoapp.testnet/no_backup/co.electricoin.zcash/zcash_sdk_testnet_cache.sqlite3
app_time_stats: avg=1677.75ms min=1677.75ms max=1677.75ms count=1
@TWIG 11/17/22 12:05:45.916000000[CompactBlockProcessor:78](#)    validation failed at block BlockHeight(value=282921) which had hash null but the expected hash was null
@TWIG 11/17/22 12:05:45.922000000[CompactBlockProcessor:78](#)    The next block block: BlockHeight(value=282922) which had hash null but the expected hash was null
@TWIG 11/17/22 12:05:45.924000000[CompactBlockProcessor:78](#)    =================== BLOCKS [BlockHeight(value=282921)..282931]: START ========
@TWIG 11/17/22 12:05:45.931000000[CompactBlockProcessor:78](#)    block: BlockHeight(value=282921)  hash=null   prevHash=null
@TWIG 11/17/22 12:05:45.940000000[CompactBlockProcessor:78](#)    block: BlockHeight(value=282922)  hash=null   prevHash=null
@TWIG 11/17/22 12:05:45.947000000[CompactBlockProcessor:78](#)    block: BlockHeight(value=282923)  hash=null   prevHash=null
@TWIG 11/17/22 12:05:45.950000000[CompactBlockProcessor:78](#)    block: BlockHeight(value=282924)  hash=null   prevHash=null
@TWIG 11/17/22 12:05:45.953000000[CompactBlockProcessor:78](#)    block: BlockHeight(value=282925)  hash=null   prevHash=null
@TWIG 11/17/22 12:05:45.956000000[CompactBlockProcessor:78](#)    block: BlockHeight(value=282926)  hash=null   prevHash=null
@TWIG 11/17/22 12:05:45.961000000[CompactBlockProcessor:78](#)    block: BlockHeight(value=282927)  hash=null   prevHash=null
@TWIG 11/17/22 12:05:45.965000000[CompactBlockProcessor:78](#)    block: BlockHeight(value=282928)  hash=null   prevHash=null
@TWIG 11/17/22 12:05:45.970000000[CompactBlockProcessor:78](#)    block: BlockHeight(value=282929)  hash=null   prevHash=null
@TWIG 11/17/22 12:05:45.975000000[CompactBlockProcessor:78](#)    block: BlockHeight(value=282930)  hash=null   prevHash=null
@TWIG 11/17/22 12:05:45.979000000[CompactBlockProcessor:78](#)    block: BlockHeight(value=282931)  hash=null   prevHash=null
@TWIG 11/17/22 12:05:45.979000000[CompactBlockProcessor:78](#)    =================== BLOCKS [BlockHeight(value=282921)..282931]: END ========
@TWIG 11/17/22 12:05:45.980000000[CompactBlockProcessor:767](#)    offset = min(100, 10 * (5)) = 50
@TWIG 11/17/22 12:05:45.980000000[CompactBlockProcessor:767](#)    lowerBound = max(BlockHeight(value=282921) - 50, BlockHeight(value=2109900)) = BlockHeight(value=2109900)
@TWIG 11/17/22 12:05:45.981000000[CompactBlockProcessor:78](#)    handling chain error at BlockHeight(value=282921) by rewinding to block BlockHeight(value=2109900)
@TWIG 11/17/22 12:05:45.983000000[SdkSynchronizer:102](#)    Chain error detected at height: BlockHeight(value=282921). Rewinding to: BlockHeight(value=2109900)
@TWIG 11/17/22 12:05:45.983000000[SdkSynchronizer:102](#)    WARNING: a chain error occurred but no callback is registered to be notified of chain errors. To respond to these errors (perhaps to update the UI or alert the user) set synchronizer.onChainErrorHandler to a non-null value
@TWIG 11/17/22 12:05:45.987000000[CompactBlockProcessor:8](#)    Rewinding from BlockHeight(value=280000) to requested height: BlockHeight(value=2109900) using target height: BlockHeight(value=2110002) with last local block: BlockHeight(value=280000)
@TWIG 11/17/22 12:05:45.987000000[CompactBlockProcessor:8](#)    not rewinding dataDb because the last scanned height is BlockHeight(value=280000) and the last local block is BlockHeight(value=280000) both of which are less than the target height of BlockHeight(value=2110002)
@TWIG 11/17/22 12:05:45.988000000[CompactBlockProcessor:8](#)    Also clearing block cache back to BlockHeight(value=2110002). These rewound blocks will download in the next scheduled scan
@TWIG 11/17/22 12:05:46.214000000[CompactBlockProcessor:8](#downloading)    found 6760 missing blocks, downloading in 676 batches of 10...
@TWIG 11/17/22 12:05:46.385000000[BaseContinuationImpl:33](#downloading)    downloaded BlockHeight(value=2110003)..BlockHeight(value=2110012) (batch 1 of 676) [BlockHeight(value=2110003)..BlockHeight(value=2110012)] | 170ms
@TWIG 11/17/22 12:05:46.386000000[BaseContinuationImpl:33](#downloading)    downloaded 10 blocks!
@TWIG 11/17/22 12:05:46.537000000[BaseContinuationImpl:33](#downloading)    downloaded BlockHeight(value=2110013)..BlockHeight(value=2110022) (batch 2 of 676) [BlockHeight(value=2110013)..BlockHeight(value=2110022)] | 148ms
@TWIG 11/17/22 12:05:46.537000000[BaseContinuationImpl:33](#downloading)    downloaded 10 blocks!
HonzaR commented 1 year ago

The trouble seems to be more in the validation phase than in the downloading phase. I re-run and debug the whole process multiple times and here are some notes on it.

@ccjernigan @str4d please, let me know your thoughts on this. Thank you.

ccjernigan commented 1 year ago

Pacu's input is that a chain reorg is probably happening, the rewind logic is broken, and somehow we're going back to the Sapling activation height of 280000. That 280000 is the key thing he's noticing from the logs.

HonzaR commented 1 year ago

Here is another variant of the failed validation log:

[BaseContinuationImpl:33](#downloading)    downloaded BlockHeight(value=2125673)..BlockHeight(value=2125682) (batch 1568 of 1569) [BlockHeight(value=2125673)..BlockHeight(value=2125682)] | 135ms                                                                                                        
[BaseContinuationImpl:33](#downloading)    downloaded 10 blocks!                                                                                                                                                                                                                                          
[BaseContinuationImpl:33](#downloading)    downloaded BlockHeight(value=2125683)..BlockHeight(value=2125688) (batch 1569 of 1569) [BlockHeight(value=2125683)..BlockHeight(value=2125688)] | 146ms                                                                                                        
[BaseContinuationImpl:33](#downloading)    downloaded 6 blocks!                                                                                                                                                                                                                                           
[CompactBlockProcessor:78](#validating)    validating blocks in range BlockHeight(value=280001)..BlockHeight(value=2125688) in db: /data/user/0/cash.z.ecc.android.sdk.demoapp.testnet/no_backup/co.electricoin.zcash/zcash_sdk_testnet_cache.sqlite3                                                                                                                                                                                                                                                                                                                    
[CompactBlockDownloader:128](#)    Shutting down channel                                                                                                                                                                                                                                                  
[CompactBlockProcessor:78](#)    ERROR: unable to resolve reorg at height Error(failedAtHeight=BlockHeight(value=284881)) after 5 correction attempts!                                                                                                                                                    
[SdkSynchronizer:102](#)    ERROR while processing data: cash.z.ecc.android.sdk.exception.CompactBlockProcessorException$FailedReorgRepair: ERROR: unable to resolve reorg at height Error(failedAtHeight=BlockHeight(value=284881)) after 5 correction attempts!                                         
[SdkSynchronizer:102](#)    WARNING: falling back to the default behavior for processor errors. To add custom behavior, set synchronizer.onProcessorErrorHandler to a non-null value                                                                                                                      
[BaseContinuationImpl:33](#)    Failed due to cash.z.ecc.android.sdk.exception.CompactBlockProcessorException$FailedReorgRepair: ERROR: unable to resolve reorg at height Error(failedAtHeight=BlockHeight(value=284881)) after 5 correction attempts! caused by null backing off and retrying in 435ms...
[BaseContinuationImpl:33](#)    cash.z.ecc.android.sdk.exception.CompactBlockProcessorException$FailedReorgRepair: ERROR: unable to resolve reorg at height Error(failedAtHeight=BlockHeight(value=284881)) after 5 correction attempts!                                                                  
[BaseContinuationImpl:33](#)        at cash.z.ecc.android.sdk.block.CompactBlockProcessor$start$2.invokeSuspend(CompactBlockProcessor.kt:244)                                                                                                                                                              
[BaseContinuationImpl:33](#)        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)                                                                                                                                                                              
[BaseContinuationImpl:33](#)        at kotlinx.coroutines.UndispatchedCoroutine.afterResume(CoroutineContext.kt:233)                                                                                                                                                                                       
[BaseContinuationImpl:33](#)        at kotlinx.coroutines.AbstractCoroutine.resumeWith(AbstractCoroutine.kt:102)                                                                                                                                                                                           
[BaseContinuationImpl:33](#)        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46)                                                                                                                                                                              
[BaseContinuationImpl:33](#)        at kotlinx.coroutines.UndispatchedCoroutine.afterResume(CoroutineContext.kt:233)                                                                                                                                                                                       
[BaseContinuationImpl:33](#)        at kotlinx.coroutines.AbstractCoroutine.resumeWith(AbstractCoroutine.kt:102)                                                                                                                                                                                           
[BaseContinuationImpl:33](#)        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46)                                                                                                                                                                              
[BaseContinuationImpl:33](#)        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)                                                                                                                                                                                                        
[BaseContinuationImpl:33](#)        at kotlinx.coroutines.internal.LimitedDispatcher.run(LimitedDispatcher.kt:42)                                                                                                                                                                                          
[BaseContinuationImpl:33](#)        at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:95)                                                                                                                                                                                                             
[BaseContinuationImpl:33](#)        at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:570)                                                                                                                                                                               
[BaseContinuationImpl:33](#)        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750)                                                                                                                                                                      
[BaseContinuationImpl:33](#)        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:677)                                                                                                                                                                        
[BaseContinuationImpl:33](#)        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)                                                                                                                                                                              
[BaseContinuationImpl:33](#)                                                                                                                                                                                                                                                                              
[CompactBlockProcessor:8](#)    Warning: failed to update ranges due to io.grpc.StatusRuntimeException: UNAVAILABLE: Channel shutdown invoked caused by null                                                                                                                                              
[CompactBlockProcessor:8](#)    Disconnection detected! Attempting to reconnect!                                                                                                                                                                                                                          
[CompactBlockProcessor:285](#)    closing existing channel and then reconnecting                                                                                                                                                                                                                          
[LightWalletGrpcService:117](#)    Creating channel that will connect to lightwalletd.testnet.electriccoin.co:9067/?usePlaintext=false                                                                                                                                                                    
[BaseContinuationImpl:33](#)    Unable to process new blocks because we are disconnected! Attempting to reconnect in 275ms                                                                                                                                                                                
HonzaR commented 1 year ago

Waiting for https://github.com/zcash/zcash-android-wallet-sdk/issues/615, which can solve this trouble.

HonzaR commented 1 year ago

Closing as non-reproducible after our latest networking refactoring tasks are done.