babylonchain / finality-provider

A peripheral program run by the finality providers
Other
16 stars 27 forks source link

retry loop runs forever if block is not finalized #515

Closed gusin13 closed 2 months ago

gusin13 commented 2 months ago

potential bug -

https://github.com/babylonchain/finality-provider/blob/f3158290261d03c61acc805b3496cada33a3c5fb/finality-provider/service/fp_instance.go#L531-L547

if the block is not finalized the select statement would run forever blocking the sig/pub randomness submission

this will be exposed if there is no pub rand for a specific block and the block is also not finalized

gusin13 commented 2 months ago

the select should have been something like this with a max limit on retries

    const maxRetries = 10 // Set a maximum number of retries
    var retryCount int

    for {
        select {
        case <-time.After(fp.cfg.SubmissionRetryInterval):
            // periodically query the index block to be later checked whether it is Finalized
            finalized, err := fp.consumerCon.QueryIsBlockFinalized(targetBlockHeight)
            if err != nil {
                return nil, fmt.Errorf("failed to query block finalization at height %v: %w", targetBlockHeight, err)
            }
            if finalized {
                fp.logger.Debug(
                    "the block is already finalized, skip submission",
                    zap.String("pk", fp.GetBtcPkHex()),
                    zap.Uint64("target_height", targetBlockHeight),
                )
                return nil, nil
            }

            retryCount++
            if retryCount >= maxRetries {
                return nil, fmt.Errorf("reached max retries without finalizing block at height %v", targetBlockHeight)
            }

        case <-fp.quit:
            fp.logger.Debug("the finality-provider instance is closing", zap.String("pk", fp.GetBtcPkHex()))
            return nil, nil
        }
    }
gitferry commented 2 months ago

This issue already fixed in https://github.com/babylonchain/finality-provider/pull/373

gusin13 commented 2 months ago

@gitferry i was checking this fn in above pr retryCheckRandomnessUntilBlockFinalized https://github.com/babylonchain/finality-provider/pull/373/files#diff-e5232c32aa3d31ac0a8060c218be2e8f279a344c502be784ba90275b26789614R472

if hasRand is false and finalized is also false, it would be stuck in select loop, we need a way to exit from select loop if its not finalized which is missing in code.

gitferry commented 2 months ago

if hasRand is false and finalized is also false, it would be stuck in select loop, we need a way to exit from select loop if its not finalized which is missing in code.

@gusin13 why? If hasRand is false, then numRetries++. The loop will exist when numRetries > uint32(fp.cfg.MaxSubmissionRetries)

gusin13 commented 2 months ago

@gitferry so the select {} block always gets triggered if !hasRand and the problem is once it enters the select {} block there is no way to exit if its not finalized.

Here is some pseudo code to explain potential bug. Assume fp.cfg.SubmissionRetryInterval is 2 minutes

function retryCheckRandomnessUntilBlockFinalized(targetBlock):
    retries = 0

    while True: // this depicts the outer for{} loop
        if (randomnessExists(targetBlock)):
            return false, nil  // Randomness is good, exit successfully
        else:
            retries++
            if (retries > maxRetries):
                return false, "Error: Reached max retries" 

        // if randomness does'nt exist it will always reach this point of execution i.e select {} block
        wait for 2 minutes or quit signal: // this is the select {} block
            if (blockIsFinalized(targetBlock)):
                return true, nil  // Block finalized, exit
            else if (quit signal received):
                return false, "Error: Finality check interrupted"

           // (Missing step: If the block isn't finalized, it should exit the select {} block and re-enter the for {} block so the retry 
          logic works properly but this is missing )
gitferry commented 2 months ago

The select block is in the same for loop with the check of fp.hasRandomness. If the two cases of the select not met (no block finalized or no quit signal received), it will go to the next round. Why does it not exit?

@SebastianElvis Could you take a look as you are the reviewer of https://github.com/babylonchain/finality-provider/pull/373?

gusin13 commented 2 months ago

If the two cases of the select not met (no block finalized or no quit signal received), it will go to the next round

it would go to the next round of the select block not the for loop which is where my confusion is, basically execution would get stuck in select block until one of the cases is satisfied and in case the block is never finalized then the select block will run forever, maybe i am wrong or overthinking 😄

gitferry commented 2 months ago

it would go to the next round of the select block not the for loop which is where my confusion is, basically execution would get stuck in select block until one of the cases is satisfied and in case the block is never finalized then the loop will run forever, maybe i am wrong or overthinking 😄

I don't think so. Select block is one-time execution unless you put extra for loop around it. See https://go.dev/tour/concurrency/5

gusin13 commented 2 months ago

you are absolutely right! my bad 🙈 sorry for the confusion again i am closing this now. tysm!