Closed gusin13 closed 2 months ago
the select should have been something like this with a max limit on retries
const maxRetries = 10 // Set a maximum number of retries
var retryCount int
for {
select {
case <-time.After(fp.cfg.SubmissionRetryInterval):
// periodically query the index block to be later checked whether it is Finalized
finalized, err := fp.consumerCon.QueryIsBlockFinalized(targetBlockHeight)
if err != nil {
return nil, fmt.Errorf("failed to query block finalization at height %v: %w", targetBlockHeight, err)
}
if finalized {
fp.logger.Debug(
"the block is already finalized, skip submission",
zap.String("pk", fp.GetBtcPkHex()),
zap.Uint64("target_height", targetBlockHeight),
)
return nil, nil
}
retryCount++
if retryCount >= maxRetries {
return nil, fmt.Errorf("reached max retries without finalizing block at height %v", targetBlockHeight)
}
case <-fp.quit:
fp.logger.Debug("the finality-provider instance is closing", zap.String("pk", fp.GetBtcPkHex()))
return nil, nil
}
}
This issue already fixed in https://github.com/babylonchain/finality-provider/pull/373
@gitferry i was checking this fn in above pr retryCheckRandomnessUntilBlockFinalized
https://github.com/babylonchain/finality-provider/pull/373/files#diff-e5232c32aa3d31ac0a8060c218be2e8f279a344c502be784ba90275b26789614R472
if hasRand is false and finalized is also false, it would be stuck in select loop, we need a way to exit from select loop if its not finalized which is missing in code.
if hasRand is false and finalized is also false, it would be stuck in select loop, we need a way to exit from select loop if its not finalized which is missing in code.
@gusin13 why? If hasRand
is false, then numRetries++
. The loop will exist when numRetries > uint32(fp.cfg.MaxSubmissionRetries)
@gitferry so the select {} block always gets triggered if !hasRand and the problem is once it enters the select {} block there is no way to exit if its not finalized.
Here is some pseudo code to explain potential bug. Assume fp.cfg.SubmissionRetryInterval is 2 minutes
function retryCheckRandomnessUntilBlockFinalized(targetBlock):
retries = 0
while True: // this depicts the outer for{} loop
if (randomnessExists(targetBlock)):
return false, nil // Randomness is good, exit successfully
else:
retries++
if (retries > maxRetries):
return false, "Error: Reached max retries"
// if randomness does'nt exist it will always reach this point of execution i.e select {} block
wait for 2 minutes or quit signal: // this is the select {} block
if (blockIsFinalized(targetBlock)):
return true, nil // Block finalized, exit
else if (quit signal received):
return false, "Error: Finality check interrupted"
// (Missing step: If the block isn't finalized, it should exit the select {} block and re-enter the for {} block so the retry
logic works properly but this is missing )
The select block is in the same for loop with the check of fp.hasRandomness
. If the two cases of the select not met (no block finalized or no quit signal received), it will go to the next round. Why does it not exit?
@SebastianElvis Could you take a look as you are the reviewer of https://github.com/babylonchain/finality-provider/pull/373?
If the two cases of the select not met (no block finalized or no quit signal received), it will go to the next round
it would go to the next round of the select
block not the for
loop which is where my confusion is, basically execution would get stuck in select
block until one of the cases is satisfied and in case the block is never finalized then the select
block will run forever, maybe i am wrong or overthinking 😄
it would go to the next round of the select block not the for loop which is where my confusion is, basically execution would get stuck in select block until one of the cases is satisfied and in case the block is never finalized then the loop will run forever, maybe i am wrong or overthinking 😄
I don't think so. Select block is one-time execution unless you put extra for loop around it. See https://go.dev/tour/concurrency/5
you are absolutely right! my bad 🙈 sorry for the confusion again i am closing this now. tysm!
potential bug -
https://github.com/babylonchain/finality-provider/blob/f3158290261d03c61acc805b3496cada33a3c5fb/finality-provider/service/fp_instance.go#L531-L547
if the block is not finalized the select statement would run forever blocking the sig/pub randomness submission
this will be exposed if there is no pub rand for a specific block and the block is also not finalized