VolumeFi / paloma

The fast blockchain messenger protocol
Apache License 2.0

[devops - pigeon] mainnet validator: chain info mismatch #1634

Closed (verabehr closed this issue 4 months ago)

verabehr commented 4 months ago

We're getting the following errors on pigeon on the mainnet validator:

May 29 17:32:10 mainnet-validator pigeon[591345]: {"error":"chain info mismatch in: 'amount of chains', want '3', got '8'","level":"warning","msg":"Chain infos changed. Building processors...","time":"2024-05-29T17:32:10Z","x-correlation-id":"cpbmd5l8c5kscvf8votg"}
May 29 17:32:10 mainnet-validator pigeon[591345]: {"chain-reference-id":"bnb-main","error":"BlockByHash: not found","level":"error","msg":"incorrect chain","time":"2024-05-29T17:32:10Z","x-correlation-id":"cpbmd5l8c5kscvf8vot0"}

This started (coincidentally?) at pretty much exactly the moment I failed over to the QuickNode endpoint. To do that, I added a line with the QuickNode endpoint to the .pigeon/env.sh file and commented out the line with the Liquify endpoint by putting a # in front of it. The pigeon service was restarted; no other files or configs were touched. I also tried undoing my changes, but the error persisted, so I added them back in.

Here is a fuller error log that includes both errors:

May 29 18:31:28 mainnet-validator pigeon[593074]: {"alive-until-bh":17913005,"btl":1124,"component":"Heart.Beat","current-bh":17911881,"level":"debug","msg":"checking keep alive","should-send-keep-alive":false,"time":"2024-05-29T18:31:28Z","x-correlation-id":"cpbn9058c5ksufgshh7g"}
May 29 18:31:28 mainnet-validator pigeon[593074]: {"component":"procmon","level":"debug","msg":"Process executed","process":"Keep alive","time":"2024-05-29T18:31:28Z","x-correlation-id":"cpbn9058c5ksufgshh7g"}
May 29 18:31:29 mainnet-validator pigeon[593074]: {"chain-reference-id":"bnb-main","error":"BlockByHash: not found","level":"error","msg":"incorrect chain","time":"2024-05-29T18:31:29Z","x-correlation-id":"cpbn8od8c5ksufgshgu0"}
May 29 18:31:29 mainnet-validator pigeon[593074]: {"chain-reference-id":"bnb-main","level":"debug","msg":"Releasing mutex.","time":"2024-05-29T18:31:29Z","x-correlation-id":"cpbn8od8c5ksufgshgu0"}
May 29 18:31:29 mainnet-validator pigeon[593074]: {"error":"chain info mismatch in: 'amount of chains', want '3', got '8'","level":"warning","msg":"Chain infos changed. Building processors...","time":"2024-05-29T18:31:29Z","x-correlation-id":"cpbn8vd8c5ksufgshh6g"}
May 29 18:31:29 mainnet-validator pigeon[593074]: {"level":"debug","msg":"Acquiring mutex...","time":"2024-05-29T18:31:29Z","x-correlation-id":"cpbn8vd8c5ksufgshh6g"}
May 29 18:31:29 mainnet-validator pigeon[593074]: {"level":"debug","msg":"Mutex acquired.","time":"2024-05-29T18:31:29Z","x-correlation-id":"cpbn8pd8c5ksufgshgvg"}
May 29 18:31:29 mainnet-validator pigeon[593074]: {"component":"procmon","error":"BlockByHash: not found","level":"error","msg":"Failed to execute process: BlockByHash: not found","process":"[Gravity] Relay batches","time":"2024-05-29T18:31:29Z","x-correlation-id":"cpbn8od8c5ksufgshgu0"}
May 29 18:31:29 mainnet-validator pigeon[593074]: {"level":"info","msg":"relayer loop","time":"2024-05-29T18:31:29Z","x-correlation-id":"cpbn90d8c5ksufgshh80"}
May 29 18:31:31 mainnet-validator pigeon[593074]: {"chain-reference-id":"base-main","error":"BlockByHash: 500 Internal Server Error: {\"message\":\"Something went wrong the server closed connection before returning the first response byte. Make sure the server returns 'Connection: close' response header before closing the connection\",\"status\":500,\"error\":{}}","level":"error","msg":"incorrect chain","time":"2024-05-29T18:31:31Z","x-correlation-id":"cpbn8pd8c5ksufgshgvg"}
May 29 18:31:31 mainnet-validator pigeon[593074]: {"chain-reference-id":"base-main","level":"debug","msg":"Releasing mutex.","time":"2024-05-29T18:31:31Z","x-correlation-id":"cpbn8pd8c5ksufgshgvg"}
May 29 18:31:31 mainnet-validator pigeon[593074]: {"error":"chain info mismatch in: 'amount of chains', want '1', got '8'","level":"warning","msg":"Chain infos changed. Building processors...","time":"2024-05-29T18:31:31Z","x-correlation-id":"cpbn90d8c5ksufgshh80"}
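
For anyone triaging similar output, here is a minimal sketch for pulling the error fields out of lines like the ones above. It assumes each line is a syslog-style prefix followed by a single JSON object, as in this log; the helper names `parse_pigeon_line` and `summarize_errors` are made up for illustration and are not part of pigeon.

```python
import json
from collections import Counter
from typing import Iterable, Optional


def parse_pigeon_line(line: str) -> Optional[dict]:
    """Return the JSON payload of one log line, or None if it has none."""
    start = line.find("{")  # the journald prefix contains no '{'
    if start == -1:
        return None
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None


def summarize_errors(lines: Iterable[str]) -> Counter:
    """Count (chain-reference-id, error) pairs for entries carrying an error."""
    counts: Counter = Counter()
    for line in lines:
        entry = parse_pigeon_line(line)
        if entry and "error" in entry:
            # Entries without a chain id (e.g. the mismatch warning) get "-".
            key = (entry.get("chain-reference-id", "-"), entry["error"])
            counts[key] += 1
    return counts
```

Feeding it the output of `journalctl` for the pigeon unit should make it easier to see which chains fail with `BlockByHash: not found` and which fail with provider-side errors such as the 500 on base-main.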

byte-bandit commented 4 months ago

This error is caused by RPC providers pruning old blocks. As the chain progresses, the block height we originally chose as a reference keeps getting older. We encountered this issue on testnet as well, and it looks like at least Liquify has now pruned the block height used for BNB.
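
To illustrate the failure mode: a reference check of this kind boils down to asking the RPC endpoint for a block by its hash and comparing the result with the expected height. The following is a minimal sketch, assuming a standard Ethereum JSON-RPC endpoint (`eth_getBlockByHash`). The function names and the result labels are hypothetical and this is not Pigeon's actual code. A provider that has pruned the block would typically answer with a `null` result, which a client then reports as "not found".

```python
import json


def build_block_by_hash_request(block_hash: str, request_id: int = 1) -> str:
    """Build the JSON-RPC body for eth_getBlockByHash (header only, no tx bodies)."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getBlockByHash",
        "params": [block_hash, False],
    }
    return json.dumps(payload)


def classify_reference_check(response_body: str, expected_height: int) -> str:
    """Classify an eth_getBlockByHash response against the expected height.

    The labels returned here are illustrative, not Pigeon's terminology.
    """
    response = json.loads(response_body)
    if "error" in response:
        return "rpc-error"  # provider-side failure
    block = response.get("result")
    if block is None:
        return "not-found"  # block unknown to this node, e.g. pruned
    height = int(block["number"], 16)  # block numbers are hex-encoded quantities
    return "ok" if height == expected_height else "height-mismatch"
```

Only the "ok" case confirms that the endpoint serves the expected chain. The "not-found" case says nothing about whether the endpoint is on the wrong chain or has simply dropped the old block, which is why a stale reference height ends up looking like an "incorrect chain".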

I see three ways to improve this atm:

verabehr commented 4 months ago

Thanks @byte-bandit. When does this block-hash check happen? Only at certain times, e.g. whenever pigeon is (re)started?

byte-bandit commented 4 months ago

Every time Pigeon needs to update its chain client configuration, that is usually

verabehr commented 4 months ago

Got it, thanks. I'm wondering how many other RPC providers prune. I'm thinking this could become an issue during the next chain upgrade if folks restart their pigeon. cc @taariq

For now, I'll bring back our validator with a fallback RPC.

byte-bandit commented 4 months ago

Agreed. I'm thinking we should go down the road of this automated height/hash maintenance. I think it wouldn't take longer than 1-2 days to implement.
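
As a rough illustration of what automated height/hash maintenance could look like: the thread does not specify a design, so everything below (the `ReferenceBlock` type, the age threshold, the safety margin) is an assumption made for the sake of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceBlock:
    """Hypothetical record of the block used to identify a chain."""
    height: int
    block_hash: str


def needs_refresh(ref: ReferenceBlock, latest_height: int, max_age_blocks: int) -> bool:
    """True once the reference is older than max_age_blocks behind the chain tip."""
    return latest_height - ref.height > max_age_blocks


def next_reference_height(latest_height: int, safety_margin: int) -> int:
    """Pick a new reference height a few blocks behind the tip to avoid reorgs."""
    return max(latest_height - safety_margin, 0)
```

The idea is to roll the reference forward before providers have a chance to prune it, and to stay a margin behind the tip so that the chosen block is unlikely to be reorganized away.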

verabehr commented 4 months ago

Yes, that makes the most sense to me as well. I'll open a new ticket for that.