Open a26nine opened 2 months ago
Thank you for raising this concern; I'm sorry you are facing issues.
Have you tried any other snapshot (default, pruned) from Quicksync?
We can try to get in contact with Quicksync and help them debug.
It is odd that this is happening so frequently, but we have had reports of AppHash errors tied to the wasm directory ever since we introduced CosmWasm.
If possible, it would be helpful to try to replicate this on a smaller node (to reduce debugging time). If this issue happens with other Quicksync snapshots but not with Polkachu or NodeStake, it could point to a slight misconfiguration in Quicksync's export procedure that can be mitigated.
I forgot to mention: we downloaded Polkachu's pruned snapshot, and it's been running fine with the same binary, without any issues.
@MSalopek, did you get a chance to check with the QuickSync team?
@MSalopek Even I am getting similar issues.
ChainLayer has been contacted. Updates will be posted as they reach us.
The issue seems to be solved on Quicksync's end.
Feel free to resync from the newest snapshot.
@mayank-daga @a26nine
> The issue seems to be solved on Quicksync's end.
> Feel free to resync from the newest snapshot.
No, it's not resolved. We are running pruned nodes for now.
I can confirm that it's not resolved yet. I've re-downloaded the archive for 2 of our RPC nodes after the message that it got fixed on Quicksync's end, but it keeps on failing. We've run a pruned node as a backup, but it tends to fail too after a while...
Sorry to hear that this is still persisting.
We could provide instructions for a stop-gap solution that you could execute. The solution would require syncing an old gaia node instance and performing upgrades at the designated block heights.
Unfortunately, there is no other action we can take here other than checking in with Quicksync to help troubleshoot.
I will keep this issue open and close all other related issues.
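In case anyone wants to attempt that kind of replay before official instructions are published: one common way to walk a node through multiple upgrades at their designated heights is cosmovisor. The layout below is only a sketch under assumptions, not the instructions referred to above; the binary versions, the <upgrade-name> placeholder, and the ~/.gaia home path are illustrative, and the real upgrade names and heights have to be taken from the on-chain upgrade proposals.

```sh
# Sketch only: replay the chain from an older state, letting cosmovisor switch
# binaries at each upgrade height. Version tags and <upgrade-name> are placeholders.
export DAEMON_NAME=gaiad
export DAEMON_HOME="$HOME/.gaia"
export DAEMON_RESTART_AFTER_UPGRADE=true

# Binary that is valid at the height the sync starts from
mkdir -p "$DAEMON_HOME/cosmovisor/genesis/bin"
cp ./gaiad-old "$DAEMON_HOME/cosmovisor/genesis/bin/gaiad"

# One directory per on-chain upgrade name, each holding the matching binary
mkdir -p "$DAEMON_HOME/cosmovisor/upgrades/<upgrade-name>/bin"
cp ./gaiad-new "$DAEMON_HOME/cosmovisor/upgrades/<upgrade-name>/bin/gaiad"

# cosmovisor hands over to the next binary when the node halts at an upgrade height
cosmovisor run start --home "$DAEMON_HOME"
```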
I'll reach out to you if we decide to follow the stop-gap solution. Unfortunately, we're having problems with the AppHash no matter which snapshot we use. We still have one node that has been running for a long time on a default snapshot and it runs fine, but if we spin up a new node with the same config and download a new snapshot, it fails after a while. (Actually, it's the same with archival.)
Yesterday we once again downloaded the latest archival snapshot, but it failed after a while:
12:07AM INF finalized block block_app_hash=24F4B044B767AFD73F14A5DC1E930CD2E685A80B93347E1216C636E762BDCC75 height=22838854 module=state num_txs_res=4 num_val_updates=1
12:07AM INF executed block app_hash=24F4B044B767AFD73F14A5DC1E930CD2E685A80B93347E1216C636E762BDCC75 height=22838854 module=state
12:07AM INF updates to validators module=state updates=5A59DC8746FD727FDDD5CBF5CBB90C6F616CCF9B:3596564
12:07AM INF committed state block_app_hash=0261BEC7EC8EFFF3ABB850402C54B78A81D0B4ABAC9418D2DE3E2D495E09AEA6 height=22838854 module=state
12:07AM ERR Error in validation err="wrong Block.Header.AppHash. Expected 24F4B044B767AFD73F14A5DC1E930CD2E685A80B93347E1216C636E762BDCC75, got 05012E467D0657717BD073AE4A25E3F71B3C85BDF1C1FC8AE35B6AE9391CB372" module=blocksync
12:07AM ERR Stopping peer for error err="reactor validation error: wrong Block.Header.AppHash. Expected 24F4B044B767AFD73F14A5DC1E930CD2E685A80B93347E1216C636E762BDCC75, got 05012E467D0657717BD073AE4A25E3F71B3C85BDF1C1FC8AE35B6AE9391CB372" module=p2p peer="Peer{MConn{141.94.73.39:37656} 2bda8bff758a39916a528c6b70eefad9148d09ce out}"
12:07AM
@a26nine @MSalopek The issue was, in fact, that once wasm was introduced, it started to be part of the snapshot too. We were unpacking everything into the /data directory, so we were losing the "wasm" directory each time we spun up a node from a snapshot. After fixing that, it's working without issues again.
Before wasm, the downloaded snapshot contained just one directory, data. Now there are two, data and wasm, so if somebody runs into this issue, please verify how you unpack the snapshot ;)
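For anyone else hitting this, a minimal sketch of an unpack step that keeps both directories. The archive name, the tar + lz4 format, and the ~/.gaia home path are assumptions here; adjust them to whatever your snapshot provider actually serves.

```sh
# Sketch only: extract into the node home (NOT into ~/.gaia/data), so that both
# top-level directories in the archive land where gaiad expects them:
#   ~/.gaia/data
#   ~/.gaia/wasm
lz4 -dc cosmoshub-4-archive.tar.lz4 | tar -x -C "$HOME/.gaia"

# Quick check that the wasm directory actually came out of the archive
ls "$HOME/.gaia/data" "$HOME/.gaia/wasm"
```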
Is there an existing issue for this?
What happened?
Our cosmoshub-4 archive nodes stopped progressing after the v19 upgrade. So, we downloaded the archive snapshot from QuickSync. The nodes progressed smoothly for a while, but then they AppHash'd. We waited for a few days and downloaded another snapshot from the same source, and the results were the same again; the node AppHash'd after some time. Once more, we waited a few days for a new snapshot, got it, and got AppHash'd again. The most recent AppHash happened on v19.2.0:
I am not sure who or what the culprit is here: the snapshot, the binary, or something else?
We rolled back a few times and cleared the wasm directory before starting gaiad. We also tried running with the pre-built binaries supplied in the Releases section. But none of it helped.
Our build process: make install (current Go version is 1.22.6)
Long Version:
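(For illustration only, not the exact commands we ran.) The rollback and wasm-clearing attempts above were roughly along these lines; gaiad rollback is the stock Cosmos SDK rollback subcommand, and the systemd service name and ~/.gaia home path are assumptions.

```sh
# Illustration only; exact flags and paths may have differed.
sudo systemctl stop gaiad                  # service name is an assumption

# Roll back the last (mismatched) block of state (stock SDK rollback subcommand)
gaiad rollback --home "$HOME/.gaia"

# Clear the wasm directory before restarting
rm -rf "$HOME/.gaia/wasm"

sudo systemctl start gaiad
```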
Gaia Version
v19.2.0
How to reproduce?
Sync an archive node from the QuickSync snapshot using the gaiad v19.2.0 binary and start gaiad.
The node will AppHash after some time.
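Condensed into commands, the reproduction is roughly the following sketch; the archive name and the ~/.gaia home path are placeholders, and gaiad v19.2.0 is assumed to be installed via make install or the pre-built release binary.

```sh
# Rough repro sketch; SNAPSHOT stands for the archive downloaded from QuickSync.
SNAPSHOT=cosmoshub-4-archive.tar.lz4

lz4 -dc "$SNAPSHOT" | tar -x -C "$HOME/.gaia"
gaiad start --home "$HOME/.gaia"

# The node block-syncs for a while and then halts with a
# "wrong Block.Header.AppHash" error like the one quoted earlier in the thread.
```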