classic-terra / core

Go implementation of the Terra Classic Protocol

[BUG] Chain Halt #158

Closed faddat closed 1 year ago

faddat commented 1 year ago

Describe the bug: Terra Classic halted on this block:

To Reproduce Unknown

Context & versions Latest release

Suggested solution (if applicable): This issue should be used to centralize information about the halt. Useful information includes:

We should speak to validators in order of their votepower. To bring the chain back up, you'll need 2/3rds of votepower online. So start with the highest-ranked validator and check the version of the software they are running. If a whitelisted account was present in the halt block but different versions were running simultaneously, that would explain the halt.

If a VaaS provider with >1/3rd of votepower was running a different version of the software, this could also be responsible for the chain halt.
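The 2/3 requirement above can be sketched with simple arithmetic. The numbers below are made-up placeholders, not real LUNC validator data:

```shell
# Rough sketch of the quorum check described above: compare the combined
# votepower of validators confirmed back on the correct version against
# the total bonded votepower. Placeholder numbers, for illustration only.
online=180   # votepower confirmed online on the right binary (hypothetical)
total=250    # total bonded votepower (hypothetical)
# The chain can resume once online > 2/3 of total, i.e. online * 3 > total * 2.
if [ $((online * 3)) -gt $((total * 2)) ]; then
  echo "quorum: chain can produce blocks"
else
  echo "no quorum: keep contacting validators by votepower rank"
fi
```

Integer arithmetic (`online * 3 > total * 2`) avoids floating-point comparison in shell.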

I am concerned with the timing of this PR:

@inon-man may very well be correct: if binaries had already been released to validators, then it is possible that some were running a version of lunc that did not include #149. I don't know all of the other changes in the upgrade, and am looking into it now.

Additional information

These services, as well as the security incident reporting concerning allnodes.com, are funded by delegations to the Notional validator on LUNC.

atomlab commented 1 year ago

My node has upgraded to v1.1.0 but it's crashing. I'm seeing this error:

INF Starting IndexerService service impl=IndexerService module=txindex
INF ABCI Handshake App Info hash="ϳ�\x03x�G��Fk�hzi`R��\x13|D�V\x1c\x05\x1c{��O�" height=11734001 module=consensus protocol-version=0 software-version=1.1.0
INF ABCI Replay Blocks appHeight=11734001 module=consensus stateHeight=11734001 storeHeight=11734002
INF Replay last block using real app module=consensus

Error: error during handshake: error on replay: wrong Block.Header.LastResultsHash.  Expected 658EC57B453249685F1074BC1F6CE5C56C04730BD850F0F05DFAAD41BF02B3B1, got AFE97BD368F0D6B92206B83EB89A1CBC8DED4B9EE1596F4A819C6817F53F47C9
# terrad version
1.1.0
faddat commented 1 year ago

This issue identified the root cause.

Then l1tf fixed it.

Proof:

Screenshot_20230301-211123

faddat commented 1 year ago

@atomlab I bet that you're still running the previous tag. You see what they did? It was highly irresponsible: they changed the contents of the tag. Your local git repository thinks it knows what the tag is, but that's no longer true. Nuke your git repository, check out the tag, install the chain, and then start your daemon; it should work.

You could also check out the git commit hash that corresponds to the new tag.
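The moved-tag failure mode described above can be reproduced locally. This is a self-contained sketch using scratch repositories (no network, nothing Terra-specific): it shows that when an upstream force-moves a tag, a plain `git fetch` refuses to clobber your stale local tag, and only a force-fetch picks up the new target.

```shell
# Demo: a force-moved upstream tag stays stale in a clone until force-fetched.
# Scratch repos under mktemp; commit identity set inline just for the demo.
tmp=$(mktemp -d)
git init -q "$tmp/upstream"
git -C "$tmp/upstream" -c user.email=a@b -c user.name=a \
  commit -q --allow-empty -m "release"
git -C "$tmp/upstream" tag v1.1.0
git clone -q "$tmp/upstream" "$tmp/local"

# Upstream silently re-points the tag at a new commit (the irresponsible part).
git -C "$tmp/upstream" -c user.email=a@b -c user.name=a \
  commit -q --allow-empty -m "late fix"
git -C "$tmp/upstream" tag -f v1.1.0

old=$(git -C "$tmp/local" rev-parse v1.1.0^{commit})
# A plain fetch rejects the moved tag ("would clobber existing tag").
git -C "$tmp/local" fetch -q 2>/dev/null || true
mid=$(git -C "$tmp/local" rev-parse v1.1.0^{commit})
# A force-fetch of tags picks up the new target.
git -C "$tmp/local" fetch -q --tags --force
new=$(git -C "$tmp/local" rev-parse v1.1.0^{commit})
echo "before: $old"
echo "after plain fetch: $mid"
echo "after force fetch: $new"
```

So checking the commit hash (`git rev-parse v1.1.0^{commit}`) rather than trusting the tag name is the reliable verification.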

Since you're a validator most likely you should be asking very difficult questions at this time, such as:

atomlab commented 1 year ago

I have compiled a terrad from this hash 70d118b.

Checking out the ref
  /usr/bin/git checkout --progress --force refs/tags/v1.1.0
  Previous HEAD position was 8bb56e9 Merge pull request #44 from classic-terra/v1.0.5-vm-fix
  HEAD is now at 70d118b add 3 binance addresses (#149)
/usr/bin/git log -1 --format='%H'
'70d118b0ab38c5c2b61288a090177fdfa33dfe76'
faddat commented 1 year ago

Yeah, there's kind of an issue there; hold on a second, please:

You should use the commands

git checkout 70d118b
go install ./...
terrad start

If you're still having that problem, you might want to check the version of Go on your machine.

If both of those don't work, then I would strongly suggest that you do:

cd ~/
rm -rf core
git clone https://GitHub.com/classic-terra/core
cd core
go install ./...
terrad start
bobbyd666 commented 1 year ago

Same here (from 70d118b). Go version downgraded. Any tips, guys?

panic: Failed to process committed block (11734002:2567013EB5E4ED5D538672B668B57A276F30129952572150C4FEE00F62E9E727): wrong Block.Header.LastResultsHash. Expected 658EC57B453249685F1074BC1F6CE5C56C04730BD850F0F05DFAAD41BF02B3B1, got AFE97BD368F0D6B92206B83EB89A1CBC8DED4B9EE1596F4A819C6817F53F47C9

# terrad version --long
name: terra
server_name: terrad
version: 1.1.0
commit: 70d118b0ab38c5c2b61288a090177fdfa33dfe76
build_tags: netgo,ledger
go: go version go1.18.1 linux/amd64

erzqk commented 1 year ago

Hello. I'm trying to run a node from a pruning snapshot and got the same error as @atomlab:

Error: error during handshake: error on replay: wrong Block.Header.LastResultsHash. Expected 658EC57B453249685F1074BC1F6CE5C56C04730BD850F0F05DFAAD41BF02B3B1, got AFE97BD368F0D6B92206B83EB89A1CBC8DED4B9EE1596F4A819C6817F53F47C9

I'm also trying to run a node without a snapshot, syncing the blockchain, and get:

panic: Must use v1.0.x for importing the columbus genesis (https://github.com/classic-terra/core/releases/)

What can I do about that?

bobbyd666 commented 1 year ago

Same discussion here https://classic-agora.terra.money/t/v1-1-0-software-upgrade-proposal/50242/21

There's an interesting tip from aeuser999. Can anyone confirm the libwasmvm.so version please (ldd terrad)?
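For anyone unfamiliar with the tip: `ldd` prints each shared library a binary loads and the path it resolves to, which is how you confirm which libwasmvm.so your terrad actually uses. Demonstrated on /bin/sh here, since terrad may not be installed on the machine reading this; on a validator box, run it against the terrad binary instead:

```shell
# On a validator host you would run (assuming terrad is on PATH):
#   ldd "$(command -v terrad)" | grep libwasmvm
# Generic demonstration of ldd's "library => resolved path" output:
ldd /bin/sh
```

If the libwasmvm.so line resolves to an unexpected version under ~/go/pkg/mod, the binary was built against the wrong wasmvm.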

bobbyd666 commented 1 year ago

Daaaamn.

7:51PM INF indexed block height=11734248 module=txindex
7:51PM INF indexed block height=11734249 module=txindex
7:51PM INF indexed block height=11734250 module=txindex

This lib seems to be the problem. It also needs to be updated to: libwasmvm.so => ~/go/pkg/mod/github.com/!cosm!wasm/wasmvm@v0.16.7/api/libwasmvm.so (0x00007f5fc0e74000)

Credit goes to aeuser999

aeuser999 commented 1 year ago

In truth, the person who pointed that out during the upgrade was LordInateur (so all credit to them :) ).

I am glad it helped out though, and that you figured out the issue (and I will keep that in mind to pass along if others have a similar issue too).

faddat commented 1 year ago

Okay, so one clearly positive thing we can get from this is a really clear description of the causes. From what you are saying, it sounds like at minimum two things caused this problem:

1) overwriting the tag for v1.1.0
2) the version of libwasmvm.so

Great hunting, @aeuser999!

Do you happen to know the percentages here? For example, do we have a clear picture yet of what percentage of the nodes having difficulty hit item one, versus what percentage hit item two?

aeuser999 commented 1 year ago

Hi @faddat,

Really the main issue, from my limited participation, was that it took a while before we hit voting-power consensus. During the upgrade, though, there was some really great teamwork around one or more of these issues from everyone involved (it was a privilege to witness and experience, and I am thankful):

If there is anything in there that is helpful from a code analysis perspective, from my very limited purview, then I hope that is helpful.

Thank you too for contributing your expertise and code, and for Notional's continued contributions as a validator and its public endpoints, to the Terra v1 community.

I hope you have a great day today :)