EvernodeXRPL / evernode-host

Evernode host installer

Invoke reputation hook fails every second (alternate) hour #377

Closed jellicoe closed 5 months ago

jellicoe commented 6 months ago

Invoke Hook fails (tecHOOK_REJECTED) every second hour

01_reputation.log

jellicoe commented 6 months ago

was running

sashi attach -n NNNNNNNNN

and when invoke hook failed got below error

terminate called after throwing an instance of 'jsoncons::ser_error'
  what(): An unknown type was found in the stream at position 26
20240518 05:25:17.517 [wrn][hpc] Contract process (rdonly) ended prematurely. Exit code 0

then a bunch of these

[HPWS.C PID+00000013] GOTO client_closed, ERROR: client closed connection
[HPWS.C PID+0000085B] Unable to connect, errno: 111
[HPWS.C PID+0000085B] ABEND: can't connect
20240518 05:27:05.028 [inf][hpc] Not enough peers proposing to perform consensus. votes:0 needed:1
[HPWS.C PID+0000085C] Unable to connect, errno: 111
[HPWS.C PID+0000085C] ABEND: can't connect
[HPWS.C PID+0000085D] Unable to connect, errno: 111
[HPWS.C PID+0000085D] ABEND: can't connect
20240518 05:27:15.028 [inf][hpc] Not enough peers proposing to perform consensus. votes:0 needed:1
[HPWS.C PID+0000085E] Unable to connect, errno: 111
[HPWS.C PID+0000085E] ABEND: can't connect
[HPWS.C PID+0000085F] Unable to connect, errno: 111
[HPWS.C PID+0000085F] ABEND: can't connect

and then instance closed

genesis-one-seven commented 6 months ago

I have the same issue on the 3 hosts on which I activated the reputation contract

jellicoe commented 6 months ago

Run "evernode list" to get the "name" ID, then run "sashi attach -n 05B43F699C957D8596DD5585313EA1E8E8B068F727C188973FCA74ECFC674F47"

and wait for the error that appears when the hook fails

genesis-one-seven commented 6 months ago

so far I only see this log from HotPocket every 3 seconds:

20240518 14:46:05.382 [inf][hpc] Not enough peers proposing to perform consensus. votes:19 needed:24

jellicoe commented 6 months ago

still happening even after applying patch today

jellicoe commented 6 months ago

every second hook invoke fails

check address on explorer

https://xahscan.com/account/rngvNAsNfk8d7pEtpDz3EwfQQD7Tb8FK7v

chalith commented 6 months ago

was running

sashi attach -n NNNNNNNNN

and when invoke hook failed got below error

terminate called after throwing an instance of 'jsoncons::ser_error'
  what(): An unknown type was found in the stream at position 26
20240518 05:25:17.517 [wrn][hpc] Contract process (rdonly) ended prematurely. Exit code 0

then a bunch of these

[HPWS.C PID+00000013] GOTO client_closed, ERROR: client closed connection
[HPWS.C PID+0000085B] Unable to connect, errno: 111
[HPWS.C PID+0000085B] ABEND: can't connect
20240518 05:27:05.028 [inf][hpc] Not enough peers proposing to perform consensus. votes:0 needed:1
[HPWS.C PID+0000085C] Unable to connect, errno: 111
[HPWS.C PID+0000085C] ABEND: can't connect
[HPWS.C PID+0000085D] Unable to connect, errno: 111
[HPWS.C PID+0000085D] ABEND: can't connect
20240518 05:27:15.028 [inf][hpc] Not enough peers proposing to perform consensus. votes:0 needed:1
[HPWS.C PID+0000085E] Unable to connect, errno: 111
[HPWS.C PID+0000085E] ABEND: can't connect
[HPWS.C PID+0000085F] Unable to connect, errno: 111
[HPWS.C PID+0000085F] ABEND: can't connect

and then instance closed

This tends to happen when the host that initiates the cluster can't be reached, as seen in the logs:

[HPWS.C PID+0000085B] Unable to connect, errno: 111
[HPWS.C PID+0000085B] ABEND: can't connect

But it is suspicious if this happens in every round, because the initiator is chosen at random each round. Is this the error you see in every round? As a side note, we are going to change how the cluster is initialized: instead of waiting for an initiator, all nodes will start at once. It's also worth checking whether any firewall rules interfere with connections to other peers. Can you send us the logs from another round?
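On Linux, errno 111 is ECONNREFUSED: the target answered, but nothing accepted the TCP connection on that port. A minimal Python sketch for probing a peer port from the host follows. `probe_peer` is a hypothetical helper and not part of Evernode or HotPocket; the host and port passed to it must be the real peer's values.

```python
import errno
import socket


def probe_peer(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connection and describe the outcome.

    Hypothetical helper for illustration; it only tests reachability
    and does not speak the HotPocket peer protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Corresponds to errno 111 (ECONNREFUSED) on Linux.
        return "refused"
    except socket.timeout:
        # No answer at all; often a firewall silently dropping packets.
        return "timeout"
    except OSError as exc:
        # Any other socket-level failure, reported by its errno name.
        return f"error: {errno.errorcode.get(exc.errno, exc.errno)}"
```

For example, `probe_peer("peer.example.org", 22862)` would test a peer port such as the one listed in the firewall rules in this thread; the hostname here is a placeholder. A result of "refused" suggests the remote process is not listening, while "timeout" points more towards filtering along the path.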

chalith commented 6 months ago

so far I only see this log from HotPocket every 3 seconds:

20240518 14:46:05.382 [inf][hpc] Not enough peers proposing to perform consensus. votes:19 needed:24

Seeing this log every 3 seconds is normal. It means you've joined the cluster but some peers are not responding. Can you also send us your reputation address so we can look at its transactions?
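To see how far a node is from the required vote count over time, the "Not enough peers" lines can be parsed. The sketch below assumes the line format is exactly as quoted in this thread; the pattern is inferred from these samples only, and `parse_vote_line` is a hypothetical helper name.

```python
import re

# Pattern inferred from the log samples in this thread (an assumption,
# not a documented HotPocket log format).
VOTE_LINE = re.compile(
    r"^(?P<date>\d{8}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[inf\]\[hpc\] Not enough peers proposing to perform consensus\. "
    r"votes:(?P<votes>\d+) needed:(?P<needed>\d+)"
)


def parse_vote_line(line: str):
    """Return timestamp, votes, needed and shortfall, or None if no match."""
    match = VOTE_LINE.match(line.strip())
    if match is None:
        return None
    votes = int(match.group("votes"))
    needed = int(match.group("needed"))
    return {
        "timestamp": f"{match.group('date')} {match.group('time')}",
        "votes": votes,
        "needed": needed,
        "missing": max(needed - votes, 0),
    }
```

Applied to the line quoted above, this reports 19 votes against 24 needed, a shortfall of 5 proposing peers.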

jellicoe commented 6 months ago

constantly getting the errors below

also, it fails every second hour and succeeds in the hours between; see https://xahscan.com/account/rngvNAsNfk8d7pEtpDz3EwfQQD7Tb8FK7v

20240520 04:25:08.198 [inf][hpc] Not enough peers proposing to perform consensus. votes:0 needed:1
[HPWS.C PID+000000F1] Unable to connect, errno: 111
[HPWS.C PID+000000F1] ABEND: can't connect
[HPWS.C PID+000000F2] Unable to connect, errno: 111
[HPWS.C PID+000000F2] ABEND: can't connect
20240520 04:25:18.198 [inf][hpc] Not enough peers proposing to perform consensus. votes:0 needed:1
[HPWS.C PID+000000F3] Unable to connect, errno: 111
[HPWS.C PID+000000F3] ABEND: can't connect
[HPWS.C PID+000000F4] Unable to connect, errno: 111
[HPWS.C PID+000000F4] ABEND: can't connect
[HPWS.C PID+000000F5] Unable to connect, errno: 111
[HPWS.C PID+000000F5] ABEND: can't connect

jellicoe commented 6 months ago

Firewall rules

root@evernode01:~# ufw status
Status: active

To                Action   From
22/tcp            ALLOW    Anywhere
80/tcp            ALLOW    Anywhere         # sashimono-certbot
26202/tcp         ALLOW    Anywhere         # sashi-E71B9E28FCD095DCF02B8E234436D54585DFAC3C1EE429C9BABE2395ABD68513-user
22862             ALLOW    Anywhere         # sashi-E71B9E28FCD095DCF02B8E234436D54585DFAC3C1EE429C9BABE2395ABD68513-peer
22/tcp (v6)       ALLOW    Anywhere (v6)
80/tcp (v6)       ALLOW    Anywhere (v6)    # sashimono-certbot
26202/tcp (v6)    ALLOW    Anywhere (v6)    # sashi-E71B9E28FCD095DCF02B8E234436D54585DFAC3C1EE429C9BABE2395ABD68513-user
22862 (v6)        ALLOW    Anywhere (v6)    # sashi-E71B9E28FCD095DCF02B8E234436D54585DFAC3C1EE429C9BABE2395ABD68513-peer

genesis-one-seven commented 6 months ago

so far I only see this log from HotPocket every 3 seconds:

20240518 14:46:05.382 [inf][hpc] Not enough peers proposing to perform consensus. votes:19 needed:24

Seeing this log every 3 seconds is normal. It means you've joined the cluster but some peers are not responding. Can you also send us your reputation address so we can look at its transactions?

Here are the three reputation addresses that are currently active:

rUthTB1THzGwhU7gTEkrias7pQKV691wAi
rfiu9M9xazLbo1pgGnK5RGZeoR2maDUsfZ
rsDr6aYXz8ij5Tr7n13GVgXMGuheaKUHKM

The alternate failure happens on all of them, as you can see.
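One way to confirm the "every other round" pattern, rather than judging it by eye on the explorer, is to test a chronological list of transaction result codes for strict alternation. The sketch below is illustrative only: `is_alternating` is a hypothetical helper, and the sample list is made up for the example, not copied from any of the accounts above.

```python
def is_alternating(results, failure="tecHOOK_REJECTED"):
    """True when failures and non-failures strictly alternate.

    `results` is a chronological list of transaction result codes.
    Fewer than two entries cannot show a pattern, so that returns False.
    """
    flags = [code == failure for code in results]
    if len(flags) < 2:
        return False
    # Every adjacent pair must differ: fail, pass, fail, pass, ...
    return all(a != b for a, b in zip(flags, flags[1:]))


# Illustrative data only (not taken from the explorer).
sample = ["tesSUCCESS", "tecHOOK_REJECTED", "tesSUCCESS", "tecHOOK_REJECTED"]
alternates = is_alternating(sample)
```

A run of several successes in a row makes the function return False, so it separates a strict alternation from a less regular failure pattern.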

jellicoe commented 6 months ago

now I'm getting these errors

20240520 22:57:02.251 [inf][hpc] We are not on the consensus ledger, we must request history from a peer.
20240520 22:57:02.253 [inf][hpc] Hpfs ldgr sync: Target added. Hash:8718687e /primary/0
20240520 22:57:04.752 [inf][hpc] We are not on the consensus ledger, we must request history from a peer.
[HPWS.C PID+000003A8] Unable to connect, errno: 111
[HPWS.C PID+000003A8] ABEND: can't connect

jellicoe commented 6 months ago

this is the "sashi attach -n nnnnn" output I get at the time the hook invoke fails

this is happening on both of my nodes, which I've upgraded to 0.8.3

20240521 00:25:22.889 [inf][hpc] Ledger created (lcl:351-9ab0d4f2 state:84ab9407 patch:f80a6de4)
terminate called after throwing an instance of 'jsoncons::ser_error'
  what(): An unknown type was found in the stream at position 26
20240521 00:25:31.144 [wrn][hpc] Contract process (rdonly) ended prematurely. Exit code 0
20240521 00:25:32.889 [inf][hpc] Ledger created (lcl:352-b137aa61 state:84ab9407 patch:f80a6de4)

jellicoe commented 6 months ago

this is the failed / rejected hook TX on my 2nd upgraded node

https://xahscan.com/tx/DBFE79AEF72DE85EF138604C3583CDD6B69F1B9EBF6E94F284A251A92A3ACCF9

chalith commented 6 months ago

rsDr6aYXz8ij5Tr7n13GVgXMGuheaKUHKM

According to the transaction history, it seems all 3 hosts are failing every other round. Can you send us the hp.log file inside /host/sashixxxxx/<instance_name>/logs and the output of "evernode reputation status"?

@jellicoe @genesis-one-seven Can you both send these?

genesis-one-seven commented 6 months ago

@jellicoe @genesis-one-seven Can you both send these?

I'll do it later. In the meantime, I see the issue is currently less regular; for example, here you can see three successful transactions in a row from 05/22/2024 06:27 to 05/22/2024 08:28:

https://xahscan.com/account/rUthTB1THzGwhU7gTEkrias7pQKV691wAi

genesis-one-seven commented 6 months ago

This is one example of reputation status:

{
  "address": "rNoHVWcqx7WL1iH4D1t2xLHzCu426BcFTq",
  "lastRegisteredMoment": 3851,
  "scoreNumerator": 15,
  "scoreDenominator": 30,
  "moment": 3851,
  "orderedId": 255,
  "reportedHostCount": 348
}
REPUTATIOND_SUCCESS

This is another reputation status:

{
  "address": "rUthTB1THzGwhU7gTEkrias7pQKV691wAi",
  "lastRegisteredMoment": 3851,
  "scoreNumerator": 1808,
  "scoreDenominator": 380,
  "moment": 3851,
  "orderedId": 338,
  "contract": {
    "domain": "2.genesis-one-seven-1.online",
    "pubkey": "ed74a092048bb90fc65ab2c7a6420fe13b1310a01fd51ffda5d54ffc2304e91fb2",
    "peerPort": 22862
  },
  "reportedHostCount": 348
}
REPUTATIOND_SUCCESS
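When comparing status output across several hosts, it can help to split the JSON object from the trailing status marker and inspect fields programmatically. The sketch below assumes the output always has the shape shown above (one JSON object followed by a marker such as REPUTATIOND_SUCCESS); `parse_reputation_status` is a hypothetical helper, not an Evernode API.

```python
import json


def parse_reputation_status(output: str):
    """Split status output into (parsed JSON object, trailing marker).

    Assumes a single JSON object followed by a short marker string,
    as in the samples quoted in this thread.
    """
    text = output.strip()
    start = text.index("{")
    # raw_decode parses one JSON value and reports where it ended,
    # which leaves any trailing non-JSON text available separately.
    status, end = json.JSONDecoder().raw_decode(text, start)
    marker = text[end:].strip()
    return status, marker


raw = (
    '{ "address": "rNoHVWcqx7WL1iH4D1t2xLHzCu426BcFTq", '
    '"lastRegisteredMoment": 3851, "scoreNumerator": 15, '
    '"scoreDenominator": 30, "moment": 3851, "orderedId": 255, '
    '"reportedHostCount": 348 } REPUTATIOND_SUCCESS'
)
status, marker = parse_reputation_status(raw)
has_contract = "contract" in status
```

With the first sample above, `has_contract` is False, which makes the difference from the second sample (where a "contract" object is present) easy to spot across many hosts. What that difference means for a given round is not stated in this thread.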

I'm sorry, I don't know how to find the folder /host/. Can you please help me so I can send you the logs? (I'm really new to Linux...)

genesis-one-seven commented 6 months ago

Anyway, I checked some accounts from the hook account's transaction list and it seems ALL of them have the problem; I couldn't find a single account without errors:

https://xahscan.com/account/rsfTBRAbD2bYjVuXhJ2RReQXxR4K5birVW

jellicoe commented 6 months ago

seems to be affecting others too - so just wait for a fix

aYoDoughja commented 6 months ago

I am having the exact same issue as you jellicoe.

jellicoe commented 5 months ago

fixed in v0.9.0
