Agoric / testnet-notes

notes for and by incentivized testnet participants
https://validate.agoric.com/
Creative Commons Zero v1.0 Universal

repeated jailing, intermittent block signing failures #11

Closed: dckc closed this 3 years ago

dckc commented 3 years ago

Lots of validators are under 100% uptime:

Screenshot from 2021-06-26 21-51-21

See especially the Discord discussion starting at https://discord.com/channels/585576150827532298/819073555446759444/858531108777754625

and Redlly's validator: https://testnet.explorer.agoric.net/validator/agoricvaloper1tfc68r0324p76a708f3hh029qzwhwz8yp36dnz
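For context on how missed blocks turn into jailing, here is a minimal sketch of the liveness check, modeled on my reading of the Cosmos SDK x/slashing module. The Agoric testnet's actual slashing parameters are not stated in this thread. The values below are the SDK defaults and are assumptions, as are the helper names `SlashingParams`, `max_missed`, and `would_jail`.

```python
from dataclasses import dataclass


@dataclass
class SlashingParams:
    # Assumed values (Cosmos SDK defaults), not confirmed for the Agoric testnet.
    signed_blocks_window: int = 100
    min_signed_per_window: float = 0.5


def max_missed(params: SlashingParams) -> int:
    """Most blocks a validator may miss within the window without being jailed."""
    min_signed = round(params.min_signed_per_window * params.signed_blocks_window)
    return params.signed_blocks_window - min_signed


def would_jail(missed_in_window: int, height: int, start_height: int,
               params: SlashingParams) -> bool:
    """Jail only after a full window has elapsed since start_height
    and the missed-block count exceeds the allowance."""
    past_grace = height > start_height + params.signed_blocks_window
    return past_grace and missed_in_window > max_missed(params)
```

Under these assumed defaults, a validator that misses more than 50 of the last 100 blocks is jailed, so intermittent signing failures well short of a full outage can still trigger it.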

low peer counts

[9:56 PM] Redlly: It seems like the peer count is low as well, only 8-9. Is that normal?
[9:56 PM] dckc | Agoric: that does seem low
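To check the peer count on your own node, here is a sketch that assumes Tendermint RPC is reachable at its default address, `localhost:26657`. `/net_info` is a standard Tendermint RPC endpoint. The helper names `peer_summary` and `fetch_net_info` are hypothetical.

```python
import json
import urllib.request

# Assumption: the node serves Tendermint RPC on the default port.
RPC_URL = "http://localhost:26657/net_info"


def peer_summary(net_info: dict) -> tuple[int, list[str]]:
    """Return the peer count and peer monikers from a /net_info JSON-RPC response."""
    result = net_info["result"]
    monikers = [p["node_info"]["moniker"] for p in result.get("peers", [])]
    # n_peers is reported as a string, so convert it explicitly.
    return int(result["n_peers"]), monikers


def fetch_net_info(url: str = RPC_URL) -> dict:
    """Fetch and decode the /net_info response from the local node."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

On the validator host, `peer_summary(fetch_net_info())` returns the count and monikers. Running it periodically shows whether a low count like 8-9 is steady or fluctuating.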
dckc commented 3 years ago

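long block times; same proposer for consecutive blocks

More oddness: in both pairs below, one validator proposed two consecutive blocks about 11 seconds apart.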

Height  Time (UTC)              Proposer    Txs  Hash
20,091  27 Jun 2021, 2:56:43am  TuretskiyV  2    2E114A15AED9A420C7BB7958F3E57303D19168CC1FF0C0D8853E77E1A8CD4E10
20,090  27 Jun 2021, 2:56:32am  TuretskiyV  1    C8B3C2B1CCD5BCB1A900CFDA26F0A9DD9748AD23D8BD66380FC327D9CD95402F

Screenshot at 2021-06-26 21-57-36

Height  Time (UTC)              Proposer  Txs  Hash
20,361  27 Jun 2021, 3:25:18am  onchain   0    9D562F44429A8BBC1CC3E9E32A908BEE8B489663EAD5216DDCB92B486814DB5F
20,360  27 Jun 2021, 3:25:07am  onchain   2    6F818632959C9D8A55B035D8DA8186B328A1103638A8D50017ECC43A6018C298
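These two pairs can be checked mechanically. This sketch flags adjacent heights whose interval is long or whose proposer repeats. The rows are copied from the explorer listings above. The 10-second threshold and the name `flag_pairs` are assumptions; the block-manager logs later in this thread show roughly 6 seconds between blocks.

```python
from datetime import datetime

# Rows from the explorer listings: (height, time in UTC, proposer).
BLOCKS = [
    (20090, "2021-06-27 02:56:32", "TuretskiyV"),
    (20091, "2021-06-27 02:56:43", "TuretskiyV"),
    (20360, "2021-06-27 03:25:07", "onchain"),
    (20361, "2021-06-27 03:25:18", "onchain"),
]


def flag_pairs(blocks, slow_after_s=10.0):
    """Yield (height, gap_seconds, same_proposer) for adjacent blocks that look odd."""
    fmt = "%Y-%m-%d %H:%M:%S"
    for (h1, t1, p1), (h2, t2, p2) in zip(blocks, blocks[1:]):
        if h2 != h1 + 1:
            continue  # only compare blocks at consecutive heights
        gap = (datetime.strptime(t2, fmt) - datetime.strptime(t1, fmt)).total_seconds()
        if gap > slow_after_s or p1 == p2:
            yield h2, gap, p1 == p2
```

`list(flag_pairs(BLOCKS))` gives `[(20091, 11.0, True), (20361, 11.0, True)]`: both pairs are slow and share a proposer.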
jjangg96 commented 3 years ago

I'm having a similar situation.

aditya-manit commented 3 years ago

+1 to repeated jailing (╯°□°)╯︵ ┻━┻

I also tried a few things, such as deleting the address book so that new peers are fetched on restart. After restarting, it gets stuck here:

-- Logs begin at Tue 2021-04-27 13:17:05 UTC. --
Jun 27 08:38:15 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[2098902]: 2021-06-27T08:38:15.865Z SwingSet: kernel: SwingSet:vatWorker Making fake price authority for RUN-simolean
Jun 27 08:38:15 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[2098902]: 2021-06-27T08:38:15.865Z SwingSet: kernel: SwingSet:vatWorker Making fake price authority for simolean-RUN
Jun 27 08:38:20 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[2098902]: 2021-06-27T08:38:20.895Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'echo' } }
Jun 27 08:38:20 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[2098902]: 2021-06-27T08:38:20.905Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'transfer' } }
Jun 27 08:38:20 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[2098902]: 2021-06-27T08:38:20.913Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'port-1' } }
Jun 27 08:38:20 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[2098902]: 2021-06-27T08:38:20.918Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'port-2' } }
Jun 27 08:38:20 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[2098902]: 2021-06-27T08:38:20.922Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'port-3' } }
Jun 27 08:38:20 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[2098902]: 2021-06-27T08:38:20.926Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'port-4' } }
Jun 27 08:38:20 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[2098902]: 2021-06-27T08:38:20.931Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'port-5' } }
Jun 27 08:38:20 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[2098902]: 2021-06-27T08:38:20.935Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'port-6' } }

Nothing is logged after this point. The only way out is a reset-all; otherwise it stays stuck here forever. Even after a reset-all, I was jailed again a few minutes after unjailing myself (once caught up, of course).
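To tell a node that is stuck like this apart from one that is merely quiet, one option is to compare the last logged timestamp with the current time. This is a minimal sketch. It reads the UTC timestamp that ag-chain-cosmos embeds in each line, because the journald prefix may be in local time. The 2-minute quiet limit and the names `last_timestamp` and `is_stalled` are assumptions.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches the ISO-8601 UTC timestamp printed after the journald prefix.
TS = re.compile(r"(\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d\.\d{3})Z")


def last_timestamp(lines):
    """Return the most recent embedded UTC timestamp, or None if there is none."""
    for line in reversed(lines):
        m = TS.search(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
            return ts.replace(tzinfo=timezone.utc)
    return None


def is_stalled(lines, now, max_quiet=timedelta(minutes=2)):
    """True if nothing timestamped has been logged within max_quiet of now."""
    last = last_timestamp(lines)
    return last is None or now - last > max_quiet
```

Pass it the recent journal lines and `datetime.now(timezone.utc)`. A True result after the bindPort lines matches the hang described above.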

aditya-manit commented 3 years ago

I see someone else has reported the same issue here: https://discord.com/channels/585576150827532298/819073555446759444/858602315454873600

aditya-manit commented 3 years ago

Dug deeper into my logs:

Jun 23 06:40:09 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697036]: 2021-06-23T06:40:09.592Z block-manager: block 640465 begin
Jun 23 06:40:09 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697036]: 2021-06-23T06:40:09.631Z block-manager: block 640465 commit
Jun 23 06:40:15 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697036]: 2021-06-23T06:40:15.447Z block-manager: block 640466 begin
Jun 23 06:40:15 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697036]: 2021-06-23T06:40:15.466Z block-manager: block 640466 commit
Jun 23 06:40:21 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697036]: 2021-06-23T06:40:21.752Z block-manager: block 640467 begin
Jun 23 06:40:21 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697036]: 2021-06-23T06:40:21.927Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'port-376' } }
Jun 23 06:40:21 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697036]: 2021-06-23T06:40:21.987Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'port-377' } }
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697036]: 2021-06-23T06:40:22.072Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'port-378' } }
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]: Logging sent error stack (RemoteError#574)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]: RemoteError#574: vat terminated
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]: Error: vat terminated
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at construct (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at Error (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at makeError (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at fullRevive (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at unserialize (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at notifyOnePromise (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at notify (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at dispatchToUserspace (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]: RemoteError#574 ERROR_NOTE: Rejection from: (Error#575) : 3948 . 0
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]: RemoteError#574 ERROR_NOTE: Rejection from: (Error#576) : 3940 . 1
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]: RemoteError#574 ERROR_NOTE: Sent as error:liveSlots:v12#70082
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]: Error#575: Event: 3947.1
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]: Error: Event: 3947.1
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at construct (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at Error (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at trackTurns (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at handle (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at createUserBundle (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at (0:669462000)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]: Error#576: Event: 3939.1
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]: Error: Event: 3939.1
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at construct (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at Error (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at trackTurns (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at handle (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at deliver (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at dispatchToUserspace (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697109]:  at (0:0)
Jun 23 06:40:22 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697036]: 2021-06-23T06:40:22.172Z block-manager: block 640467 commit
Jun 23 06:40:27 ubuntu-s-4vcpu-8gb-intel-blr1-01-agoric ag-chain-cosmos[697036]: 2021-06-23T06:40:27.066Z block-manager: block 640468 begin

I am getting these errors between a block's begin and commit, which may be why I keep getting jailed. Hoping this is useful for the team to debug further 🥇
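To see which blocks these errors fall in, the begin/commit markers can be used to attribute error lines to a height. This sketch assumes the `block-manager: block N begin` / `commit` format shown above and treats any line containing `Error` as an error. `errors_by_block` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Matches the block-manager begin/commit markers in the node log.
BLOCK = re.compile(r"block-manager: block (\d+) (begin|commit)")


def errors_by_block(lines):
    """Map block height -> error lines logged between that block's begin and commit."""
    found = defaultdict(list)
    current = None  # height of the block currently executing, if any
    for line in lines:
        m = BLOCK.search(line)
        if m:
            height, phase = int(m.group(1)), m.group(2)
            current = height if phase == "begin" else None
        elif current is not None and "Error" in line:
            found[current].append(line)
    return dict(found)
```

Fed the lines of a `journalctl` dump, it would list block 640467 with the `RemoteError#574: vat terminated` lines, making it easier to compare against the heights at which the validator missed signatures.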

asifhj commented 3 years ago

My validator is also stuck at the same point. I tried restarting a second time, with no luck.

Jun 27 22:23:08 e2e-79-184 ag-chain-cosmos[927747]: 2021-06-27T16:53:08.297Z SwingSet: kernel: SwingSet:vatWorker Successful update {
Jun 27 22:23:08 e2e-79-184 ag-chain-cosmos[927747]:   address: 'agoric1ergm6shrhfes05fk8uaxfp8rghjgnrrykc6m5t',
Jun 27 22:23:08 e2e-79-184 ag-chain-cosmos[927747]:   amount: '144062885',
Jun 27 22:23:08 e2e-79-184 ag-chain-cosmos[927747]:   denom: 'urun'
Jun 27 22:23:08 e2e-79-184 ag-chain-cosmos[927747]: }
Jun 27 22:23:15 e2e-79-184 ag-chain-cosmos[927747]: 2021-06-27T16:53:15.483Z SwingSet: kernel: SwingSet:vatWorker Making fake price authority for RUN-moola
Jun 27 22:23:15 e2e-79-184 ag-chain-cosmos[927747]: 2021-06-27T16:53:15.484Z SwingSet: kernel: SwingSet:vatWorker Making fake price authority for moola-RUN
Jun 27 22:23:15 e2e-79-184 ag-chain-cosmos[927747]: 2021-06-27T16:53:15.491Z SwingSet: kernel: SwingSet:vatWorker Making fake price authority for RUN-simolean
Jun 27 22:23:15 e2e-79-184 ag-chain-cosmos[927747]: 2021-06-27T16:53:15.491Z SwingSet: kernel: SwingSet:vatWorker Making fake price authority for simolean-RUN
Jun 27 22:23:26 e2e-79-184 ag-chain-cosmos[927747]: 2021-06-27T16:53:26.174Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'echo' } }
Jun 27 22:23:26 e2e-79-184 ag-chain-cosmos[927747]: 2021-06-27T16:53:26.185Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'transfer' } }
Jun 27 22:23:26 e2e-79-184 ag-chain-cosmos[927747]: 2021-06-27T16:53:26.194Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'port-1' } }
Jun 27 22:23:26 e2e-79-184 ag-chain-cosmos[927747]: 2021-06-27T16:53:26.201Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'port-2' } }
Jun 27 22:23:26 e2e-79-184 ag-chain-cosmos[927747]: 2021-06-27T16:53:26.209Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'port-3' } }
Jun 27 22:23:26 e2e-79-184 ag-chain-cosmos[927747]: 2021-06-27T16:53:26.217Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'port-4' } }
Jun 27 22:23:26 e2e-79-184 ag-chain-cosmos[927747]: 2021-06-27T16:53:26.226Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'port-5' } }
Jun 27 22:23:26 e2e-79-184 ag-chain-cosmos[927747]: 2021-06-27T16:53:26.236Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'port-6' } }
Jun 27 22:23:26 e2e-79-184 ag-chain-cosmos[927747]: 2021-06-27T16:53:26.246Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'port-7' } }
Jun 27 22:23:26 e2e-79-184 ag-chain-cosmos[927747]: 2021-06-27T16:53:26.254Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'port-8' } }
Jun 27 22:23:26 e2e-79-184 ag-chain-cosmos[927747]: 2021-06-27T16:53:26.259Z SwingSet: kernel: SwingSet:vatWorker IBC downcall bindPort { packet: { source_port: 'port-9' } }
dckc commented 3 years ago

We have a possible diagnosis and we're working on a fix.

dckc commented 3 years ago

This pattern of 0 -> 50, 50 -> 0 every few minutes looks odd.

neonodes - Cosmos Validator _ The Big Dipper.pdf

dckc commented 3 years ago

@warner @michaelfig I'm not sure how to approach this problem report. Do you see any next steps on this issue? Any that are cost-effective for this week's devnet milestone?

If not this milestone, is the Someday Pile OK? (Otherwise, I'll pick another scheduled milestone arbitrarily.)

If we don't see any next steps, I suppose we should declare it a lost cause and close it as wontfix or invalid.

dckc commented 3 years ago

@michaelfig and I are content to close this.

Anybody with more information, please report it here.