Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0
326 stars 206 forks source link

comms vat killed by unexpected recv seqNum #3021

Open warner opened 3 years ago

warner commented 3 years ago

Describe the bug

While testing, I saw the comms vat get terminated when it received an unexpected seqnum. The slog trace was:

{"time":1620070469.4858224,"type":"deliver","crankNum":65210,"vatID":"v12","deliveryNum":14612,"kd":["message","ko464",{"method":"receive","args":{"body":"[\"8158:6811:deliver:ro+73:makeIn
vitation:rp-8197;[]\"]","slots":[]},"result":"kp24984"}],"vd":["message","o+30",{"method":"receive","args":{"body":"[\"8158:6811:deliver:ro+73:makeInvitation:rp-8197;[]\"]","slots":[]},"re
sult":"p-8218"}]}
{"time":1620070469.4860346,"type":"syscall","crankNum":65210,"vatID":"v12","deliveryNum":14613,"syscallNum":0,"ksc":["vatstoreGet","v12","r.o+30"],"vsc":["vatstoreGet","r.o+30"]}
{"time":1620070469.486178,"type":"syscall-result","crankNum":65210,"vatID":"v12","deliveryNum":14613,"syscallNum":0,"ksr":["ok","r1"],"vsr":["ok","r1"]}
{"time":1620070469.4863627,"type":"clist","crankNum":65210,"mode":"drop","vatID":"v12","kobj":"kp24984","vobj":"p-8218"}
{"time":1620070469.486449,"type":"syscall","crankNum":65210,"vatID":"v12","deliveryNum":14613,"syscallNum":1,"ksc":["resolve","v12",[["kp24984",false,{"body":"{\"@qclass\":\"undefined\"}",
"slots":[]}]]],"vsc":["resolve",[["p-8218",false,{"body":"{\"@qclass\":\"undefined\"}","slots":[]}]]]}
{"time":1620070469.487023,"type":"syscall-result","crankNum":65210,"vatID":"v12","deliveryNum":14613,"syscallNum":1,"ksr":["ok",null],"vsr":["ok",null]}
{"time":1620070469.4871848,"type":"syscall","crankNum":65210,"vatID":"v12","deliveryNum":14613,"syscallNum":2,"ksc":["vatstoreGet","v12","r1.initialized"],"vsc":["vatstoreGet","r1.initiali
zed"]}
{"time":1620070469.4873176,"type":"syscall-result","crankNum":65210,"vatID":"v12","deliveryNum":14613,"syscallNum":2,"ksr":["ok","true"],"vsr":["ok","true"]}
{"time":1620070469.4877985,"type":"syscall","crankNum":65210,"vatID":"v12","deliveryNum":14613,"syscallNum":3,"ksc":["vatstoreGet","v12","r1.recvSeq"],"vsc":["vatstoreGet","r1.recvSeq"]}
{"time":1620070469.488014,"type":"syscall-result","crankNum":65210,"vatID":"v12","deliveryNum":14613,"syscallNum":3,"ksr":["ok","8156"],"vsr":["ok","8156"]}
{"time":1620070469.4881413,"type":"syscall","crankNum":65210,"vatID":"v12","deliveryNum":14613,"syscallNum":4,"ksc":["vatstoreSet","v12","r1.recvSeq","8157"],"vsc":["vatstoreSet","r1.recvS
eq","8157"]}
{"time":1620070469.4882417,"type":"syscall-result","crankNum":65210,"vatID":"v12","deliveryNum":14613,"syscallNum":4,"ksr":["ok",null],"vsr":["ok",null]}
{"time":1620070469.489727,"type":"deliver-result","crankNum":65210,"vatID":"v12","deliveryNum":14612,"dr":["error","unexpected recv seqNum (a string)",null]}
{"time":1620070470.2255008,"type":"terminate","vatID":"v12","shouldReject":true,"info":{"body":"{\"@qclass\":\"error\",\"name\":\"Error\",\"message\":\"unexpected recv seqNum (a string)\"}","slots":[]}}

I'm pretty sure it's happening here: https://github.com/Agoric/agoric-sdk/blob/8b704bbb0be119c1acb35f92b3b6df3eeef43780/packages/SwingSet/src/vats/comms/delivery.js#L139-L143

I'm not exactly sure why this caused the comms vat to terminate, but I think it's because we decided that any unhandled error from dispatch qualifies as an internal error and kills the vat.

The tasks are:

warner commented 3 years ago

We put a bandage on this in #3024, but we need to make a thorough check of all the assertions and make sure they kill only the appropriate level.

dckc commented 3 years ago

Is there more to do on this for today's release? If not, I'd like to postpone it to a later milestone.

warner commented 3 years ago

I don't think there's more to do on this for the Staking Dynamics milestone. We'll just have to see if the #3024 bandage makes a difference.

Tartuffo commented 2 years ago

We may have fixed the initial error, but what should we do if it DOES happen again?