Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0
327 stars 208 forks source link

intermittent failure in SwingSet/test/device-plugin/device.test.js #10142

Open warner opened 1 month ago

warner commented 1 month ago

I saw an intermittent test failure in packages/SwingSet/test/device-plugin/device.test.js that is worth tracking down the root cause of. It appeared in a CI run of PR #10033 , which didn't change any of the files involved (it just added some unused classification tools to misc-tools/), so it should not have been able to trigger a failure. I haven't seen this failure mode before, and running the test locally a few dozen times didn't reproduce it, so it's that worst kind of intermittent where it only happens on someone else's computer.

This is the job named test-swingset (20.x, 0, 5), and the step named yarn test (SwingSet), rather than the step named yarn test (SwingSet XS Worker). That means we'll be using a local (Node.js) worker, unless the test overrides that choice (and this one doesn't). I don't think that is likely to make a difference, because devices are always co-resident with the kernel, regardless of how vats are run.

Because the CI records will probably get deleted, here's a copy:

# device-plugin › device › initialize storage
# device-mailbox › device-mailbox › before hook
ok 55 - device-mailbox › device-mailbox › mailbox outbound %ava-dur=5879ms
ok 56 - device-mailbox › device-mailbox › mailbox inbound %ava-dur=7206ms
Connecting to pingpong.0 with epoch 0
ok 57 - device-plugin › device › plugin first time %ava-dur=25600ms
ok 58 - device-mailbox › device-mailbox › mailbox determinism %ava-dur=19177ms
anachrophobia strikes v1 on delivery 8
sc[0]: missing: {"0":"vatstoreGet","1":"vom.rc.o-50","length":2}
sc[1]: missing: {"0":"vatstoreGet","1":"vom.rc.o-51","length":2}
sc[2]: missing: {"0":"vatstoreGet","1":"vom.rc.o-52","length":2}
sc[3]: missing: {"0":"vatstoreGet","1":"vom.rc.o-53","length":2}
sc[4]: missing: {"0":"vatstoreGet","1":"vom.rc.o-54","length":2}
sc[5]: missing: {"0":"vatstoreGetNextKey","1":"vom.ir.o-50|","length":2}
sc[6]: missing: {"0":"vatstoreGetNextKey","1":"vom.ir.o-51|","length":2}
sc[7]: missing: {"0":"vatstoreGetNextKey","1":"vom.ir.o-52|","length":2}
sc[8]: missing: {"0":"vatstoreGetNextKey","1":"vom.ir.o-53|","length":2}
sc[9]: missing: {"0":"vatstoreGetNextKey","1":"vom.ir.o-54|","length":2}
sc[10]: missing: {"0":"dropImports","1":{"0":"o-50","1":"o-51","2":"o-52","3":"o-53","4":"o-54","length":5},"length":2}
sc[11]: missing: {"0":"retireImports","1":{"0":"o-50","1":"o-51","2":"o-52","3":"o-53","4":"o-54","length":5},"length":2}
not ok 59 - device-plugin › device › plugin after restart %ava-dur=7603ms
#   REJECTED from ava test.serial("plugin after restart"): (Error#1)
#   Error#1: anachrophobia in v1: missing syscalls
#       at Object.finishSimulation (.../swingset-vat/src/kernel/vat-warehouse.js:167:13)
#       at replayTranscript (.../swingset-vat/src/kernel/vat-warehouse.js:319:5)
#       at async ensureVatOnline (.../swingset-vat/src/kernel/vat-warehouse.js:374:1)
#       at async Object.start (.../swingset-vat/src/kernel/vat-warehouse.js:421:1)
#       at async Object.start (.../swingset-vat/src/kernel/kernel.js:1723:1)
#       at async makeSwingsetController (file:///home/runner/work/agoric-sdk/agoric-sdk/packages/SwingSet/src/controller/controller.js:264:3)
#       at async setupVatController (file:///home/runner/work/agoric-sdk/agoric-sdk/packages/SwingSet/test/device-plugin/device.test.js:62:13)
#       at async file:///home/runner/work/agoric-sdk/agoric-sdk/packages/SwingSet/test/device-plugin/device.test.js:100:5

  ---
    name: AssertionError
    message: Rejected promise returned by test
    values:
      'Rejected promise returned by test. Reason:': |-
        Error {
          message: 'anachrophobia in v1: missing syscalls',
        }
    at: |-
      Object.finishSimulation (.../swingset-vat/src/kernel/vat-warehouse.js:167:13)
      replayTranscript (.../swingset-vat/src/kernel/vat-warehouse.js:319:5)
      async ensureVatOnline (.../swingset-vat/src/kernel/vat-warehouse.js:374:1)
      async Object.start (.../swingset-vat/src/kernel/vat-warehouse.js:[421](https://github.com/Agoric/agoric-sdk/actions/runs/11016192455/job/30591288771?pr=10033#step:6:422):1)
      async Object.start (.../swingset-vat/src/kernel/kernel.js:1723:1)
      async makeSwingsetController (file://src/controller/controller.js:264:3)
      async setupVatController (file://test/device-plugin/device.test.js:62:13)
      async file://test/device-plugin/device.test.js:100:5
  ...

1..59
# tests 59
# pass 58
# fail 1
warner commented 1 month ago

Note that this test file has two parts, which must be run together, and serially: the first sets up the conditions for the second. There's a comment that points out this is probably a bad idea, and that the test should be rewritten somehow.

The two parts share a RAM-based swingstore DB (using initSwingStore() without an argument), via a test.before() hook. That is easy enough to move into a single test, tearing down the first kernel and then building a second around the same DB.