Open warner opened 3 years ago
We believe the cosmic-swingset side of this is complete: any rejections from c.run()
will cause the chain to halt. The remaining study should look at the swingset side to make sure any delivery problems cause a kernel halt and rejection of the c.run promise.
This overlaps with my current work on making sure device invocations cannot crash the kernel (made more visible by the introduction of "raw devices" as part of #1346).
@FUDCo did you take a pass at this? @warner seems to recall you might have.
We checked the surface level in kernel.run/step
and tryProcessMessage
, and we like what we see. We'd like to do a deeper audit later, more of a purple-team exercise, but we don't have the time to do it now. So I'm going to defer the rest of this ticket until after MN-1.
What is the Problem Being Solved?
While thinking about #3073 and what happens when a validator disk runs out of space, we decided we should really audit the full kernel delivery pathway to make sure that any non-vat-behavior-triggered failure (e.g. out of disk, unexpected process termination, host application coding error, etc) causes a kernel panic and brings the host application to a halt, rather than committing some portion of state that is not coherent with the cosmos-sdk/tendermint committed state.
For example, if the kernel-side syscall handler attempts to write data to a HostStorage blobstore and runs out of disk space, that will throw an exception and cause the vat process to see an error. We must ensure that it also sets the right "kernel is panicing" flag, so that as soon as the kernel gets control back from the child process, the kernel will throw its own exception. Ultimately the requirement is that
c.step()
/c.run()
rejects its return promise, and that the host application (cosmic-swingset) reacts to this by halting the process, rather than committing anything to any database.I believe there are basically three things to pay attention to:
c.step()
/c.run()
and delivery to vat, plus everything from the delivery being complete toc.step()
returningc.step()
/c.run()
, with particular attention to what happens when it rejects the result promise