Compiler validation: provide an option for PGo to attempt compiling invariants into generated code

fhackett commented 3 years ago

It would be much easier to argue PGo's soundness, if generated programs could check themselves.

This issue suggests adding a flag to PGo, such that it attempts to extract safety properties from the TLA+, and compiles assertions of them into generated code. Temporal properties will likely not be practical to handle.

Challenges:

scoping of assertions vs. archetypes
more deeply, archetype semantics vs. pcal state semantics, and communication issues

fhackett commented 3 years ago

Some thoughts, after mulling over the noted challenges.

It seems that attempting to work backwards from plain TLA+ invariants, written in terms of the TLA+ output, may be an exercise in frustration.

By design, the variables involved in the TLA+ output may be significantly different from those available to the generated code... which means, if we want to go down this road, that we have another variation of the abstraction problem inherent to the problem PGo tries to deal with.

One way to address this issue might be to allow archetype-level invariants to be specified in MPCal, in terms of the archetype resources. This might allow an implementation to reasonably execute the invariants, but, in turn, would make it highly unclear what the TLA+ model should do... worse, the invariants might need to perform reads on the archetype resources (because that is the only way to access such things, at runtime), meaning that the assertions would destructively affect system state. This is also without beginning to ask what the TLA+ translation of such invariants might look like.

Clearly, the "obvious" solution has some issues, from a practical viewpoint. At a high level, the biggest roadblock might just be that, unlike the TLA+ spec, the implementation is not likely to have all the information it needs to ever check most interesting assertions (at least, in one centralised location).

All that being said, maybe the answer lies in ... well, how do you check distributed assertions at runtime? Distributed tracing + verification seems to be an answer, here. In that case, the challenge becomes:

we still need to somehow correspond model state with some window into program state... it's unlikely that there will be a general solution to this, given the ad-hoc nature of all the encodings (in TLA+, in the resource implementations). Maybe we can require resource implementations to have a "verifier mode" where they describe their state in terms of the TLA+ model?
we need to track progress of state / the overall algorithm across a distributed system, as verification conditions apply to some global system snapshot

Both of these things would not be unreasonable to do e.g using our distributed tracing lib, or something like it: https://github.com/DistributedClocks/tracing. The challenge would then be to map the outputs back to something that the verification conditions can "understand". This would require (1) obviously, generating something one could feed to a model checker, and (2) some care when writing those conditions. For instance, one might want to avoid depending strongly on overly conservative model checking assumptions (e.g some exact configuration of nodes, or a statically chosen set or overly restrictive model of input/output values), which would prevent the verification conditions from being applied to the more diverse set of scenarios that might occur during runtime.

So, this challenge seems to be more or less a runtime problem. It might be possible to somehow assist developers in writing appropriately abstract verification conditions, but, more or less, this is a case of needing to record system behaviour, format it for e.g TLC, and check that the properties are satisfied.

lemmy commented 3 years ago

@fhackett Just in case you're not aware of work that appears related:

By the way, have you considered targeting other programming languages than Go (e.g. https://microsoft.github.io/coyote/) that might be more amenable to verification?

fhackett commented 3 years ago

@lemmy thanks for the links. I wasn't aware of these, and they were interesting to check out.

Links to material referenced in YT material:

Model-based trace checking: https://arxiv.org/abs/1111.2825
TLA+ community modules (a well-known thing, but now you can click it): https://github.com/tlaplus/CommunityModules
TLA+ conformance monitor: https://github.com/onosproject/tlaplus-monitor

Some notes, based on my understanding:

the "eXtreme Modelling in practice" paper mentions they had trouble with the trace checking I'm suggesting here. Certainly, if one does not happen to be generating a large amount of the implementation, these issues would not be surprising. Will take their observations as caution, though I have faint hopes that they might apply only partially to PGo.
Generating testing scenarios from the models (without requiring specific tracing) was not something I initially considered. Good to think about, as an option.
re: the verification gap, seems a major issue is determining order. Not sure yet how we'd deal with this, though the tracer linked above does support vector clocks, and that might actually work out.

re: using other programming languages, given a current rewrite (in progress), it should be a lot simpler to add other languages / output modes to PGo. This (in my opinion) is also important in research, as that relates to our ability to try things / ask questions without being (too) afraid of the necessary effort answering them might require :)

lemmy commented 3 years ago

the "eXtreme Modelling in practice" paper mentions they had trouble with the trace checking I'm suggesting here. Certainly, if one does not happen to be generating a large amount of the implementation, these issues would not be surprising. Will take their observations as caution, though I have faint hopes that they might apply only partially to PGo.

Generating testing scenarios from the models (without requiring specific tracing) was not something I initially considered. Good to think about, as an option.

From private conversations with the authors of that paper, most of the problems came from trying to capture the internal state of MongoDB. This didn't prove easy because of various locks at the implementation level. Recording traces at the network layer would likely have been less problematic.

Secondly, trace checking was actually super successful because it immediately discovered a bug. The problem was that it was a known bug, and there is no way to "blacklist" known bugs. Note that test case generation didn't find a single bug; not promising!

fhackett commented 3 years ago

Interesting, thanks for the context.

re: capturing internal state, that makes more precise what we might have to deal with, and reinforces the idea that we might have better luck, given our generative control over some aspects of concurrency. I've also had a fairly good experience with trace checking, albeit for teaching purposes, so the rest sounds promising.

re: test generation, also good to know. I didn't notice that detail.

DistCompiler / pgo

Compiler validation: provide an option for PGo to attempt compiling invariants into generated code #122