DistCompiler / pgo

PGo is a source-to-source compiler that translates Modular PlusCal specifications into Go programs.
https://distcompiler.github.io/
Apache License 2.0

[feature] Pattern-based mapping macro application #103

Closed fhackett closed 5 years ago

fhackett commented 6 years ago

We often model the network as a collection of channels. This is a problem because we currently have no way of automatically distributing mapping macros over collections: we have to duplicate code, defining the same mapping macro once for elements of a collection and again for individual variables.

To avoid this, we can add one feature to MPCal: pattern-based macro application.

The semantics are that instead of applying a mapping macro to a variable x, you apply it syntactically to references of the form x[y], for any y. In the corresponding mapping macro, $variable then refers to x[y] rather than just x. This is limited to expressions that can appear on the left-hand side of an assignment, and it can be applied in a purely syntactic manner, making a naive implementation very simple. The one thing to watch out for is that assignments to x on its own are much trickier, since the mapping has to apply to every element. It may be simpler to just fail in those cases for now, though it is theoretically possible to support them by manipulating raw TLA+ instead of PlusCal.

Here is an example spec with this applied. Look at the process instantiations at the bottom and work your way back to see what has happened. Notably, we did not need to define any mapping macros that deal with the entire network structure, sparing us a lot of boilerplate and gaining conceptual simplicity compared to @rmascarenhas' original, which does not use these features.

----------------------------- MODULE dqueue_mpcal -----------------------------
(***************************************************************************)
(* Distributed queue using Modular PlusCal.                                *)
(***************************************************************************)

EXTENDS Naturals, Sequences, TLC
CONSTANTS BUFFER_SIZE, NUM_CONSUMERS

ProducerId == 0

(***************************************************************************
--mpcal DistQueue {
  mapping macro LossyWrites {
      read {
          await Len($variable) > 0;
          with (msg = Head($variable)) {
              $variable := Tail($variable);
              yield msg;
          };
      }
      write {
          either {
              yield $variable;
          } or {
              await Len($variable) < BUFFER_SIZE;
              yield Append($variable, $value);
          };
      }
  }
  mapping macro CyclicReads {
      read  {
          $variable := ($variable + 1) % BUFFER_SIZE;
          yield $variable;
      }
  }
  (* consumer: Processes one element read from the network at a time, infinitely *)
  archetype AConsumer(ref network, ref processor) {
      c: while (TRUE) {
          (* request more data to the producer by sending your own identifier
             over the network *)
          c1: network[ProducerId] := self;

          (* processes the piece of data the producer sends back over the network
             by writing to a "processor" abstract interface *)
          c2: processor := network[self];
      }
  }

  archetype AProducer(ref network, stream)
  variable requester; {
      p: while (TRUE) {
          (* wait for a consumer to request data by sending on our channel *)
          p1:
          requester := network[self];

          (* send some data to the requester coming from a "stream" abstract interface *)
          p2: network[requester] := stream;
      }
  }

  variables network = [to \in 0..NUM_CONSUMERS |-> <<>>],
            queueProcessor = 0,
            queueStream = 0;

  fair process (Consumer \in 1..NUM_CONSUMERS) == instance AConsumer(ref network, ref queueProcessor)
      mapping network[_] via LossyWrites;
  fair process (Producer = ProducerId) == instance AProducer(ref network, ref queueStream)
      mapping network[_] via LossyWrites; (* applies the mapping macro to assignments
                                             like network[_] := ... *)
      mapping queueStream via CyclicReads;
}
***************************************************************************)
=============================================================================
\* Modification History
\* Last modified Fri Sep 28 16:49:58 PDT 2018 by rmc
\* Last modified Wed Oct 12 02:41:48 PDT 2011 by lamport
\* Created Mon Oct 10 06:26:47 PDT 2011 by lamport
fhackett commented 6 years ago

Here are some comments from our recent meeting in this context.

The read contradiction

Something novel that came up in the meeting was that this API cannot be handled by a mapping macro without [_] patterns. If you somehow made a mapping macro that operated on all of network at once, there would be a contradiction.

P.S. A simpler counter-argument: since network[x] first reads network and then accesses [x], we don't know which x was read, so this effectively maps to a bulk read of the entire network, which is probably not what the developer meant.

Thoughts on generality

As I reread this with the question "is it general enough?" in mind, I thought of one thing that may not be immediately obvious.

The correctness of this pattern-based application assumes that each network[x] exists in its own critical section. Specifically, to operate on such a structure you should use PlusCal's control structures like if/while/either/etc rather than TLA+ shorthands.

Given that this whole feature is designed to make it easier to apply channel-like mapping macros (i.e., ones that break variable semantics) to a set of things, I doubt this will cause many issues in the intended use case. If we did somehow support things like "function comprehensions" (i.e., wholesale replacement of function values), they would express exactly the semantics that make channel-like mapping macros over data structures inconsistent without [_] (or worse, in this case they might repeat a mutating read, generating incorrect PlusCal). In essence, that kind of bulk operation is only meaningful for things that act like variables anyway.

Sharding the archetype interface in Go

As I think about this more, I realise that my sharding algorithm described in #99 will only work on variable-like variables (i.e., not channels), for exactly the same reason that you can't make network work the way it does above with plain mapping macros. You still need to differentiate between reads and writes if you want a channel-like implementation of the archetype interface.

In effect, it seems the archetype interface will need to be aware of indexing as well (if we want to write things as above, anyway).

So did breaking the variable abstraction break anything else?

I'm not sure. I don't think so, as long as you keep in mind the limitations regarding assigning to a variable v where v[_] has been mapped. At least in terms of nestable structures in TLA+, [_] is universal: a.b can be (and is) translated to a["b"] with no loss of generality.

There is clearly more to TLA+ than just finite functions (sets, at least), but only finite functions are assignable. Everything else is immutable at the language level, so I can't see channel semantics applying to anything else. If you somehow do have a mapping macro over a set, there is no reading or writing of individual elements to confuse matters; it should remain sound to just send and receive the entire set (or pretend to, at least), even if it's a channel of sets rather than a variable that holds a set.

rmascarenhas commented 5 years ago

Outdated discussion.