DistCompiler / pgo

PGo is a source-to-source compiler from Modular PlusCal specs into Go programs.
https://distcompiler.github.io/
Apache License 2.0

Integrating the generated main function with aggregate mapping macros (#103) #104

Closed by fhackett 5 years ago

fhackett commented 5 years ago

Originally we would generate a Go main function that just defaulted all archetype parameters to some generic implementation.

With aggregate (#103) mapping macros we need a slightly more sophisticated system.

Currently our config looks like this (for the counter MPCal example):

{
  "build": {
    "output_dir": "gen",
    "dest_file": "out.go"
  },
  "networking": {
    "enabled": true,

    "state": {
      "strategy": "state-server",
      "peers": ["10.0.0.1:1111", "10.0.0.2:2222", "10.0.0.3:3333"]
    }
  },
  "constants": {
    "iters": "10",
    "procs": "3"
  }
}

This assumes that each process will communicate with others via a single connection, so each peer's instance of some variable can be accessed knowing only IP:port.

For our work-in-progress channel-like abstractions, many of these assumptions no longer hold.

So, let's imagine we have an instantiation of some archetype that looks something like this:

fair process (Consumer \in 1..NUM_CONSUMERS) == instance AConsumer(ref network, ref queueProcessor)
    mapping network[_] via LossyWrites;
fair process (Producer = ProducerId) == instance AProducer(ref network, ref queueStream)
    mapping network[_] via LossyWrites;
    mapping queueStream via CyclicReads;

Now, while in the model checking context we just rely on expanding the mapping macro, in the Go implementation we want to replace it with a concrete implementation. What that implementation should be is not something we can infer, since it resides in the model developer's understanding of what LossyWrites means in reality.

The other problem we have to contend with is in the form of implementation details: how many connections does network represent, and between whom? It's convenient for the model checker to just pretend everyone is connected, but this breaks down when we have to actually set up TCP (or something like that). Just connecting everyone may be technically correct (and is an option, I suppose, as long as you tell PGo what's a channel and what's a global variable), but is probably not very convincing even if sold as a naive solution.

Perhaps the simplest approach to dealing with this is, just as we already require the user to describe the IP:port pairs on the network, to require the user to describe the peer-to-peer connections involved.

Naively speaking, let's add something like this to the configuration:

"channels": {
    "Producer": {
        "network": {
            "ProducerId": "@self",
            "1": "Consumer(1)",
            "2": "Consumer(2)",
            "3": "Consumer(3)"
        },
        "queueStream": "@stdio"
    },
    "Consumer(1)": {
        "network": { "ProducerId": "Producer", "1": "@self" },
        "queueProcessor": "@stdio"
    },
    "Consumer(2)": {
        "network": { "ProducerId": "Producer", "2": "@self" },
        "queueProcessor": "@stdio"
    },
    "Consumer(3)": {
        "network": { "ProducerId": "Producer", "3": "@self" },
        "queueProcessor": "@stdio"
    }
}

This boils down to a mapping from process (that is, an instance of an archetype) to a mapping from parameter name to some description of how to implement that parameter: strings for single variables, (nested if necessary) dictionaries for aggregate maps. For completeness/backwards compatibility, let's say that anything not claimed to be a channel gets the default global variable treatment.

Notation:

So, I'm sure there are flaws (and some embedded assumptions I'm still figuring out) in what I'm proposing but hopefully this gets the discussion going. Apologies this took me so long to post - it's been a long week (as usual these days it seems).

P.S. You would also need to give a mapping from channel to port number, wouldn't you - one port per archetype parameter (multiplied by level of nesting - 1 for nested aggregates).

fhackett commented 5 years ago

After some discussion, we decided that it is (for the MVP at least) easier to implicitly make all processes connected. This re-uses much of what we already have, and with a little laziness in the implementation it can behave equivalently to explicit configuration of different network topologies.

To summarise, here is what the config fragment might look like now:

"archetype-params": {
    "network": "channel",
    "queueStream": "stdio",
    "queueProcessor": "stdio"
}

The assumptions embedded here are:

rmascarenhas commented 5 years ago

We decided not to go this route.