naterush commented 6 years ago

Issue

The codebase is not currently optimized for readability and extensibility. There are currently multiple classes that do the exact same work (or no work at all) like test langs, simulation runner, and protocol. In general, would be nice if we could crunch all of these into one - which would make the codebase much more understandable and comprehensible. Yay!

Proposed Implementation

Overview

For each protocol, create a XProtocol class (for example, BlockchainProtocol). Note that this is different than the current BlockchainProtocol class that exists today.

The protocol class is instantiated with a JSON object that specifies the execution of the entire protocol. Here's an example JSON object for a concurrent schedule replication:

{
    "protocol": "concurrent",
    "config": {
        "validators": [10, 11],
        "starting_outputs": [123, 1234, 12345],
        "genesis_estimate": [123],
        "select_outputs": "random",
        "create_outputs": "random"
    },
    "execution": {
                 "msg_per_round": 2,
        "execution_string": "M0-A M1-B"
    }
}

Obviously, each protocol would have a required set of fields, as well as validity conditions about the JSON object (reasonable things like len(initial_estimates) = len(validators)). See at end of post for spec of each JSON object. The goal is that the entire execution is totally deterministic from the JSON object.

This can then be run with protocol.execute(), which essentially consumes the execution_string, printing/making GIFs where appropriate.

Displayz

Each XProtocol class take three (optional) other than the setup string: interval, display, save. interval defaults to if not specified, and display and save both default to False.

The interval is measured in rounds where one round is msg_per_round number of messages being created. If display == True, interval == 3, and msg_per_round = 1, then a viewgraph will appear after every 3 messages are created. This makes it simple to display viewgraphs in cases like full message propagation: msg_per_round = len(validators).

Thus, depending on the message default being used, msg_per_round changes: (random, 1) (rrob, 1) (full, len(val_set)), etc.

Initial messages

Currently, there is some weirdness is how initial messages are created and dealt with. The new proposed method is essentially adding a method add_initial_messages to each view (or validator), and removing the instantiation of view with messages.

The reasoning for this is that a validator's view is created when the validator is, but some initial message schemes require all of the validators to be created before creating the initial messages, so we must treat these as two separate steps.

It looks like this:

validator_set = ValidatorSet(...)
initial_messages = self.get_initial_messages(validator_set)
for validator in validator_set:
   validator.set_initial_messages(initial_messages[validator])

It might make sense to push the set_initial_messages functions down to the view only, and not have the validator have access to this. Not sure what is best here.

The benefits to this are simple: each protocol can now do initial messages in its own way. Sharding has multiple genesis messages (one for each shard), and deals with this in a strange manner currently. This new approach would resolve this.

`execution_string`

Now, we must specify how to generate execution strings. This is a bit more complicated than it initially looks; consider how to handle different networks as we currently have.

However, I think we can essentially 'rip out' our existing message generation and networking stuff, to create some interesting ExecutionGenerator classes. Some of these could be RandomExecutionGenerator, RrobExecutionGenerator, etc (these are essentially the message generator classes that exist today.

Note: each of these classes has knowledge of how long a "round" is for itself. This is how we get the round number described above that is used when displaying (it can return this along w/ the execution string).

Furthermore, we must be able to specify networks that these classes can use to figure out when messages actually get delivered. However, this is fairly easy; because we can have the protocol string be generated round by round (some type of loop makes the most sense anyways), we can have messages, upon creation, check a delay function for how many rounds later they should be delivered.

For example (this is an example of full message propagation, but you get the idea)

execution_string = ''
deliver_on_round = dict()

for round in num_rounds:
   for creator in num_validators:
      message_name = self.get_new_message_name()
      new_message = 'M' + str(creator) + '-' + message_name + ' '
      execution_string += new_message

      for receiver in num_validators:
         delay = self.get_delay(round, creator, receiver) # include the round so delay can change over time

         if not deliver_on_round[round + delay]:
             deliver_on_round[round + delay] = []
          deliver_on_round[round + delay].append('S' + str(receiver) + '-' + message_name  + ' ')

     for send_message in deliver_on_round[round]:
        execution_string += send_message

Thus, we can carry over the same delay functions that we have currently to get the effect of the network!

Note that we can also store a max_round variable, and require that all messages are delivered before this max round.

Misc.

protocol.execute() takes an optional string. If you call protocol.execute(s) then protocol.execute(t) then you would be left at the state gotten to via json_string + s + t.

Conclusion

The benefits of this approach:

All classes mentioned crunch into one. Testing now means actually testing the protocols themselves.
Conceptually, it's a much cleaner translation of the problem...
It solves the issues: https://github.com/ethereum/cbc-casper/issues/165, https://github.com/ethereum/cbc-casper/issues/161, and maybe more?
Can easily do commands: venv/bin/python casper.py --execution network_split or other setups that have interesting situations...
can move this JSON object to other visualizations easier

JSON objects for different protocols

Note that throughout each of these protocols, reasonable rules apply to the execution strings: you can’t send messages before they exist, and the creator of a message must be a real validator.

Binary

{
    "protocol": “binary”,
    "config": {
        "validators": [10, 10],
        "initial_estimates": [0, 1],
    },
    "execution": {
                 "msg_per_round": 2,
        "execution_string": "M0-A M1-B"
    }
}

len(validators) == len(initial_estimates)
all initial_estimates must be 0 or 1

Integer

{
    "protocol": “integer”,
    "config": {
        "validators": [10, 11]
        "initial_estimates": [100, 6],
    },
    "execution": {
                 "msg_per_round": 2,
        "execution_string": "M0-A M1-B"
    }
}

len(validators) == len(initial_estimates)
all initial_estimates must be an integer

Order

{
    "protocol": “order”,
    "config": {
        "validators": [10, 11],
        "initial_estimates": [
            [“cow”, “pig”, “mouse”, “rat”],
            [“pig”, “cow”, “mouse”, “rat”]
        ],
    },
    "execution": {
                 "msg_per_round": 2,
        "execution_string": "M0-A M1-B"
    }
}

len(validators) == len(initial_estimates)
all initial_estimates must be a permutation of some list (when converting them to a set, must be equal - assuming no duplicates).

Blockchain

{
    "protocol": “blockchain”,
    "config": {
        "validators": [10, 11],
    },
    "execution": {
                 "msg_per_round": 2,
        "execution_string": "M0-A M1-B"
    }
}

Concurrent Schedule

{
    "protocol": "concurrent",
    "config": {
        "validators": [10, 11],
        "starting_outputs": [123, 1234, 12345],
        "genesis_estimate": [123],
        "select_outputs": "random",
        "create_outputs": "random"
    },
    "execution": {
                 "msg_per_round": 2,
        "execution_string": "M0-A M1-B"
    }
}

len(validators) == len(initial_estimates)
all initial_estimates must be a subset of the starting outputs.

Sharded blockchain

{
    "protocol": “sharded”,
    "config": {
        "validators": [10, 11],
        “num_shards”: 2,
        "val_select_shards”: [“random”, “random”]
    },
    "execution": {
                 "msg_per_round": 2,
        "execution_string": "M0-A M1-B"
    }
}

num_shards >= 2
num_shards <= len(validators)

In the future, we can add:

"validators_shards”: [
            [“”, “0”],
            [“”]
        ],

which a) must have shard ids for the number of shards, and b) has validators go through very specific state transitions. This can be used with some val_select_shards rules.

drewstone commented 6 years ago

How are you currently stepping through time? If there's a well-defined grid step, you could give each validator a validator_delay and at each step check if current_grid_step >= v.validator_delay. If so, poll its next message, etc.

Sample validator delays from some arbitrary distribution (uniform) over the [min,max] you want.

drewstone commented 6 years ago

You also (maybe?) could add the estimator.estimate logic into the each XView's class. Since it seems all you're doing in each view is importing that function defined in a separate file.

djrtwo commented 6 years ago

I think this is a good approach and agree that it will clean up the codebase and testing. Some initial thoughts. Sorry for the poor organization. Let's hop on a call sometime in the next couple of days.

Bigger design notes/questions:

I am worried about how execution_string is going to handle time stepping forward. I'm thinking we assume everything happens sequentially (nothing happens at exactly the same time), and that each space delimited "event" in the string counts as one time-step, rather than just each message generation. Thus our weak notion of time goes from "rounds" in the previous implementation to "events" in the protocol redesign.
- If our weak notion of time is just related to message generation, then I think we are going to run into difficulties simulating or at least reasoning about certain network conditions/validator strategies. For example, assume all validators are waiting to receive a message from validator A before they create a new message. Validator A makes the message and it has a delay of 3. Our notion of time has now stopped because validator-A's message is waiting on three other messages to be generated before it is delivered, and the other validator's are waiting on Validator-A's message before they generate a new message.
- If we use "events" to step time forward, we might need to model an "empty" event when generating the strings, but not necessarily when printing the strings. The empty events might be useful in creating the gifs (looks like everyone is waiting). As for creating the gifs, I think view_interval should treat every event as an interval step, rather than just the message generation.
protocol.execute adding strings -- I think it makes sense to be able to take a protocol from some state A, add the string s, and execute from state A via s to B. There are some corner cases to think through, namely - what happens if we haven't executed through the original string passed in via the JSON? Must we first execute the json string and then the additional string or just the additional string? I would think first execute the json string and anything added later is an extension of at least that state. If you do protocol.execute(s) then protocol.execute(t) then you would be left at the state gotten to via json_string + s + t

Minor notes:

setup --> config in the json file
specify random seed in the json file to make deterministic for random components like "select_outputs": "random". If not specified, should output the seed that was used or should default to a known seed -- 0.
At any point, should be able to ask the protocol questions like protocol.global_estimate
Might want to have a more granular option than just execute. Something like step. So that one could step through the protocol execution and query the protocol along the way. step would move forward one event (space delimited item from the string)
view_interval might just be interval or step_interval. It could correspond the number of events when a user steps. At the same time, if the user wanted a verbose output, that could be the interval at which the protocol execution (or higher level object) logs data (like global_estimate, etc)
initial messages -- view has add_initial_messages - does this take an argument of supplied messages which come from querying the protocol's get_initial_message for that particular validator?

drewstone commented 6 years ago

@naterush also correct me if I'm wrong but it looks like there is a single protocol view that all validators keep track of. This means that all validators always have the same exact view. Do you have plans to let each validator have a potentially unique view of the chain state?

djrtwo commented 6 years ago

@drewstone This is already implemented.

Network keeps track of a global_view that we use for gathering data about all messages, but each validator stores their own view in which they can only see messages that they have received and can only make judgements from such.

Sometimes the output/viewgraphs is just showing us info from the global view so we can get a global perspective on the network.

naterush commented 6 years ago

Thanks for the great feedback. I've updated the post above to consider most of these comments.

I think that a lot of the complexity described above was a result of not having rounds (which it turns out - though they seem unnecessary, are fairly necessary). As such, I've changed the above description to have rounds again; the key here is that we only really need them when a) creating the execution string, and b) calculating when to display (if we are displaying something). Reintroducing them seems to solve all the problems, but let me know if you see any other issues.

Also, most of the minor notes seem reasonable to me. I've justified some decisions (like the separate set_initial_messages function) in the original post.

naterush commented 6 years ago

There's one more thing I want to make sure we can handle, but I'm not exactly sure how to do it. Essentially: we need to control how initial messages are propagated.

For example: currently, we send all initial messages to everyone at the start of a binary protocol, but this leads to all messages have the same estimate for the rest of time (b/c everyone has seen everything), which isn't very interesting.

So we need more control over the execution over how these initial messages are propagated. There are a couple solutions here:

For some protocols, enforce that the execution strings starts with "M0-A M1-B M2-C..." as a validity condition (an error will be thrown if it doesn't start with this, for say, binary). A, B, C... are the names of the initial messages for each validator. The issue with this is that it the requires special cases in the protocol execution string generator.
Automatically name each initial message from the validators, but don't include them in the execution string. Note that this still requires special cases in the execution string generator string still (we have to send those initial messages still), but we lose the clarity of actually being able to see this in the execution string.
Do what we are doing currently, and send all initial messages to all validators before running the execution string. I personally don't like this solution b/c it leads to protocols like binary, list, and order all having super boring executions where everyone has the same estimate the whole time.

Personally, I think 1 is the cleanest solution. Any other thoughts/suggestions are greatly appreciated.

naterush commented 6 years ago

So I figured out a way to actually handle the special case in a general manner, without doing much extra work.

Essentially: the first time a validator 0 sends a message to validator 1, they send it in a justified manner (by including all the other blocks they sent as well).

djrtwo commented 6 years ago

Rounds in string generators with passing some info along with the execution string (length of round) is a good trade off :)

Notes:

you added "display" and "save" to json string but did not remove from XProtocol parameters. Why are we pushing these two vars to the json file? My gut is that they stay out and we just add "length_of_round" which is used during certain execution environments.
"length_of_round" needs to be part of the json file under "execution". I think this makes sense. If it is missing, it can default to 1 message per round.
"The interval is measured in message creation." should be "The interval is measured in rounds where rounds are specified by 'length of round' measured in messages." Or something like that
length_of_round and interval combine decently. We do lose some display info depending on the message gen scheme. If messages generated per round is fixed (like rrob or full) then no info is lost, but if validators are using more complex strategies and messages generated per round is variable, then the "length of round" probably becomes an average (or 1), and the complexity of time is lost. That said, defaulting to 1 on complex intervals is probably fine.

djrtwo commented 6 years ago

As for your fix on the sending of initial messages..

Are you proposing that when Val-0 (whose initial message was A) sends its first message (B) to Val-1, that both A and B are attempted to be propagated to Val-1? Are you proposing that:

we bundle these and have them delivered at once
or just that both are attempted to be sent (on potentially different delay schedules)

naterush commented 6 years ago

@djrtwo I am suggesting the first. Essentially, we insist that the first time Val-0 sends to Val-1 (say they are sending message B), we use SJ1-B rather than S1-B.

naterush commented 6 years ago

I think that sending in the length of the round is a nice solution - as well as leaving display and save out. I'll update the spec with this now.

Agree on defaulting to one - especially in the case of more complex strategies.

naterush commented 6 years ago

One final change: I'm proposing some changes to the testing language so it's more extensible for any possible future protocol executions we might want.

There are two base commands: M, S, where M is making a new message and S is sending an existing message. Furthermore, there is the secondary command SJ, which note, can be decomposed to the just S commands.

M-0-A means Val-0 make a new message called A. S-1-A means send Val-1 message A. SJ-1-A means send Val-1 message A, as well as all messages in A's justification.

Other validity conditions:

if a message is sent, it but have first been created.

Furthermore, we also add the following . The base make command is M-0-A (for example), but we also have M-0-A-(), where () can contain any arbitrary information about the new message.

For example, we could have M-0-A-(J-{B,C,D}), where this command essentially says create a message called A that just has the messages B, C, and D in its justification.

The regex for matching these commands is: ([A-Za-z]*)([-]*)([0-9]*)([-]*)([A-Za-z0-9]*)([-]*)([(){A-Za-z0-9,}]*)

djrtwo commented 6 years ago

Does that mean make block F forgetting that I made block E? Or is it making block F explicitly with E as estimate?

On Tuesday, March 20, 2018, Nate Rush notifications@github.com wrote:

Last thing I'd like to add to the specification before we move forward: keeping the language as general as possible... will post on this in a day.

Side note: just figured out how to do an equivocation: M0-F(E-{A, B, C, D}), where the second part essentially says that it's an equiocation, and gives a set of messages are are included in the justification of the new message.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ethereum/cbc-casper/issues/175#issuecomment-374465272, or mute the thread https://github.com/notifications/unsubscribe-auth/ABXf-0GWrrjx-OMQzkN3wS29pdEfd1foks5tgH2TgaJpZM4Smr4x .

naterush commented 6 years ago

@djrtwo updated with explanation of new commands.

naterush commented 6 years ago

Merged!

ethereum / cbc-casper

Restructure Codebase #175

Issue

Proposed Implementation

Overview

Displayz

Initial messages

`execution_string`

Misc.

Conclusion

JSON objects for different protocols

Binary

Integer

Order

Blockchain

Concurrent Schedule

Sharded blockchain