apalache-mc / apalache

APALACHE: symbolic model checker for TLA+ and Quint
https://apalache-mc.org/
Apache License 2.0
430 stars 40 forks source link

[BUG] Error by TLA+ parser: Unexpected structure of EXCEPT #286

Closed lemmy closed 3 years ago

lemmy commented 3 years ago
markus@avocado:~/src/TLA/_community/formal-methods-playground/tla/swim(master)$ /opt/apalache/bin/apalache-mc --cinit=ConstInit --init=Init --next=Next --length=50 --config=swim_stats_sim_v3_false_dead_pg1.cfg check  swim_stats.tla
# Tool home: /opt/apalache
# Package:   /opt/apalache/mod-distribution/target/apalache-pkg-0.7.2-RELEASE-full.jar
# JVM args: -Xmx4096m -Djava.library.path=/usr/lib/x86_64-linux-gnu/jni/  -DTLA-Library=
#
# APALACHE version 0.7.2-RELEASE build v0.7.1-5-g5b85083
#
# WARNING: This tool is in the experimental stage.
#          Please report bugs at: [https://github.com/informalsystems/apalache/issues]

Checker options: filename=swim_stats.tla, init=Init, next=Next, inv= I@17:19:50.167
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.inject.internal.cglib.core.$ReflectUtils$1 (file:/opt/apalache/mod-distribution/target/apalache-pkg-0.7.2-RELEASE-full.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of com.google.inject.internal.cglib.core.$ReflectUtils$1
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
PASS #0: SanyParser                                               I@17:19:50.552
Parsing file /home/markus/src/TLA/_community/formal-methods-playground/tla/swim/swim_stats.tla
Parsing file /tmp/Naturals.tla
Parsing file /tmp/FiniteSets.tla
Parsing file /home/markus/src/TLA/_community/formal-methods-playground/tla/swim/FiniteSetsExt.tla
Parsing file /tmp/Sequences.tla
Parsing file /tmp/TLC.tla
Parsing file /home/markus/src/TLA/_community/formal-methods-playground/tla/swim/TLCExt.tla
Parsing file /tmp/Integers.tla
Error by TLA+ parser: Unexpected structure of EXCEPT              E@17:19:51.061
It took me 0 days  0 hours  0 min  0 sec                          I@17:19:51.061
Total time: 0.956 sec                                             I@17:19:51.062
EXITCODE: ERROR (99)
-------------------------------- MODULE swim_stats --------------------------------

(*
Originally based on the Atomix SWIM TLA+ specification (https://github.com/atomix/atomix-tlaplus/blob/master/SWIM/SWIM.tla) 
but now heavily modified. The original spec was a subset of the SWIM protocol that focused on safety only, and only verifying that
the state a member has on its peer, for a given incarnation can only go from alive to suspect to dead.

This specification implements the variants 2 (SWIM+Inf.) and 3 (SWIM+Inf.+Susp.) of the SWIM paper. It can be run in TLC as v2 or v3
via the constants.

Summary of modifications:
- As per the SWIM paper:
    - Indirect probing via probe requests
    - Gossip messages (peer states) are piggybacked on probe, probe request and ack messages.
    - The number of gossiped peer states per probe/probeReq/ack is limited
    - When there are more infective peer states than can fit, those with the fewest disseminations are prioritised.
    - Suspected members are marked as dead after a timeout (number of protocol rounds). This can be deactivated to make it v2.
    - each member randomly picks another member to probe, but with guaranteed bounds 
      (i.e. cannot randomly pick the same member over and over again)

- For simulation/stats
    - The initial state can be configured such that some members start out dead, or ready to join, but
      that state is undetected by the rest of the members. 
      The aim is to measure the number of protocol rounds, and other metrics, in order for the ensemble 
      to converge on this new state. Shortly after reaching convergence, the spec will deadlock. 
      This is by design as it helps simulation by halting when we reach the objective. On deadlocking, 
      any statistical properties being tracked are printed out in a csv format.

    - Fair scheduling is modelled to ensure that the probe rate of each member is balanced. Each member 
      is always within 1 protocol round of each other

*)

EXTENDS Naturals, FiniteSets, FiniteSetsExt, Sequences, TLC, TLCExt, Integers

CONSTANTS Nil,                      \* Empty numeric value

          AliveState,               \* Numeric member state 
          SuspectState,             \* Numeric member state
          DeadState,                \* Numeric member state
          UnknownState,             \* Numeric member state

          ProbeMessage,             \* Message type: probe
          AckMessage,               \* Message type: ack
          ProbeRequestMessage,      \* Message type: probe request
          ForwardedAckMessage,      \* Message type: indirect ack forwarded

          \* CONFIGURABLE PARAMETERS
          Member,                   \* The set of possible members
          PeerGroupSize,            \* The number of peers to send probe requests when a direct probe fails
          SuspectTimeout,           \* The number of protocol rounds before a suspected node is made dead
          DisseminationLimit,       \* The lambda log n value (the maximum number of times a given update can be piggybacked)
          MaxUpdatesPerPiggyBack,   \* The maximum  number of state updates to be included in
                                    \* any given piggybacked gossip message
          MessageLossMode,          \* "none" = no message loss
                                    \* "probabilistic" randomly drops messages, based on LoseEveryNth 
                                    \* "exhaustive" chooses both (model checking mode)     
          LoseEveryNth,             \* Each message has a 1/n chance of being lost. For use in simulation.
          InitialContacts,          \* The number of members that newly added members know of
                                    \* used to bootstrap the new member into the cluster
          MemberLeavesEnabled,      \* TRUE or FALSE as to the MemberLeaves action is enabled
          NewMemberCount,           \* The number of new members the ensemble need to detect
          DeadMemberCount,          \* The number of dead members the ensemble need to detect
          UnjoinedMemberCount,      \* The number of members yet to join
          ForceRunToRound,          \* For use in simulation. When 0, the simulation will stop at convergence
                                    \* When > 0, the simulation will run to this number of rounds
          PrintStatsOnDeadlock      \* For use in simulation. TRUE/FALSE as to whether to print the obtained stats when the simulation ends.

\* Precedence of the different states
ASSUME AliveState > SuspectState /\ SuspectState > DeadState /\ DeadState > UnknownState

ASSUME DeadMemberCount \in Nat
ASSUME InitialContacts \in Nat
ASSUME SuspectTimeout \in Nat
ASSUME MaxUpdatesPerPiggyBack \in (Nat \ {0})
ASSUME DisseminationLimit \in (Nat \ {0})

ASSUME PrintStatsOnDeadlock \in BOOLEAN

ASSUME MessageLossMode \in {"none", "exhaustive", "probabilistic"}
ASSUME LoseEveryNth \in Nat
ASSUME ForceRunToRound \in Nat
ASSUME MemberLeavesEnabled \in BOOLEAN

ASSUME DeadMemberCount \in Nat
ASSUME NewMemberCount \in Nat
ASSUME UnjoinedMemberCount \in Nat

ASSUME DeadMemberCount < Cardinality(Member)
ASSUME NewMemberCount < Cardinality(Member)
ASSUME UnjoinedMemberCount < Cardinality(Member)

ASSUME DeadMemberCount + NewMemberCount + InitialContacts <= Cardinality(Member) - UnjoinedMemberCount

VARIABLES \* actual state in the protocol
          incarnation,          \* Member incarnation numbers
          peer_states,          \* The known state that each member has on its peers
          round,                \* Each member keeps track of which round of the protocol it is in
          pending_direct_ack,   \* Each member keeps track of members it is pending a direct ack from
          pending_indirect_ack, \* Each member keeps track of members it is pending an idirect ack from
          probe_ctr,            \* Each member keeps track of how many probes it has sent to each peer
                                \* It randomly selects peers, but keeps the number of probes balanced
          \* messaging passing
          messages,             \* a function of all messages

          \* for simulation and stats
          sim_status,
          initial_state_stats       

vars == <<incarnation, peer_states, messages, pending_direct_ack, pending_indirect_ack, probe_ctr, round, sim_status, initial_state_stats>>
(************************************************************************) 
(******************** SELECTING OUTGOING GOSSIP *************************)
(************************************************************************)

(* NOTES
Peer states are selected for piggybacking on probes, probe requests and acks based on:
1. The maximum number of times a peer state can be disseminated (lambda log n in the paper
   but in this spec the constant DisseminationLimit. When under this limit, the state is
   still infective.
2. The maximum number of peer states that can be piggybacked.
   In this spec the constant MaxUpdatesPerPiggyBack.
3. In the case that all valid peer states do not fit, prioritise those that have been 
   disseminated fewer times.
*)

\* For the given member, returns the peers it has state on that is still infective
PeersUnderDisseminationsLimit(member, peer, new_peer_states) ==
    { m1 \in Member : 
        /\ m1 # member
        /\ m1 # peer
        /\ new_peer_states[m1].disseminations < DisseminationLimit
    }

\* Choose the peer states to gossip based on the MaxUpdatesPerGossip and when there is more
\* gossip than can be accomodated in a single message, choose the peer states
\* in order of fewest disseminations first
Prioritise(candidate_peers, limit, new_peer_states) ==
    CHOOSE peer_subset \in kSubset(limit, candidate_peers): \* SUBSET candidate_peers : Cardinality(peer_subset) = limit 
        /\ \A peer1 \in peer_subset :
            \A peer2 \in candidate_peers \ peer_subset :
                new_peer_states[peer1].disseminations <= new_peer_states[peer2].disseminations

ShareableState(peer_state) ==
    [incarnation |-> peer_state.incarnation,
     state       |-> peer_state.state]

SelectOutgoingGossip(member, dest_peer, merged_peer_states) ==
    LET candidate_peers == TLCNoOp(PeersUnderDisseminationsLimit(member, dest_peer, merged_peer_states))
    IN
        IF Quantify(Member, LAMBDA m: m # member /\ m # dest_peer /\ merged_peer_states[m].disseminations < DisseminationLimit) <= MaxUpdatesPerPiggyBack
        \*Cardinality(candidate_peers) <= MaxUpdatesPerPiggyBack 
        THEN [peer \in candidate_peers |-> ShareableState(merged_peer_states[peer])]
        ELSE LET peers == Prioritise(candidate_peers, MaxUpdatesPerPiggyBack, merged_peer_states)
             IN [peer \in peers |-> ShareableState(merged_peer_states[peer])]

\* Increment the dissemination counter of the peer states that have been gossiped             
UpdatePeerStates(member, updated_peer_states, sent_peer_states) ==
    peer_states' = [peer_states EXCEPT ![member] =  
                        [peer \in Member |-> IF peer \in DOMAIN sent_peer_states
                                             THEN [updated_peer_states[peer] EXCEPT !.disseminations = @ + 1]
                                             ELSE updated_peer_states[peer]]]

(*****************************************************)
(*************** Messaging passing *******************)
(*****************************************************)

\* if the dest is dead, it never gets delivered
\* or if the dest thinks the source is dead, it ignores it (SWIM makes no provision for dead members playing an active role)
\* else if we have turned on message loss, then there are two modes:
\*  'probabilistic' is a random chance of losing the message
\*  'exhaustive' is for model checking where both options are explored
GetDeliveredCount(msg) ==
    IF incarnation[msg.dest] = Nil \/ peer_states[msg.dest][msg.source].state = DeadState 
    THEN {0}
    ELSE
        CASE MessageLossMode = "probabilistic" -> IF RandomElement(1..LoseEveryNth) = LoseEveryNth THEN {0} ELSE {1}
          [] MessageLossMode = "exhaustive"    -> {0,1}
          [] OTHER                             -> {1}

\* Send a message only if the message has not already been sent
SendMessage(msg) ==
    /\ msg \notin DOMAIN messages
    /\ \E delivered_count \in GetDeliveredCount(msg) :
        messages' = messages @@ (msg :> delivered_count)

\* Mark one message as processed and send a new message   
ProcessedOneAndSendAnother(received_msg, send_msg) ==
    /\ received_msg \in DOMAIN messages
    /\ send_msg \notin DOMAIN messages
    /\ messages[received_msg] >= 1
    /\ \E delivered_count \in GetDeliveredCount(send_msg) :
        messages' = [messages EXCEPT ![received_msg] = @-1] @@ (send_msg :> delivered_count)

\* Mark one message as processed   
MessageProcessed(msg) ==
    /\ msg \in DOMAIN messages
    /\ messages[msg] >= 1
    /\ messages' = [messages EXCEPT ![msg] = @ - 1]

\* Can only receive message if not received yet and the receipient is alive
\* Also SWIM paper makes no provision for communication from a peer a member considers to be dead already
\* so we ignore it.
CanReceiveMessage(msg, message_type) ==
    /\ msg.type = message_type
    /\ messages[msg] >= 1
    /\ incarnation[msg.dest] # Nil
    /\ peer_states[msg.dest][msg.source].state # DeadState

MessageIgnored ==
    \E msg \in DOMAIN messages : 
        /\ messages[msg] >= 1
        /\ \/ incarnation[msg.dest] = Nil
           \/ peer_states[msg.dest][msg.source].state = DeadState
        /\ messages' = [messages EXCEPT ![msg] = 0]
        /\ UNCHANGED << incarnation, peer_states, pending_direct_ack, pending_indirect_ack, 
                        probe_ctr, round, sim_status, initial_state_stats >>

\* When a member sends a probe to a peer, it needs to keep track of that
\* and additionally increment its probe_ctr (required for balanced probing)
RegisterPendingDirectAck(member, peer) ==
    /\ pending_direct_ack' = [pending_direct_ack EXCEPT ![member] = peer]
    /\ probe_ctr' = [probe_ctr EXCEPT ![member][peer] = @ + 1]

RegisterReceivedDirectAck(member) ==
    pending_direct_ack' = [pending_direct_ack EXCEPT ![member] = Nil]

(*****************************************************)
(*************** Helper operators ********************)
(*****************************************************)

UnknownStateRecord ==
    [incarnation    |-> 0, 
     state          |-> UnknownState, 
     round          |-> 0, 
     disseminations |-> DisseminationLimit]

KnownAliveStateRecord ==
    [incarnation    |-> 1, 
     state          |-> AliveState, 
     round          |-> 1, 
     disseminations |-> DisseminationLimit]

\* What is the highest protocol round of the ensemble
MaxRound ==
    LET highest == CHOOSE m1 \in Member :
        \A m2 \in Member: round[m1] >= round[m2]
    IN round[highest]

IncrementRound(m) ==
    round' = [round EXCEPT ![m] = @ + 1]

\* The real state being either dead or alive. The real state of a member 
\* cannot be "suspected".
RealStateOfMember(member, nil, dead_state, alive_state) ==
    IF incarnation[member] = nil THEN dead_state ELSE alive_state

\* TRUE when:
\* for each member, it's live peers all agree on the same thing, specifically that:
\*    - either they see the true state
\*    - or they (falsely) think that it is dead
\* SWIM does not prevent false positives (just mitigates them)
\* and makes no provision for coming back from the dead - once you a member thinks you're dead 
\* they cannot be convinced otherwise (deadness is monotonic).
IsConverged(inc, pstates, nil, dead_state, alive_state) ==
    \A member \in Member :
        \A peer \in Member :
            \/ inc[peer] = nil
            \/ member = peer
            \/ /\ member # peer
               /\ \/ pstates[peer][member].state = RealStateOfMember(member, nil, dead_state, alive_state)
                  \/ pstates[peer][member].state = DeadState                      

WillBeConverged ==
    IsConverged(incarnation, peer_states', Nil, DeadState, AliveState)

MemberThinksEveryoneIsDead(member, mem, pstates, dead_state) ==
    \A m1 \in mem : m1 = member \/ pstates[member][m1].state = dead_state    

AllMembersThinkEveryoneIsDead ==
    \A member \in Member: MemberThinksEveryoneIsDead(member, Member, peer_states, DeadState) 

StopConditionReached ==
    \/ sim_status > 0
    \/ IF ForceRunToRound > 0
       THEN
             IF \A member \in Member : round[member] <= ForceRunToRound 
             THEN IF AllMembersThinkEveryoneIsDead THEN TRUE ELSE FALSE
             ELSE TRUE
       ELSE
             IF IsConverged(incarnation, peer_states, Nil, DeadState, AliveState) 
             THEN TRUE 
             ELSE FALSE

\* Returns TRUE or FALSE as to whether an individual peer state is new for this member
\* It is new only if:
\* - its incarnation number is > than the known incarnation number of the target member
\*   and the member is not considered dead (once dead, there is no coming back according to SWIM paper)
\* - its incarnation number equals the known incarnation number of the target member but its state has higher precedence
IsNewInformation(member, peer, peer_state) ==
    \/ /\ peer_state.incarnation > peer_states[member][peer].incarnation
       /\ peer_states[member][peer].state # DeadState
    \/ /\ peer_state.incarnation = peer_states[member][peer].incarnation
       /\ peer_state.state < peer_states[member][peer].state

(************************************************************************) 
(******************** RECORD STATISTICS *********************************)
(************************************************************************)

(* Notes

Records various metrics using TLCSet and TLCGet. Some metrics are already available in the variables
such as round and messages.

At the end of a simulation, the statistics are gathered from the counters and variables and printed
in CSV format.
*)

updates_pr_ctr(r) ==
    (r * 100)

eff_updates_pr_ctr(r) ==
    (r * 100) + 1

suspect_ctr(r) ==
    (r * 100) + 2

suspect_states_ctr(r) ==
    (r * 100) + 3

dead_ctr(r) ==
    (r * 100) + 4

dead_states_ctr(r) ==
    (r * 100) + 5    

alive_ctr(r) ==
    (r * 100) + 6

alive_states_ctr(r) ==
    (r * 100) + 7

ResetStats(max_round) ==
    \A r \in 1..max_round :
        /\ TLCSet(updates_pr_ctr(r), 0)
        /\ TLCSet(eff_updates_pr_ctr(r), 0)
        /\ IF r = 1 
           THEN /\ TLCSet(alive_ctr(r), initial_state_stats.alive_members)
                /\ TLCSet(alive_states_ctr(r), initial_state_stats.alive_states)
           ELSE /\ TLCSet(alive_ctr(r), 0)
                /\ TLCSet(alive_states_ctr(r), 0)        
        /\ TLCSet(suspect_ctr(r), 0)
        /\ TLCSet(suspect_states_ctr(r), 0)
        /\ TLCSet(dead_ctr(r), 0)
        /\ TLCSet(dead_states_ctr(r), 0)

MemberCount(state, target_peer_states) ==
    Cardinality({dest \in Member : 
        \E source \in Member :
            target_peer_states[source][dest].state = state})

CurrentMemberCount(state) == MemberCount(state, peer_states)
NextStateMemberCount(state) == MemberCount(state, peer_states')  

StateCount(state, target_peer_states) ==
    LET pairs == { s \in SUBSET Member : Cardinality(s) = 2 }
    IN
    LET lower_to_higher == Cardinality({s \in pairs :
                                            \E m1, m2 \in s : 
                                                /\ target_peer_states[m1][m2].state = state
                                                /\ m1 < m2})
        higher_to_lower == Cardinality({s \in pairs :
                                            \E m1, m2 \in s : 
                                                /\ target_peer_states[m1][m2].state = state
                                                /\ m1 > m2})
    IN lower_to_higher + higher_to_lower

CurrentStateCount(state) == StateCount(state, peer_states)
NextStateStateCount(state) == StateCount(state, peer_states')

\* Does the step increment the round for one member such that now all members are on the same round?
IsNewRoundTransitionStep(inc, r1, r2, nil, mem, pstates, dead_state) ==
    /\ \E m \in mem : r1[m] # r2[m]
    /\ \/ Cardinality({m \in mem : ~MemberThinksEveryoneIsDead(m, mem, pstates, dead_state)}) = 1
       \/ /\ \E m1, m2 \in mem: /\ inc[m1] # nil 
                                /\ inc[m2] # nil
                                /\ ~MemberThinksEveryoneIsDead(m1, mem, pstates, dead_state)
                                /\ ~MemberThinksEveryoneIsDead(m2, mem, pstates, dead_state)
                                /\ r1[m1] # r1[m2] 
          /\ \A m3, m4 \in mem: 
            \/ inc[m3] = Nil
            \/ inc[m4] = Nil
            \/ MemberThinksEveryoneIsDead(m3, mem, pstates, dead_state)
            \/ MemberThinksEveryoneIsDead(m4, mem, pstates, dead_state)
            \/ /\ inc[m3] # nil 
               /\ inc[m4] # nil 
               /\ ~MemberThinksEveryoneIsDead(m3, mem, pstates, dead_state)
               /\ ~MemberThinksEveryoneIsDead(m4, mem, pstates, dead_state)
               /\ r2[m3] = r2[m4]

MayBeRecordMemberCounts ==
    \* Is this is a step that leads to all members being on the same round then record the member count stats
        IF  IsNewRoundTransitionStep(incarnation, round, round', Nil, Member, peer_states, DeadState)
        THEN
            LET r == MaxRound
            IN
                /\ TLCSet(suspect_ctr(r), NextStateMemberCount(SuspectState))
                /\ TLCSet(dead_ctr(r), NextStateMemberCount(DeadState))
                /\ TLCSet(alive_ctr(r), NextStateMemberCount(AliveState))
                /\ TLCSet(suspect_states_ctr(r), NextStateStateCount(SuspectState))
                /\ TLCSet(dead_states_ctr(r), NextStateStateCount(DeadState))
                /\ TLCSet(alive_states_ctr(r), NextStateStateCount(AliveState))
        ELSE TRUE

RecordIncomingGossipStats(member, gossip_source, incoming_peer_states, end_of_round) ==
    LET updates_ctr_id     == updates_pr_ctr(round[gossip_source])
        eff_updates_ctr_id == eff_updates_pr_ctr(round[gossip_source])
    IN 
       \* gossip counts
       /\ TLCSet(updates_ctr_id, TLCGet(updates_ctr_id) + Cardinality(DOMAIN incoming_peer_states))
       \* effective gossip counts
       /\ LET effective_count == Quantify(DOMAIN incoming_peer_states, 
                                          LAMBDA m : IsNewInformation(member, m, incoming_peer_states[m])) 
          IN TLCSet(eff_updates_ctr_id, TLCGet(eff_updates_ctr_id) + effective_count)
       \* suspect and dead counts
       /\ IF end_of_round THEN MayBeRecordMemberCounts ELSE TRUE         

IsMessageOfMemberInRound(msg, member, r) ==
    /\ \/ msg.source = member
       \/ msg.dest = member
    /\ msg.round = r

MessageLoad(r, member) ==
    Quantify(DOMAIN messages, LAMBDA msg : IsMessageOfMemberInRound(msg, member, r))

IsIncomingMessageOfMemberInRound(msg, member, r) ==
    /\ msg.dest = member
    /\ msg.round = r

IncomingMessageLoad(member, r) ==
    Quantify(DOMAIN messages, LAMBDA msg : IsIncomingMessageOfMemberInRound(msg, member, r))

IsOutgoingMessageOfMemberInRound(msg, member, r) ==
    /\ msg.dest = member
    /\ msg.round = r

OutgoingMessageLoad(r, member) ==
    Quantify(DOMAIN messages, LAMBDA msg : IsOutgoingMessageOfMemberInRound(msg, member, r))

PrintStats ==
    /\ LET max_stats_round == MaxRound
           cfg_str == "," \o ToString(Cardinality(Member)) 
                        \o "," \o ToString(DeadMemberCount)
                        \o "," \o ToString(NewMemberCount)
                        \o "," \o ToString(SuspectTimeout)
                        \o "," \o ToString(DisseminationLimit)
                        \o "," \o ToString(MaxUpdatesPerPiggyBack)
                        \o "," \o ToString(LoseEveryNth)
                        \o "," \o ToString(PeerGroupSize)
                        \o "," \o ToString(InitialContacts)
                        \o "," \o ToString(MaxRound)
                        \o ","
       IN
        /\ PrintT("rounds" \o cfg_str \o ToString(max_stats_round))
        /\ \A r \in 1..max_stats_round : 
            \A member \in Member : 
                IF incarnation[member] # Nil
                THEN PrintT("message_load_in_round" \o cfg_str \o ToString(r) \o "," \o ToString(member) \o "," \o ToString(MessageLoad(r, member)))
                ELSE TRUE
        (*/\ \A r \in 1..max_stats_round : 
            \A member \in Member : 
                IF incarnation[member] # Nil
                THEN PrintT("incoming_message_load_in_round" \o cfg_str \o ToString(r) \o "," \o ToString(member) \o "," \o ToString(IncomingMessageLoad(member, r)))
                ELSE TRUE
        /\ \A r \in 1..max_stats_round : 
            \A member \in Member : 
                IF incarnation[member] # Nil
                THEN PrintT("outgoing_message_load_in_round" \o cfg_str \o ToString(r) \o "," \o ToString(member) \o "," \o ToString(OutgoingMessageLoad(member, r)))
                ELSE TRUE*)
        /\ \A r \in 1..max_stats_round : PrintT("updates_in_round" \o cfg_str \o ToString(r) \o "," \o ToString(TLCGet(updates_pr_ctr(r))))
        /\ \A r \in 1..max_stats_round : PrintT("eff_updates_in_round" \o cfg_str \o ToString(r) \o "," \o ToString(TLCGet(eff_updates_pr_ctr(r))))
        /\ \A r \in 1..max_stats_round : 
            IF r = max_stats_round 
            THEN PrintT("alive_members_count" \o cfg_str \o ToString(r) \o "," \o ToString(CurrentMemberCount(AliveState)))
            ELSE PrintT("alive_members_count" \o cfg_str \o ToString(r) \o "," \o ToString(TLCGet(alive_ctr(r))))
        /\ \A r \in 1..max_stats_round : 
            IF r = max_stats_round 
            THEN PrintT("suspected_members_count" \o cfg_str \o ToString(r) \o "," \o ToString(CurrentMemberCount(SuspectState)))
            ELSE PrintT("suspected_members_count" \o cfg_str \o ToString(r) \o "," \o ToString(TLCGet(suspect_ctr(r))))
        /\ \A r \in 1..max_stats_round :
            IF r = max_stats_round 
            THEN PrintT("dead_members_count" \o cfg_str \o ToString(r) \o "," \o ToString(CurrentMemberCount(DeadState)))
            ELSE PrintT("dead_members_count" \o cfg_str \o ToString(r) \o "," \o ToString(TLCGet(dead_ctr(r))))
        /\ \A r \in 1..max_stats_round :
            IF r = max_stats_round 
            THEN PrintT("alive_states_count" \o cfg_str \o ToString(r) \o "," \o ToString(CurrentStateCount(AliveState)))
            ELSE PrintT("alive_states_count" \o cfg_str \o ToString(r) \o "," \o ToString(TLCGet(alive_states_ctr(r))))
        /\ \A r \in 1..max_stats_round : 
            IF r = max_stats_round 
            THEN PrintT("suspect_states_count" \o cfg_str \o ToString(r) \o "," \o ToString(CurrentStateCount(SuspectState)))
            ELSE PrintT("suspect_states_count" \o cfg_str \o ToString(r) \o "," \o ToString(TLCGet(suspect_states_ctr(r))))
        /\ \A r \in 1..max_stats_round :
            IF r = max_stats_round 
            THEN PrintT("dead_states_count" \o cfg_str \o ToString(r) \o "," \o ToString(CurrentStateCount(DeadState)))
            ELSE PrintT("dead_states_count" \o cfg_str \o ToString(r) \o "," \o ToString(TLCGet(dead_states_ctr(r))))
        /\ PrintT("converged")
        /\ TLCDefer(ResetStats(max_stats_round))

EndSim ==
    /\ StopConditionReached
    /\ sim_status # 1
    /\ IF PrintStatsOnDeadlock = TRUE THEN PrintStats ELSE TRUE
    /\ sim_status' = 1
    /\ UNCHANGED <<incarnation, peer_states, messages, pending_direct_ack, pending_indirect_ack, probe_ctr, round, initial_state_stats>>

RecordGossipStats(member, gossip_source, incoming_states, end_of_round) ==
    TLCDefer(RecordIncomingGossipStats(member, gossip_source, incoming_states, end_of_round))

RecordStateStats ==
    TLCDefer(MayBeRecordMemberCounts)

(************************************************************************) 
(******************** INCOMING GOSSIP ***********************************)
(************************************************************************)

(* NOTES
Gossip can come in on probes, probe requests and acks. Any incoming gossip is 
merged with existing peer state, with stale peer states being replaced by any newer gossiped state. 
Stale peer state is that which has a lower incarnation number than the gossiped state or 
which has the same incarnation number, but a lower precedence state. 
The precedence order is (highest to lowest) Dead, Suspect, Alive.
*)

ResetCountersOfRecord(record, member_round) ==
    [incarnation    |-> record.incarnation,
     state          |-> record.state,
     disseminations |-> 0,
     round          |-> member_round]

NewAliveMemberRecord(incarnation_number, member_round) ==
    [incarnation    |-> incarnation_number, 
     state          |-> AliveState, 
     disseminations |-> 0, 
     round          |-> member_round]

MergeGossipWithCurrentState(member, gossip_source, source_incarnation, incoming_peer_states) ==
    [peer \in Member |-> 
        IF peer = member 
        THEN peer_states[member][member] \* it holds no data about itself 
        ELSE 
            IF peer = gossip_source
            THEN IF peer_states[member][gossip_source].incarnation < source_incarnation
                 THEN NewAliveMemberRecord(source_incarnation, round[member])
                 ELSE peer_states[member][gossip_source]
            ELSE
                IF peer \in DOMAIN incoming_peer_states
                THEN IF IsNewInformation(member, peer, incoming_peer_states[peer])
                     THEN ResetCountersOfRecord(incoming_peer_states[peer], round[member])
                     ELSE peer_states[member][peer]
                ELSE peer_states[member][peer]]

\* Updates the state and counters of a peer of the given member
\* This is not called when an incarnation has changed, only a state change 
\* like Alive->Suspect or Suspect->Dead
\* When configured as SWIM v2, we skip suspect state and go straight to dead.
UpdateState(member, peer, state) ==
    LET new_state == IF state = SuspectState /\ SuspectTimeout = 0 THEN DeadState ELSE state
    IN
        peer_states' = [peer_states EXCEPT ![member][peer] =
                            [@ EXCEPT !.state          = new_state,
                                      !.disseminations = 0,
                                      !.round          = round[member]]]

MayBeIncrementIncarnation(msg) ==
    IF msg.dest_state.state = SuspectState /\ msg.dest_state.incarnation = incarnation[msg.dest]
    THEN incarnation[msg.dest] + 1 
    ELSE incarnation[msg.dest]

MayBeUpdateIncarnation(member, new_incarnation) ==
    IF incarnation[member] < new_incarnation
    THEN incarnation' = [incarnation EXCEPT ![member] = new_incarnation]
    ELSE UNCHANGED incarnation

(************************************************************************) 
(******************** ACTION: SendDirectProbe****************************)
(************************************************************************)

(* Notes
Triggers a probe request from a member to a peer and is the beginning of a new protocol round
for that member.

A new protocol round can only commence when the prior one is complete. That can occur when:
- there are no pending expirations (suspect to dead) to apply
- there are no pending acks to direct probes
- there are no pending indirect acks via probe requests that have a chance of completing
  (message loss and dead members can cause there to be no reply from any probe requests sent out). 
*)

\* The current round is complete only if:
\* - The member has no pending expirations
\* - The member is not waiting on a direct ack
\* - The member is not waiting on an indirect ack
CurrentRoundComplete(member) ==
    /\ ~\E peer \in Member :
        /\ peer_states[member][peer].state = SuspectState
        /\ round[member] - peer_states[member][peer].round > SuspectTimeout
    /\ pending_indirect_ack[member] = Nil   
    /\ pending_direct_ack[member] = Nil

\* The member knows of the peer's existence and it is not pending expiry
IsValidProbeTarget(member, peer) ==
    /\ member # peer
    /\ \/ peer_states[member][peer].state = AliveState       
       \/ /\ peer_states[member][peer].state = SuspectState
          /\ round[member] - peer_states[member][peer].round <= SuspectTimeout

\* Peers picked in a random but time bounded manner
IsValidRandomPick(member, peer) ==
    \A m \in Member : 
        IF IsValidProbeTarget(member, m) 
        THEN probe_ctr[member][peer] <= probe_ctr[member][m]
        ELSE TRUE

\* This is necessary for statistics simulation to ensure that all members are
\* sending out probes at the same rate and stay within one round of each other
\* It tests that the member is on an equal or lower round than all its peers that 
\* are both alive and who themselves still have peers they believe are alive (if a member
\* thinks everyone else is dead, then it cannot take part in further rounds and
\* is therefore not included in fair scheduling)
\* It also ensures there are no outstanding messages regarding indirect probing for this member
IsFairlyScheduled(member, peer) ==
    /\ \A m1 \in Member : 
        \/ incarnation[m1] = Nil
        \/ \A m2 \in Member : m1 = m2 \/ peer_states[m1][m2].state = DeadState
        \/ /\ incarnation[m1] # Nil
           /\ round[member] <= round[m1]
    /\ ~\E msg \in DOMAIN messages :
        /\ msg.on_behalf_of = member
        /\ messages[msg] > 0  

\* The sending of a direct probe is the beginning of a new protocol period.
SendDirectProbe(member, peer) ==
    /\ ~StopConditionReached
    /\ member # peer
    /\ incarnation[member] # Nil            \* The member is alive
    /\ CurrentRoundComplete(member)         \* This member can initiate its next protocol round
    /\ IsValidProbeTarget(member, peer)     \* The peer is valid (the member think it's not dead for example)
    /\ IsValidRandomPick(member, peer)      \* Peers are picked randomly, but in a balanced way
    /\ IsFairlyScheduled(member, peer)      \* For simulation - we keep all members within 1 round of each other
    /\ LET gossip_to_send == SelectOutgoingGossip(member, peer, peer_states[member])
       IN
        /\ SendMessage([type         |-> ProbeMessage,
                        source       |-> member,
                        dest         |-> peer,
                        round        |-> round[member],
                        on_behalf_of |-> Nil,
                        dest_state   |-> ShareableState(peer_states[member][peer]),
                        source_inc   |-> incarnation[member],
                        gossip       |-> gossip_to_send])
        /\ UpdatePeerStates(member, peer_states[member], gossip_to_send)
        /\ RegisterPendingDirectAck(member, peer)
        /\ UNCHANGED <<incarnation, pending_indirect_ack, round, sim_status, initial_state_stats >>

(************************************************************************) 
(******************** ACTION: ReceiveProbe ******************************)
(************************************************************************)

(* Notes
Handles a probe message from a peer, either a direct or an indirect probe.
If the received incarnation is greater than the destination's incarnation number, update the
destination's incarnation number to 1 plus the received number. This can happen after a node
leaves and rejoins the cluster. If the destination is suspected by the source, increment the
destination's incarnation and respond with the updated incarnation, this will tell the source
that the destination is in fact alive. 
If the destination's incarnation is greater than the source's incarnation, just send an ack.

Adds pending gossip (that will fit) to the ack (piggybacking).
Adds any incoming gossip that is new, to the peer state to be gossiped later.
*)

\* Send an ack and piggyback gossip if any to send
SendAck(probe, source_incarnation, merged_peer_state, piggyback_gossip) ==
    ProcessedOneAndSendAnother(probe, 
                       [type         |-> AckMessage,
                        source       |-> probe.dest,
                        dest         |-> probe.source,
                        round        |-> probe.round,
                        on_behalf_of |-> probe.on_behalf_of,
                        dest_state   |-> ShareableState(merged_peer_state[probe.source]),
                        source_inc   |-> source_incarnation,
                        gossip       |-> piggyback_gossip])

ReceiveProbe ==
    /\ ~StopConditionReached
    /\ \E msg \in DOMAIN messages :
        /\ CanReceiveMessage(msg, ProbeMessage)
        /\ LET merged_peer_state == MergeGossipWithCurrentState(msg.dest, 
                                                              msg.source, 
                                                              msg.source_inc, 
                                                              msg.gossip)
               new_incarnation   == MayBeIncrementIncarnation(msg)
           IN LET send_gossip == SelectOutgoingGossip(msg.dest, msg.source, merged_peer_state)
              IN
               /\ MayBeUpdateIncarnation(msg.dest, new_incarnation)
               /\ SendAck(msg, new_incarnation, merged_peer_state, send_gossip)
               /\ UpdatePeerStates(msg.dest, merged_peer_state, send_gossip)
               /\ RecordGossipStats(msg.dest, msg.source, msg.gossip, FALSE)
    /\ UNCHANGED <<round, pending_direct_ack, pending_indirect_ack, probe_ctr, sim_status, initial_state_stats >>

(************************************************************************) 
(******************** ACTION: ReceiveAck ********************************)
(************************************************************************)

(* Notes
Handles an ack message (from a direct probe only) from a peer.
Increments this member's round as this is the end of a protocol round 
*)
ReceiveAck ==
    /\ ~StopConditionReached
    /\ \E msg \in DOMAIN messages :
        /\ CanReceiveMessage(msg, AckMessage)
        /\ msg.on_behalf_of = Nil \* this is a direct probe
        /\ msg.round = round[msg.dest]
        /\ LET merged_peer_state == MergeGossipWithCurrentState(msg.dest, 
                                                                msg.source, 
                                                                msg.source_inc, 
                                                                msg.gossip)
               new_incarnation   == MayBeIncrementIncarnation(msg)
           IN
            /\ MayBeUpdateIncarnation(msg.dest, new_incarnation)
            /\ UpdatePeerStates(msg.dest, merged_peer_state, <<>>)
            /\ RegisterReceivedDirectAck(msg.dest)
            /\ MessageProcessed(msg)
            /\ IncrementRound(msg.dest)
            /\ RecordGossipStats(msg.dest, msg.source, msg.gossip, TRUE)
            /\ UNCHANGED << pending_indirect_ack, probe_ctr, sim_status, initial_state_stats>>

(************************************************************************) 
(******************** ACTION: Expire ************************************)
(************************************************************************)

(* Notes
Expires a suspected peer once it has reached the timeout.
We can run as variant 2 by setting the timeout to 0 which means
immediate expiry.
*)
Expire(member, peer) ==
    /\ ~StopConditionReached
    /\ member # peer
    /\ peer_states[member][peer].state = SuspectState
    /\ (round[member] - peer_states[member][peer].round) > SuspectTimeout
    /\ UpdateState(member, peer, DeadState)
    /\ UNCHANGED <<incarnation, messages, pending_direct_ack, pending_indirect_ack, 
                   probe_ctr, round, sim_status, initial_state_stats >>

(************************************************************************) 
(******************** ACTION: SendProbeRequest **************************)
(************************************************************************)

(* Notes

Enabled when a direct probe has failed, there are still probe requests to
be sent out and valid peers to receive them. 

The number of probe requests is bounded by the minimum of:
- The PeerGroupSize value, this is the number of peers a member will send probe requests to
- The number of available peers to send to
*)

NewProbeRequest(member, peer, failed_peer, gossip_to_send) ==
    [type         |-> ProbeRequestMessage,
     source       |-> member,
     dest         |-> peer,
     target       |-> failed_peer,
     on_behalf_of |-> Nil,
     dest_state   |-> ShareableState(peer_states[member][peer]),
     source_inc   |-> incarnation[member],
     round        |-> round[member],
     gossip       |-> gossip_to_send]

IsPreviouslySentPR(msg, source, r) ==
    /\ msg.type = ProbeRequestMessage
    /\ msg.source = source
    /\ msg.round = r

EligibleProbeRequestPeer(member, peer, failed_peer, r) ==
    /\ peer # member
    /\ peer # failed_peer
    /\ IsValidProbeTarget(member, peer)
    /\ ~\E msg \in DOMAIN messages : 
            /\ IsPreviouslySentPR(msg, member, r)
            /\ msg.dest = peer 

SendOneProbeRequest(failed_probe) == 
    LET member      == failed_probe.source
        failed_peer == failed_probe.dest
    IN
        \E peer \in Member : 
            /\ EligibleProbeRequestPeer(member, peer, failed_peer, failed_probe.round)
            /\ LET gossip_to_send == SelectOutgoingGossip(member, peer, peer_states[member])
               IN
                    /\ SendMessage(NewProbeRequest(member, peer, failed_peer, gossip_to_send))
                    /\ UpdatePeerStates(member, peer_states[member], gossip_to_send)
                    /\ pending_indirect_ack' = [pending_indirect_ack EXCEPT ![member] = failed_peer]
                    /\ pending_direct_ack' = [pending_direct_ack EXCEPT ![member] = Nil]

DirectProbeFailed(msg) ==
    /\ msg.type = ProbeMessage
    /\ messages[msg] <= 0
    /\ msg.round = round[msg.source]
    /\ msg.on_behalf_of = Nil
    /\ ~\E ack \in DOMAIN messages : 
        /\ ack.type = AckMessage
        /\ ack.dest = msg.source
        /\ ack.round = msg.round
        /\ messages[ack] >= 1

SendProbeRequest ==
    /\ ~StopConditionReached
    /\ \E msg \in DOMAIN messages :
        /\ DirectProbeFailed(msg)
        /\ Quantify(DOMAIN messages, LAMBDA prev_msg : IsPreviouslySentPR(prev_msg, msg.source, msg.round)) < PeerGroupSize
        /\ SendOneProbeRequest(msg)
        /\ UNCHANGED <<incarnation, probe_ctr, round, sim_status, initial_state_stats>>

(************************************************************************) 
(******************** ACTION: NoPeersForProbeRequest ********************)
(************************************************************************)

(* Notes
This is enabled when we haven't sent out any probe requests and 
there are no valid peers to send to.
*)

NoEligiblePeersForProbeRequest(failed_probe) == 
    LET member      == failed_probe.source
        failed_peer == failed_probe.dest
    IN
        \/ Quantify(DOMAIN messages, LAMBDA msg : IsPreviouslySentPR(msg, failed_probe.source, failed_probe.round)) >= PeerGroupSize
        \/ ~\E peer \in Member : 
            EligibleProbeRequestPeer(member, peer, failed_peer, failed_probe.round)

ProbeRequestsSent(member) ==
    \E msg \in DOMAIN messages :
        /\ msg.type = ProbeRequestMessage
        /\ msg.round = round[member]
        /\ msg.source = member

NoPeersForProbeRequest ==
    /\ ~StopConditionReached
    /\ \E msg \in DOMAIN messages :
        /\ DirectProbeFailed(msg)
        /\ ~ProbeRequestsSent(msg.source)
        /\ NoEligiblePeersForProbeRequest(msg)
        /\ IF peer_states[msg.source][msg.dest].state = AliveState
           THEN UpdateState(msg.source, msg.dest, SuspectState)
           ELSE UNCHANGED peer_states
        /\ IncrementRound(msg.source)
        /\ RecordStateStats
        /\ pending_direct_ack' = [pending_direct_ack EXCEPT ![msg.source] = Nil]
        /\ UNCHANGED <<incarnation, messages, probe_ctr, pending_indirect_ack, sim_status, initial_state_stats>>    

(************************************************************************) 
(******************** ACTION: ReceiveProbeRequest ***********************)
(************************************************************************)

(* Notes
Receives a probe request and then sends a probe to the target peer.
*)

ReceiveProbeRequest ==
    /\ ~StopConditionReached
    /\ \E msg \in DOMAIN messages :
        /\ CanReceiveMessage(msg, ProbeRequestMessage)
        /\ LET merged_peer_state == MergeGossipWithCurrentState(msg.dest, 
                                                                msg.source, 
                                                                msg.source_inc, 
                                                                msg.gossip)
               new_incarnation   == MayBeIncrementIncarnation(msg)
           IN LET gossip_to_send == SelectOutgoingGossip(msg.dest, msg.target, merged_peer_state)
              IN
                /\ MayBeUpdateIncarnation(msg.dest, new_incarnation)
                /\ ProcessedOneAndSendAnother(msg,
                               [type         |-> ProbeMessage,
                                source       |-> msg.dest,
                                dest         |-> msg.target,
                                round        |-> msg.round,
                                on_behalf_of |-> msg.source,
                                dest_state   |-> ShareableState(merged_peer_state[msg.target]),
                                source_inc   |-> new_incarnation,
                                gossip       |-> gossip_to_send])
                /\ UpdatePeerStates(msg.dest, merged_peer_state, gossip_to_send)
                /\ RecordGossipStats(msg.dest, msg.source, msg.gossip, FALSE)
    /\ UNCHANGED << pending_direct_ack, pending_indirect_ack, probe_ctr, round, sim_status, initial_state_stats >>

(************************************************************************) 
(******************** ACTION: ReceiveProbeRequestAck ********************)
(************************************************************************)

(*
This is only enabled when the target of the original direct probe is in fact
alive. If the target really were dead, it would not respond to the indirect probe.

The ack from the target peer is forwarded back to the member that sent the probe
request.
*)

ForwardedAck(msg, merged_peer_state, gossip_to_send) ==
    [type         |-> ForwardedAckMessage,
     source       |-> msg.dest,
     dest         |-> msg.on_behalf_of,
     on_behalf_of |-> msg.on_behalf_of,
     target       |-> msg.source,
     dest_state   |-> ShareableState(merged_peer_state[msg.on_behalf_of]),
     source_inc   |-> incarnation[msg.dest],
     round        |-> msg.round,
     gossip       |-> gossip_to_send]

ReceiveProbeRequestAck ==
    /\ ~StopConditionReached
    /\ \E msg \in DOMAIN messages :
        /\ CanReceiveMessage(msg, AckMessage)
        /\ msg.on_behalf_of # Nil
        /\ LET merged_peer_state == MergeGossipWithCurrentState(msg.dest, 
                                                                msg.source, 
                                                                msg.source_inc, 
                                                                msg.gossip)
               new_incarnation   == MayBeIncrementIncarnation(msg)
           IN LET send_gossip == SelectOutgoingGossip(msg.dest, msg.on_behalf_of, merged_peer_state)
              IN
                /\ MayBeUpdateIncarnation(msg.dest, new_incarnation)
                /\ UpdatePeerStates(msg.dest, merged_peer_state, send_gossip)
                /\ RecordGossipStats(msg.dest, msg.source, msg.gossip, FALSE)
                /\ ProcessedOneAndSendAnother(msg, ForwardedAck(msg, merged_peer_state, send_gossip))
                /\ UNCHANGED << pending_direct_ack, pending_indirect_ack, probe_ctr, round, sim_status, initial_state_stats>>

(************************************************************************) 
(******************** ACTION: ReceiveForwardedAck ***********************)
(************************************************************************)

(* Notes
This member earlier sent out a direct probe, which failed (not responded to) and
so had sent out up to PeerGroupSize probe requests. If the original target peer
is in fact alive, then the member may receive multiple forwarded acks. On the
first such message that is received, it untracks the target as pending an indirect ack
and increments its own round as this is the end of a protocol round for this member.
*)

ReceiveForwardedAck ==
    /\ ~StopConditionReached
    /\ \E msg \in DOMAIN messages :
        /\ CanReceiveMessage(msg, ForwardedAckMessage)
        /\ LET merged_peer_state == MergeGossipWithCurrentState(msg.dest, 
                                                                msg.source, 
                                                                msg.source_inc, 
                                                                msg.gossip)
               new_incarnation   == MayBeIncrementIncarnation(msg)
           IN
            /\ MayBeUpdateIncarnation(msg.dest, new_incarnation)
            /\ UpdatePeerStates(msg.dest, merged_peer_state, <<>>)
            /\ IF round[msg.dest] # msg.round \/ pending_indirect_ack[msg.dest] = Nil 
               THEN /\ RecordGossipStats(msg.dest, msg.source, msg.gossip, FALSE)
                    /\ UNCHANGED << round, pending_indirect_ack >>
               ELSE /\ IncrementRound(msg.dest)
                    /\ pending_indirect_ack' = [pending_indirect_ack EXCEPT ![msg.dest] = Nil]
                    /\ RecordGossipStats(msg.dest, msg.source, msg.gossip, TRUE) 
            /\ MessageProcessed(msg)
            /\ UNCHANGED << pending_direct_ack, probe_ctr, sim_status, initial_state_stats>>

(************************************************************************) 
(******************** ACTION: ProbeRequestFails *************************)
(************************************************************************)

(* Notes
This is enabled when the chain of up to PeerGroupSize probe requests is broken,
either due to the target peer really being dead, or message loss somewhere.

This is the end of a protocol period of this member, so the round for this member 
is incremented.
*)

IndirectProbeChainIntact(member) ==
    LET round_of_chain == round[member]
    IN
        \E msg \in DOMAIN messages :
           /\ msg.round = round_of_chain
           /\ \/ /\ msg.type = ProbeRequestMessage 
                 /\ msg.source = member
              \/ /\ msg.type = ProbeMessage
                 /\ msg.on_behalf_of = member
              \/ /\ msg.type = AckMessage
                 /\ msg.on_behalf_of = member
              \/ /\ msg.type = ForwardedAckMessage
                 /\ msg.dest = member
           /\ messages[msg] > 0

NoMoreProbeRequestsToSend(member) == 
    LET failed_probe == CHOOSE msg \in DOMAIN messages :
                            /\ msg.type = ProbeMessage
                            /\ msg.round = round[member]
                            /\ msg.source = member
                            /\ msg.on_behalf_of = Nil
    IN NoEligiblePeersForProbeRequest(failed_probe)

NoResponseToProbeRequests ==
    /\ ~StopConditionReached
    /\ \E member \in Member :
        /\ pending_indirect_ack[member] # Nil
        /\ ProbeRequestsSent(member)
        /\ NoMoreProbeRequestsToSend(member)
        /\ ~IndirectProbeChainIntact(member)
        /\ LET pr == CHOOSE msg \in DOMAIN messages :
                        /\ msg.type = ProbeRequestMessage
                        /\ msg.source = member
                        /\ msg.round = round[member]
           IN
                /\ IF peer_states[member][pr.target].state = AliveState
                   THEN UpdateState(member, pr.target, SuspectState)
                   ELSE UNCHANGED << peer_states >>
                /\ pending_indirect_ack' = [pending_indirect_ack EXCEPT ![member] = Nil]
                /\ IncrementRound(member)
                /\ RecordStateStats
                /\ UNCHANGED <<incarnation, messages, pending_direct_ack, probe_ctr, sim_status, initial_state_stats>>

(************************************************************************) 
(******************** ACTION: MemberJoins *******************************)
(************************************************************************)

(* Notes
A member that joins does not have a Nil incarnation prior to joining. A
member that is yet to join is alive but knows of no other peers.
Joining is the act of informing it of the existence of InitialContactMemberCount peers
that it can start probing.
*)

MemberJoins ==
    \E member \in Member :
        /\ incarnation[member] # Nil
        /\ \A m \in Member : \/ member = m
                             \/ peer_states[member][m].state = UnknownState
        /\ \E contact_peers \in kSubset(InitialContacts, Member): \* SUBSET Member : /\ Cardinality(contact_peers) = InitialContacts
            /\ \A p \in contact_peers : incarnation[p] # Nil
            /\ peer_states' = [peer_states EXCEPT ![member] =
                                    [m \in Member |-> IF m \in contact_peers
                                                      THEN KnownAliveStateRecord
                                                      ELSE UnknownStateRecord]]
            /\ UNCHANGED <<incarnation, messages, pending_direct_ack, pending_indirect_ack, 
                           probe_ctr, round, sim_status, initial_state_stats >>

(************************************************************************) 
(******************** ACTION: MemberLeaves *******************************)
(************************************************************************)

(* Notes
Only enabled if the constant AllowMembersToLeave is set to TRUE.
Simply changes its incarnation to Nil which represents a dead member.
Once left, a member cannot rejoin. SWIM makes no provision for rejoining members,
it would need to join with a different identifier which is not modelled in this spec.
There must be at least two live members for a member to leave, we do not model all members leaving.
Any messages sent to the leaving member, but not processed are lost.
*)
MemberLeaves ==
    /\ MemberLeavesEnabled
    /\ \E m1, m2 \in Member :
        /\ m1 # m2
        /\ incarnation[m1] # Nil
        /\ incarnation[m2] # Nil
        /\ incarnation' = [incarnation EXCEPT ![m1] = Nil]
        /\ messages' = [msg \in DOMAIN messages |-> IF msg.dest = m1 THEN 0 ELSE messages[msg]]
        /\ UNCHANGED <<peer_states, pending_direct_ack, pending_indirect_ack, 
                       probe_ctr, round, sim_status, initial_state_stats >>

(************************************************************************) 
(******************** Init, Next state and spec *************************)
(************************************************************************)

\* We can configure the initial state to start with a certain number of undetected states
\* such as undetected dead members or undetected newly joined members.
\* We can also configure some members to not have joined yet
\* The rest will be started, and already know about each other and their peer state they have not being infective.
\* This all allows us to test various scenarios easily
Init ==
    LET undetected_dead == CHOOSE s \in kSubset(DeadMemberCount, Member): TRUE
        undetected_joined == CHOOSE s \in kSubset(NewMemberCount, Member \ undetected_dead): TRUE
        contact_members == CHOOSE s \in kSubset(InitialContacts, Member \ (undetected_dead \union undetected_joined)): TRUE
        unjoined == CHOOSE s \in kSubset(UnjoinedMemberCount, Member \ (undetected_dead \union undetected_joined \union contact_members)) : TRUE
    IN
                /\ incarnation = [m \in Member |-> IF m \in undetected_dead THEN Nil ELSE 1]
                /\ peer_states = [m \in Member |-> [m1 \in Member |-> 
                                                        CASE m \in undetected_joined -> IF m1 \in contact_members
                                                                                        THEN KnownAliveStateRecord
                                                                                        ELSE UnknownStateRecord
                                                          [] m \in undetected_dead   -> UnknownStateRecord
                                                          [] m \in unjoined          -> UnknownStateRecord
                                                          [] OTHER                   -> IF m1 \in undetected_joined \/ m1 \in unjoined
                                                                                        THEN UnknownStateRecord
                                                                                        ELSE KnownAliveStateRecord]]
                /\ round = [m \in Member |-> 1]
                /\ messages = [msg \in {} |-> 0]
                /\ pending_direct_ack = [m \in Member |-> Nil]
                /\ pending_indirect_ack = [m \in Member |-> Nil]
                /\ probe_ctr = [m \in Member |-> [m1 \in Member |-> 0]]
                /\ sim_status = 0
                /\ initial_state_stats = [alive_states  |-> StateCount(AliveState, peer_states),
                                          alive_members |-> Cardinality(Member) - Cardinality(unjoined)]
                /\ ResetStats(1000)

Next ==
    \/ \E member, peer \in Member : 
        \/ SendDirectProbe(member, peer)
        \/ Expire(member, peer)
    \/ ReceiveProbe
    \/ ReceiveAck
    \/ SendProbeRequest
    \/ NoPeersForProbeRequest
    \/ ReceiveProbeRequest
    \/ ReceiveProbeRequestAck
    \/ ReceiveForwardedAck
    \/ NoResponseToProbeRequests
    \/ MemberJoins
    \/ MemberLeaves
    \/ MessageIgnored
    \/ EndSim

MemberOrNil ==
    Member \union {Nil}

States ==
    { AliveState, SuspectState, DeadState, UnknownState }

TypeOK ==
    /\ incarnation \in [Member -> Nat \union {Nil}]
    /\ peer_states \in [Member -> [Member -> [incarnation    : Nat, 
                                              state          : States, 
                                              round          : Nat, 
                                              disseminations : 0..DisseminationLimit]]]
    /\ round \in [Member -> Nat]
    /\ pending_direct_ack \in [Member -> MemberOrNil]
    /\ pending_indirect_ack \in [Member -> MemberOrNil]
    /\ probe_ctr \in [Member -> [Member -> Nat]]
    /\ sim_status \in Nat

Inv ==
    TRUE
    (*TLCGet("level") < 20*)
    (*IF (~ ENABLED Next) THEN
        IF sim_status = 1 THEN TRUE
        ELSE Print(<<"C", IsConverged(incarnation, peer_states, Nil, DeadState, AliveState)>>, FALSE)
    ELSE TRUE*)

Liveness ==
    WF_vars(Next)

Spec == Init /\ [][Next]_vars 

TestEnsemble ==
    1..10

============================================================================
\* Modification History
\* Last modified Wed Oct 14 14:37:22 PDT 2020 by markus
\* Last modified Thu Sep 17 02:05:07 PDT 2020 by jack
\* Last modified Thu Oct 18 12:45:40 PDT 2018 by jordanhalterman
\* Created Mon Oct 08 00:36:03 PDT 2018 by jordanhalterman
CONSTANTS       Member <- TestEnsemble
                Nil = 0
                AliveState = 4
                SuspectState = 3
                DeadState = 2
                UnknownState = 1
                GossipMessage = GossipMessage
                ProbeMessage = ProbeMessage
                AckMessage = AckMessage
                ProbeRequestMessage = ProbeRequestMessage
                ForwardedAckMessage = ForwardedAckMessage
                PeerGroupSize = 1
                SuspectTimeout = 5
                DisseminationLimit = 5
                MaxUpdatesPerPiggyBack = 6
                MessageLossMode = "probabilistic"
                LoseEveryNth = 5
                MemberLeavesEnabled = FALSE
                InitialContacts = 1
                DeadMemberCount = 0
                NewMemberCount = 0
                UnjoinedMemberCount = 0
                ForceRunToRound = 40
                PrintStatsOnDeadlock = TRUE
SPECIFICATION   Spec

Also see https://github.com/Vanlightly/formal-methods-playground/tree/master/tla/swim

konnov commented 3 years ago

Hi Markus!

Actually, I cannot parse the pasted file with SANY 1.7.0:

$ java -jar ~/.m2/repository/org/lamport/tla2tools/1.7.0-SNAPSHOT/tla2tools-1.7.0-SNAPSHOT.jar -config swim_stats.cfg swim_stats2.tla
TLC2 Version 2.15 of Day Month 20?? (rev: eb3ff99)
Running breadth-first search Model-Checking with fp 87 and seed 8212768720453189801 with 1 worker on 8 cores with 3641MB heap and 64MB offheap memory (Mac OS X 10.15.7 x86_64, Oracle Corporation 1.8.0_172 x86_64, MSBDiskFPSet, DiskStateQueue).
Parsing file /Users/igor/Downloads/swim_stats2.tla
Parsing file /private/var/folders/42/77j16l715bd5k68fffptzgvw0000gn/T/Naturals.tla
Parsing file /private/var/folders/42/77j16l715bd5k68fffptzgvw0000gn/T/FiniteSets.tla
Parsing file /Users/igor/Downloads/FiniteSetsExt.tla

Fatal errors while parsing TLA+ spec in file swim_stats2

java.lang.NullPointerException
Starting... (2020-10-16 08:58:19)
Error: Parsing or semantic analysis failed. Module-Table lookup failure for module name swim_stats2 derived from swim_stats2 file name.
Finished in 00s at (2020-10-16 08:58:19)

We just recently switched to 1.7.0, so you most likely used apalache with tla2tools 1.6.0. I suspect that the problems that we encounter both in Apalache and 1.7.0 are of similar nature.

lemmy commented 3 years ago

Hi Igor,

can you pull FiniteSetsExt and the other modules in the EXTENDS from the CommunityModules?

konnov commented 3 years ago

Now I have the reported behavior. Thanks!

konnov commented 3 years ago

Our importer fails at the following expression (line 575):

        peer_states' = [peer_states EXCEPT ![member][peer] =
                            [@ EXCEPT !.state          = new_state,             
                                      !.disseminations = 0,
                                      !.round          = round[member]]]

It looks like the problem is in using @ inside the nested EXCEPT expression. The notation @ works on simple expressions. I will have another look later.

konnov commented 3 years ago

I am staring at a MWE for this bug:

---- MODULE Updates ----                                                        
VARIABLE f
E4 == [ f EXCEPT ![0] = [@ EXCEPT !.state = 4] ]
================================

@lemmy, the node explorer is showing me quite a strange structure (see Updates.pdf). Could you clarify how to interpret that structure at the level of SemanticNode?

Updates.pdf

lemmy commented 3 years ago

Do line numbers (-Dtla2sany.explorer.DotExplorerVisitor.includeLineNumbers=true) and each node's type help to answer your questions?

Updates

strict digraph DiskGraph {
1146 [label="Updates",style = filled];
1209 [shape=square,color="yellow",label="f
line 2, col 10 to line 2, col 10 
 Updates
OpDeclNode"];
1146 -> 1209
1283 [style=filled,shape=diamond,fillcolor="red",label="E4
line 3, col 1 to line 3, col 48 
 Updates
OpDefNode"];
1146 -> 1283
1397 [color="green",label="[fEXCEPT![0]=[@EXCEPT!.state=4]]
line 3, col 7 to line 3, col 48 
 Updates
OpApplNode"];
1283 -> 1397
1396 [color="green",label="f
line 3, col 9 to line 3, col 9 
 Updates
OpApplNode"];
1397 -> 1396
1209 [shape=square,color="yellow",label="f
line 2, col 10 to line 2, col 10 
 Updates
OpDeclNode"];
1396 -> 1209
1400 [color="green",label="![0]=[@EXCEPT!.state=4]
line 3, col 18 to line 3, col 46 
 Updates
OpApplNode"];
1397 -> 1400
1399 [color="green",label="![0]=[@EXCEPT!.state=4]
line 3, col 18 to line 3, col 46 
 Updates
OpApplNode"];
1400 -> 1399
1615 [label="0
line 3, col 20 to line 3, col 20 
 Updates
NumeralNode"];
1399 -> 1615
1402 [color="green",label="[@EXCEPT!.state=4]
line 3, col 25 to line 3, col 46 
 Updates
OpApplNode"];
1400 -> 1402
1405 [color="green",label="!.state=4
line 3, col 35 to line 3, col 45 
 Updates
OpApplNode"];
1402 -> 1405
1404 [color="green",label="!.state=4
line 3, col 35 to line 3, col 45 
 Updates
OpApplNode"];
1405 -> 1404
1682 [label="state
line 3, col 37 to line 3, col 41 
 Updates
StringNode"];
1404 -> 1682
1623 [label="4
line 3, col 45 to line 3, col 45 
 Updates
NumeralNode"];
1405 -> 1623
}
konnov commented 3 years ago

That makes sense. Thanks!

konnov commented 3 years ago

What is strange here. The debugger shows AtNode as the first argument to the $Except node (when using OpApplNode.getArgs). But it is not shown in the graph. I am trying to figure out exactly how the expression [@ EXCEPT !.state = 4] is translated.

lemmy commented 3 years ago

This is an issue of DotExplorerVisitor. AtNode doesn't show up because AtNode#walkGraph is a no-op.

https://github.com/tlaplus/tlaplus/blob/0d868ad902d183ae837784075d63475035efb118/tlatools/org.lamport.tlatools/src/tla2sany/semantic/AtNode.java#L154-L158

You can use the interactive Explorer java -cp tla2tools.jar tla2sany.SANY -d Updates.tla:


****** SANY2 Version 2.2 created 08 July 2020

Parsing file /tmp/Updates.tla
Semantic processing of module Updates

*** TLA+ semantic graph exploration tool v 1.0 (DRJ)

>>mt

External Module Table:
| Module: *ModuleNode: Updates    uid: 154  kind: ModuleKind  errors: none
| | Context Entry: f  155 
| | Context Entry: E4  167 
>>167

*OpDefNode: E4
    uid: 167  kind: UserDefinedOpKind  arity: 0  orgDefInModule: Updates
  local: false
  letInLevel: 0
  inRecursive: false
  inRecursiveSection: false
  recursiveSection: -1
  local: false
  source: this
  originallyDefinedInModule: Updates (uid: 154)
  Formal params: 0
  SymbolTable: non-null
| Body:
| | *OpApplNode: $Except    uid: 157  kind: OpApplKind  errors: non-null
| | | Operator: $Except  85  
| | | Operands: 2
| | | | *OpApplNode: f    uid: 156  kind: OpApplKind  errors: non-null
| | | | | Operator: f  155  
| | | | *OpApplNode: $Pair    uid: 160  kind: OpApplKind  errors: non-null
| | | | | Operator: $Pair  96  
| | | | | Operands: 2
| | | | | | *OpApplNode: $Seq    uid: 159  kind: OpApplKind  errors: non-null
| | | | | | *OpApplNode: $Except    uid: 162  kind: OpApplKind  errors: non-null
  Labels: null

Arity: 0
Level: 1
LevelParams: [line 2, col 10 to line 2, col 10 of module Updates]
LevelConstraints: { f -> 2}
ArgLevelConstraints: { }
ArgLevelParams: {}
MaxLevel: 
opLevelCond: []
AllParams: {f}
NonLeibnizParams: {}
IsLeibniz: true
isLeibnizArg: 

>>157

*OpApplNode: $Except    uid: 157  kind: OpApplKind  errors: non-null
| Operator: $Except  85  
| Operands: 2
| | *OpApplNode: f    uid: 156  kind: OpApplKind  errors: non-null
| | | Operator: f  155  
| | *OpApplNode: $Pair    uid: 160  kind: OpApplKind  errors: non-null
| | | Operator: $Pair  96  
| | | Operands: 2
| | | | *OpApplNode: $Seq    uid: 159  kind: OpApplKind  errors: non-null
| | | | | Operator: $Seq  105  
| | | | | Operands: 1
| | | | | | *NumeralNode:   uid: 158  kind: NumeralKind Value: 0; image: 0
| | | | *OpApplNode: $Except    uid: 162  kind: OpApplKind  errors: non-null
| | | | | Operator: $Except  85  
| | | | | Operands: 2
| | | | | | *AtNode:   uid: 161  kind: AtNodeKind
| | | | | | | ExceptRef: 157
| | | | | | | ExceptComponent: 160
| | | | | | *OpApplNode: $Pair    uid: 165  kind: OpApplKind  errors: non-null

Level:1
LevelParams: [line 2, col 10 to line 2, col 10 of module Updates]
LevelConstraints: { f -> 2}
ArgLevelConstraints: { }
ArgLevelParams: {}
AllParams: {f}
NonLeibnizParams: {}
>>
konnov commented 3 years ago

Wow. Those is the details I always wanted to see! Thanks!

lemmy commented 3 years ago

IMO the dot graph is better to understand the hierarchy. What parts of the textual explorer would you want to see in the graph?

lemmy commented 3 years ago

https://github.com/tlaplus/tlaplus/commit/52f3b01c3d863ab0a2404a96997bb2032bcf4203

konnov commented 3 years ago

I expected to see the whole syntax tree, but the printer happens to be selective about what it shows.

konnov commented 3 years ago

In the end, it is a one-line fix, see #302. Though it took me several hours to understand the problem :)

konnov commented 3 years ago

As always with such bugs, another unexpected behavior popped up

konnov commented 3 years ago

PR #305 implements better error reporting for the language predicates. Now apalache complains about TLCSet:

swim_stats.tla:296:12-296:39: unexpected expression: TLC!TLCSet(r$1 * 100, 0) E@12:11:50.192
swim_stats.tla:297:12-297:43: unexpected expression: TLC!TLCSet((r$1 * 100) + 1, 0) E@12:11:50.214
swim_stats.tla:299:20-299:74: unexpected expression: TLC!TLCSet((r$1 * 100) + 6, initial_state_stats'["alive_members"]) E@12:11:50.236
swim_stats.tla:300:20-300:80: unexpected expression: TLC!TLCSet((r$1 * 100) + 7, initial_state_stats'["alive_states"]) E@12:11:50.257
swim_stats.tla:301:20-301:42: unexpected expression: TLC!TLCSet((r$1 * 100) + 6, 0) E@12:11:50.279
swim_stats.tla:302:20-302:49: unexpected expression: TLC!TLCSet((r$1 * 100) + 7, 0) E@12:11:50.301
swim_stats.tla:303:12-303:36: unexpected expression: TLC!TLCSet((r$1 * 100) + 2, 0) E@12:11:50.323
swim_stats.tla:304:12-304:43: unexpected expression: TLC!TLCSet((r$1 * 100) + 3, 0) E@12:11:50.344
swim_stats.tla:305:12-305:33: unexpected expression: TLC!TLCSet((r$1 * 100) + 4, 0) E@12:11:50.365
swim_stats.tla:306:12-306:40: unexpected expression: TLC!TLCSet((r$1 * 100) + 5, 0) E@12:11:50.386

I think we should stop here. The parser goes through. Now the model checker complains about unsupported operators, which we don't know how to support. If you like to make further progress on this spec, let's open another issue. But I would wait till the new type checker is integrated.