bbengfort / cloudscope

Simulation and visualization of distributed systems and communications.
http://bbengfort.github.io/cloudscope/
MIT License

duplicate accesses detected #63

Closed: bbengfort closed this issue 7 years ago.

bbengfort commented 8 years ago

In the tag integration tests, there is a "duplicate accesses detected" bug and, occasionally, a log index out of bounds error; these are presumably related.

bbengfort commented 8 years ago

With the following configuration:

debug:      true
testing:    false

simulation:

    # Simulation Environment Parameters
    random_seed:  42
    max_sim_time: 4320000

    # Network Parameters
    count_messages: true
    aggregate_heartbeats: true
    default_latency: 800
    default_replica: storage
    default_consistency: strong

    # Workload Parameters
    users: 3                  # number of simulated users creating traces
    max_objects_accessed: 10  # maximum number of objects that can be accessed
    synchronous_access: false # each access has to wait on the previous access to be triggered
    valid_locations:          # locations to allow user to move to.
        - home
        - work
        - mobile
        - cloud

    invalid_types:        # replica types that shouldn't have accesses.
        # - storage
        - backup

    move_prob: 0.2        # probability of moving locations
    switch_prob: 0.6      # probability of switching devices
    object_prob: 0.3      # probability of switching the currently accessed object
    access_mean: 1600     # mean delay (milliseconds) between accesses
    access_stddev: 512    # standard deviation of delay (milliseconds) between accesses
    read_prob: 0.58       # probability of read access; write probability is 1 - read_prob

    # Eventual Parameters
    anti_entropy_delay: 10000         # delay in milliseconds (6x per minute)
    do_gossip: true                 # perform gossip protocol
    do_rumoring: false              # perform rumor mongering

    # Raft Parameters
    election_timeout: [300, 600]  # Range to randomly select the election timeout
    heartbeat_interval: 150        # Usually half the minimum election timeout
    aggregate_writes: true          # Don't send writes until heartbeat.

    # Tag Parameters
    session_timeout: 40960          # Related to the mean delay between accesses
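Several of these parameters are coupled: the write probability is implied by `read_prob`, the anti-entropy frequency follows from `anti_entropy_delay`, and the heartbeat interval has to fit inside the election timeout. A minimal sketch that derives and checks those values, assuming the YAML has already been parsed into a plain dict (the `check_config` helper is hypothetical, not part of cloudscope):

```python
def check_config(sim):
    """Derive implied values from a parsed simulation config (hypothetical helper)."""
    low, high = sim["election_timeout"]
    # A heartbeat must arrive before the shortest election timeout can fire.
    assert low < high, "election timeout range must be increasing"
    assert sim["heartbeat_interval"] < low, "heartbeats must beat the election timeout"
    return {
        # Writes are every access that is not a read.
        "write_prob": round(1 - sim["read_prob"], 2),
        # Anti-entropy sessions per minute, with the delay in milliseconds.
        "anti_entropy_per_minute": 60000 / sim["anti_entropy_delay"],
        # Heartbeats a follower sees within the shortest election timeout.
        "heartbeats_per_min_timeout": low / sim["heartbeat_interval"],
    }

sim = {
    "read_prob": 0.58,
    "anti_entropy_delay": 10000,
    "election_timeout": [300, 600],
    "heartbeat_interval": 150,
}
print(check_config(sim))
```

For the values above this gives a write probability of 0.42, six anti-entropy sessions per minute (one every 10 seconds), and two heartbeats per minimum election timeout, matching the "half the minimum election timeout" note.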

With this configuration, the tests instead fail because metric fields are missing from the results:

======================================================================
FAIL: Run the raft consensus simulation without errors
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/benjamin/Repos/umd/cloudscope/tests/test_simulation/test_main.py", line 150, in test_raft_simulation
    self.assertReliableResults(results)
  File "/Users/benjamin/Repos/umd/cloudscope/tests/test_simulation/test_main.py", line 107, in assertReliableResults
    self.assertIn(metric, results['results'], "Missing '{}' metric from results".format(metric))
AssertionError: Missing 'visibility latency' metric from results

======================================================================
FAIL: Run the tag consensus simulation without errors
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/benjamin/Repos/umd/cloudscope/tests/test_simulation/test_main.py", line 171, in test_tag_simulation
    self.assertReliableResults(results, metrics=tag_metrics)
  File "/Users/benjamin/Repos/umd/cloudscope/tests/test_simulation/test_main.py", line 107, in assertReliableResults
    self.assertIn(metric, results['results'], "Missing '{}' metric from results".format(metric))
AssertionError: Missing 'session length' metric from results

----------------------------------------------------------------------
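The failing assertion stops at the first missing key, so each run reports only one absent metric. When triaging, it can help to list every missing metric at once. A minimal sketch, assuming `results['results']` is a dict keyed by metric name as the `assertIn` call suggests (the `missing_metrics` helper and the sample values are hypothetical):

```python
def missing_metrics(results, metrics):
    """Return every expected metric absent from results['results'] (hypothetical helper)."""
    recorded = results.get("results", {})
    return [metric for metric in metrics if metric not in recorded]

# Illustrative results dict in which one metric was never recorded.
results = {"results": {"session length": [1200, 3400]}}
print(missing_metrics(results, ["visibility latency", "session length"]))
# prints ['visibility latency']
```

A test could then assert that this list is empty, so a single failure message names all missing metrics.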
bbengfort commented 8 years ago

With the default configuration, we're getting the following errors:

======================================================================
ERROR: Run the tag consensus simulation without errors
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/benjamin/Repos/umd/cloudscope/tests/test_simulation/test_main.py", line 159, in test_tag_simulation
    sim.run()
  File "/Users/benjamin/Repos/umd/cloudscope/cloudscope/simulation/base.py", line 147, in run
    self.env.run(until=self.max_sim_time)
  File "/Users/benjamin/.virtualenvs/cloudscope/lib/python2.7/site-packages/simpy/core.py", line 137, in run
    self.step()
  File "/Users/benjamin/.virtualenvs/cloudscope/lib/python2.7/site-packages/simpy/core.py", line 221, in step
    callback(event)
  File "/Users/benjamin/Repos/umd/cloudscope/cloudscope/replica/base.py", line 220, in recv
    return self.dispatch(message)
  File "/Users/benjamin/Repos/umd/cloudscope/cloudscope/replica/base.py", line 313, in dispatch
    return handler(message)
  File "/Users/benjamin/Repos/umd/cloudscope/cloudscope/replica/consensus/tag.py", line 652, in on_append_entries_rpc
    "{} is possibly receiving duplicate append entries".format(self)
TagRPCException: Home Desktop (r0) is possibly receiving duplicate append entries

======================================================================
FAIL: Run the raft consensus simulation without errors
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/benjamin/Repos/umd/cloudscope/tests/test_simulation/test_main.py", line 150, in test_raft_simulation
    self.assertReliableResults(results)
  File "/Users/benjamin/Repos/umd/cloudscope/tests/test_simulation/test_main.py", line 107, in assertReliableResults
    self.assertIn(metric, results['results'], "Missing '{}' metric from results".format(metric))
AssertionError: Missing 'visibility latency' metric from results

----------------------------------------------------------------------
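The exception above fires when a replica decides an AppendEntries message is a duplicate. The actual check in `tag.py` is not shown here, so the following is a hypothetical reconstruction of a strict check of that kind: the follower expects every AppendEntries to start exactly at the end of its log and treats anything older as a duplicate. It uses Raft-style 1-based indices, with `prev_log_index` 0 meaning the entries start at the beginning of the log; all names are illustrative.

```python
class DuplicateAppendEntries(Exception):
    """Hypothetical stand-in for the TagRPCException in the traceback above."""

class StrictFollower(object):
    """Hypothetical follower that treats any stale AppendEntries as a duplicate."""

    def __init__(self):
        self.log = []  # list of (term, value) entries

    def on_append_entries(self, prev_log_index, entries):
        last_index = len(self.log)  # index of the last entry, 0 if the log is empty
        if prev_log_index > last_index:
            return False  # gap: we lack entries the leader assumes we have
        if prev_log_index < last_index:
            # Strict rule: a message behind our log end is assumed to be a duplicate.
            raise DuplicateAppendEntries(
                "prev_log_index {} is behind local log end {}".format(
                    prev_log_index, last_index))
        self.log.extend(entries)
        return True

follower = StrictFollower()
follower.on_append_entries(0, [(1, "a"), (1, "b")])  # later heartbeat arrives first
try:
    follower.on_append_entries(0, [(1, "a")])        # earlier heartbeat arrives late
except DuplicateAppendEntries as err:
    print(err)  # prints: prev_log_index 0 is behind local log end 2
```

The late message carries nothing new, yet a check this strict turns a harmless reordering into a fatal error.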
bbengfort commented 8 years ago

Duplicate AppendEntries occur when a heartbeat message arrives out of order relative to the previous heartbeat, which is kind of a problem.
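To get a feel for how often this happens: two heartbeats sent `heartbeat_interval` apart swap order whenever the first one's latency exceeds the second's by more than that interval. The sketch below assumes each message's latency is independent Gaussian jitter around a mean. This is a hypothetical model, and the 200 ms standard deviation in the example is made up rather than taken from cloudscope's network settings.

```python
import random

def arrives_out_of_order(send_gap, latency_first, latency_second):
    """True if the earlier message is delivered after the later one."""
    return latency_first > send_gap + latency_second

def reorder_rate(send_gap, mean, stddev, trials=100000, seed=42):
    """Estimate how often consecutive heartbeats swap order under jittered latency."""
    rng = random.Random(seed)
    swapped = 0
    for _ in range(trials):
        # Latencies are clamped at zero so a message never arrives before it is sent.
        first = max(0.0, rng.gauss(mean, stddev))
        second = max(0.0, rng.gauss(mean, stddev))
        if arrives_out_of_order(send_gap, first, second):
            swapped += 1
    return swapped / float(trials)

print(arrives_out_of_order(150, 800, 600))  # True: 800 > 150 + 600
print(reorder_rate(150, 800, 200))
```

With a 150 ms interval and this hypothetical 200 ms jitter, roughly 30% of consecutive heartbeat pairs swap order. With zero jitter they never do, so the bug only shows up once latency varies per message.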

bbengfort commented 7 years ago

Fixed: it ended up being related to the Raft AppendEntries issue.
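The commit behind the fix is not shown here. For reference, the follower-side rules from the Raft protocol make AppendEntries idempotent, so a stale or reordered message is absorbed instead of rejected. The follower checks that it holds the entry at `prev_log_index` with `prev_log_term`, skips entries it already has, truncates only on a term conflict, and appends the rest. A minimal sketch of those rules (the class and method names are illustrative, not cloudscope's):

```python
class RaftFollowerLog(object):
    """Follower log applying Raft's AppendEntries consistency rules (illustrative)."""

    def __init__(self):
        self.entries = []  # list of (term, value); Raft index i lives at position i - 1

    def term_at(self, index):
        # Index 0 is the empty prefix, which matches term 0 by convention.
        return self.entries[index - 1][0] if index > 0 else 0

    def append_entries(self, prev_log_index, prev_log_term, entries):
        # Reject if we lack the entry the leader expects to precede the new ones.
        if prev_log_index > len(self.entries):
            return False
        if self.term_at(prev_log_index) != prev_log_term:
            return False
        for offset, entry in enumerate(entries):
            index = prev_log_index + 1 + offset
            if index <= len(self.entries):
                if self.entries[index - 1][0] == entry[0]:
                    continue  # already present: a duplicate is harmless
                del self.entries[index - 1:]  # conflicting term: drop this entry onward
            self.entries.append(entry)
        return True

log = RaftFollowerLog()
print(log.append_entries(0, 0, [(1, "a"), (1, "b")]))  # True
print(log.append_entries(0, 0, [(1, "a")]))            # True: stale message absorbed
print(log.entries)                                     # [(1, 'a'), (1, 'b')]
```

The key design choice is that matching entries are never truncated, so a late heartbeat carrying a shorter prefix cannot erase newer entries.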