aprell / tasking-2.0

Tasking 2.0
16 stars 0 forks source link

Backoff: the "on behalf" functions seem to miss half of the worker tree. #2

Closed mratsim closed 4 years ago

mratsim commented 4 years ago

It seems to me like the backoff strategies have a bug when peeking, counting or receiving recursively from their child steal request channels:

https://github.com/aprell/tasking-2.0/blob/303ce3e09a7c118241b9d5482219e3e3b848212e/src/runtime.c#L541-L630

If we focus on peeking

 static inline bool PEEK_REQ(int n) 
 // requires -1 <= n < num_workers 
 { 
    bool ret = false; 

    // Valid worker ID? 
    if (n == -1 || n >= num_workers) return ret; 

    // Peek at steal requests on behalf of worker n 
    ret = channel_peek(chan_requests[n]); 

 #if BACKOFF == sleep_exp || BACKOFF == wait_cond 
    if (!ret && tree.left_subtree_is_idle) { 
        ret = PEEK_REQ(left_child(n, num_workers-1)); 
    } 

    if (!ret && tree.right_subtree_is_idle) { 
        ret = PEEK_REQ(right_child(n, num_workers-1)); 
    } 
 #endif 

    return ret; 
 } 

with 14 workers: image

We assume the branch from worker 4-9-10 is idle and every other branch is working, especially branch 2. Worker 1 wants to check its and its children steal requests:

aprell commented 4 years ago

Uh oh, you're right. I guess I didn't think this one through...

mratsim commented 4 years ago

To keep the intended functionality, you can check the branch and then traverse the worker tree iteratively.

I've cooked up an algorithm to traverse any subtree of a binary tree in a depth-first way without having to allocate a stack. I'm not sure how to adapt the iterator to C though

import bitops

template isLeaf(node, maxID: int32): bool =
  node >= maxID div 2

func lastLeafOfSubTree(start, maxID: int32): int32 =
  ## Returns the last leaf of a sub tree
  # Algorithm:
  # we take the right side by doing
  # repeated (2n+2) computation until we reach the rightmost leaf node
  assert start <= maxID
  result = start
  while not result.isLeaf(maxID):
    result = 2*result + 2

func prevPowerofTwo(n: int32): int32 {.inline.} =
  ## Returns n if n is a power of 2
  ## or the biggest power of 2 preceding n
  1'i32 shl fastLog2(n+1)

iterator traverseDepthFirst*(start, maxID: int32): int32 =
  ## CPU and memory efficient depth first iteration of implicit binary trees:
  ## Stackless and traversal can start from any subtrees

  # We use the integer bit representation as an encoding
  # to the binary tree path and use shifts and count trailing zero
  # to navigate downward and jump back to the parent

  assert start in 0 .. maxID

  var depthIdx = start.prevPowerofTwo() # Index+1 of the first node of the current depth.
                                        # (2^depth - 1) is the start.
  var relPos = start - depthIdx + 1     # Relative position compared to the first node at that depth

  # A node has coordinates (Depth, Pos) in the tree
  # with Pos: relative position to the start node of that depth
  #
  # node(Depth, Pos) = 2^Depth + Pos - 1
  #
  # In the actual code
  # depthIdx = 2^Depth and relPos = Pos

  preCondition: start == depthIdx + relPos - 1

  let lastLeaf = lastLeafOfSubTree(start, maxID)
  var node: int32

  while true:
    node = depthIdx + relPos - 1
    yield node
    if node == lastLeaf:
      break
    if node.isLeaf(maxId):
      relPos += 1
      let jump = countTrailingZeroBits(relPos)
      depthIdx = depthIdx shr jump
      relPos = relPos shr jump
    else:
      depthIdx = depthIdx shl 1
      relPos = relPos shl 1

# Sanity check
# ----------------------------------------------------------------------------------

#  depth 0          ------ 0 ------
#  depth 1      -- 1 --       --- 2 ---
#  depth 2      3     4       5       6
#  depth 3    7  8  9  10  11  12  13  14
#
# In an array storage is linear
#
# 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

when isMainModule:
  let from0  = [ 0,  1,  3,  7,  8,  4,  9, 10, 2, 5, 11, 12, 6, 13, 14]
  let from1  = [ 1,  3,  7,  8,  4,  9, 10]
  let from2  = [ 2,  5, 11, 12,  6, 13, 14]
  let from3  = [ 3,  7,  8]
  let from4  = [ 4,  9, 10]
  let from5  = [ 5, 11, 12]
  let from6  = [ 6, 13, 14]
  let from7  = [ 7]
  let from8  = [ 8]
  let from9  = [ 9]
  let from10 = [10]
  let from11 = [11]
  let from12 = [12]
  let from13 = [13]
  let from14 = [14]

  template check(start: int, target: untyped) =
    var pos = 0
    for i in traverseDepthFirst(start, maxID = 14):
      doAssert: target[pos] == i
      pos += 1

    doAssert pos == target.len

  check(0, from0)
  check(1, from1)
  check(2, from2)
  check(3, from3)
  check(4, from4)
  check(5, from5)
  check(6, from6)
  check(7, from7)
  check(8, from8)
  check(9, from9)
  check(10,from10)
  check(11,from11)
  check(12,from12)
  check(13,from13)
  check(14,from14)

I think the ideal would be breadth-first, i.e. checking all parents before children, also the storage of workers is breadth-first 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 so it would help the CPU branch predictor, but I'm not too sure yet on how to do breadth-first subtree traversal without also needing to allocate extra space.

mratsim commented 4 years ago

Well that algo has a bug.

I was hunting my worker 0 never terminating and found out that

for i in traverseDepthFirst(0, maxID = 3):
  echo i

outputs:

0
1
2

and

for i in traverseDepthFirst(0, maxID = 4):
  echo i
0
1
3
4
2

The second case is OK, but in the first case, if the steal requests from 0 is stuck in worker 3, it never receives it's own steal request and keeps spinning.

I expect you will have similar problem due to your own bug where a worker can have its request stuck in a child queue but it does not check it.

mratsim commented 4 years ago

2 fixes, testing leaves was wrong:

template isLeaf(node, maxID: int32): bool =
  2*node + 1 > maxID

And you also need to skip if a node is bigger than maxID

iterator traverseDepthFirst*(start, maxID: int32): MetaNode =
  ## CPU and memory efficient depth first iteration of implicit binary trees:
  ## Stackless and traversal can start from any subtrees

  # We use the integer bit representation as an encoding
  # to the binary tree path and use shifts and count trailing zero
  # to navigate downward and jump back to the parent

  assert: start in 0 .. maxID

  var depthStart = start.prevPowerofTwo() # Index+1 of the first node of the current depth.
                                          # (2^depth - 1) is the index of starter node at each depth.
  var relPos = start - depthStart + 1     # Relative position compared to the first node at that depth.

  # A node has coordinates (Depth, Pos) in the tree
  # with Pos: relative position to the start node of that depth
  #
  # node(Depth, Pos) = 2^Depth + Pos - 1
  #
  # In the actual code
  # depthStart = 2^Depth and relPos = Pos

  assert: start == depthStart + relPos - 1

  let lastLeaf = lastLeafOfSubTree(start, maxID)
  var node: int32

  while true:
    node = depthStart + relPos - 1
    if node <= maxID:
      yield (node, (depthStart, relPos, node.isLeaf(maxID)))
    if node == lastLeaf:
      break
    if node.isLeaf(maxId):
      relPos += 1
      let jump = countTrailingZeroBits(relPos)
      depthStart = depthStart shr jump
      relPos = relPos shr jump
    else:
      depthStart = depthStart shl 1
      relPos = relPos shl 1

Otherwise with maxID = 3 we still get node 4 before the iterator goes to 2 and with maxID = 7 we get 8.

I should really see if we can have something breadth first.

mratsim commented 4 years ago

Turns out it was much easier to do breadth first from any subtree without needing a queue:

iterator traverseBreadthFirst*(start, maxID: int32): int32 =

  assert start in 0 .. maxID

  var
    levelStart = start # Index of the node starting the current depth
    levelEnd = start   # Index of the node ending the current depth
    pos = 0'i32        # Relative position compared to the current depth

  var node: int32

  while true:
    node = levelStart + pos
    if node > maxID:
      break
    yield node
    if node == levelEnd:
      levelStart = 2*levelStart + 1
      levelEnd = 2*levelEnd + 2
      pos = 0
    else:
      pos += 1
aprell commented 4 years ago

Nice! I came up with this:

def left_child(node):
    n = 2*node + 1
    return n if 0 <= n < max_nodes else -1

def check_subtree(root):
    return check_level(root, 0, [])

def check_level(node, lvl, lst):
    if node == -1:
        return lst
    for n in range(node, min(node + 2**lvl, max_nodes)):
        lst.append(n)
    return check_level(left_child(node), lvl+1, lst)

I just love (tail) recursion. :wink: