Closed mratsim closed 4 years ago
Uh oh, you're right. I guess I didn't think this one through...
To keep the intended functionality, you can check the branch and then traverse the worker tree iteratively.
I've cooked up an algorithm to traverse any subtree of a binary tree in a depth-first way without having to allocate a stack. I'm not sure how to adapt the iterator to C though
import bitops
template isLeaf(node, maxID: int32): bool =
node >= maxID div 2
func lastLeafOfSubTree(start, maxID: int32): int32 =
## Returns the last leaf of a sub tree
# Algorithm:
# we take the right side by doing
# repeated (2n+2) computation until we reach the rightmost leaf node
assert start <= maxID
result = start
while not result.isLeaf(maxID):
result = 2*result + 2
func prevPowerofTwo(n: int32): int32 {.inline.} =
## Returns n if n is a power of 2
## or the biggest power of 2 preceding n
1'i32 shl fastLog2(n+1)
iterator traverseDepthFirst*(start, maxID: int32): int32 =
## CPU and memory efficient depth first iteration of implicit binary trees:
## Stackless and traversal can start from any subtrees
# We use the integer bit representation as an encoding
# to the binary tree path and use shifts and count trailing zero
# to navigate downward and jump back to the parent
assert start in 0 .. maxID
var depthIdx = start.prevPowerofTwo() # Index+1 of the first node of the current depth.
# (2^depth - 1) is the start.
var relPos = start - depthIdx + 1 # Relative position compared to the first node at that depth
# A node has coordinates (Depth, Pos) in the tree
# with Pos: relative position to the start node of that depth
#
# node(Depth, Pos) = 2^Depth + Pos - 1
#
# In the actual code
# depthIdx = 2^Depth and relPos = Pos
preCondition: start == depthIdx + relPos - 1
let lastLeaf = lastLeafOfSubTree(start, maxID)
var node: int32
while true:
node = depthIdx + relPos - 1
yield node
if node == lastLeaf:
break
if node.isLeaf(maxId):
relPos += 1
let jump = countTrailingZeroBits(relPos)
depthIdx = depthIdx shr jump
relPos = relPos shr jump
else:
depthIdx = depthIdx shl 1
relPos = relPos shl 1
# Sanity check
# ----------------------------------------------------------------------------------
# depth 0 ------ 0 ------
# depth 1 -- 1 -- --- 2 ---
# depth 2 3 4 5 6
# depth 3 7 8 9 10 11 12 13 14
#
# In an array storage is linear
#
# 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
when isMainModule:
let from0 = [ 0, 1, 3, 7, 8, 4, 9, 10, 2, 5, 11, 12, 6, 13, 14]
let from1 = [ 1, 3, 7, 8, 4, 9, 10]
let from2 = [ 2, 5, 11, 12, 6, 13, 14]
let from3 = [ 3, 7, 8]
let from4 = [ 4, 9, 10]
let from5 = [ 5, 11, 12]
let from6 = [ 6, 13, 14]
let from7 = [ 7]
let from8 = [ 8]
let from9 = [ 9]
let from10 = [10]
let from11 = [11]
let from12 = [12]
let from13 = [13]
let from14 = [14]
template check(start: int, target: untyped) =
var pos = 0
for i in traverseDepthFirst(start, maxID = 14):
doAssert: target[pos] == i
pos += 1
doAssert pos == target.len
check(0, from0)
check(1, from1)
check(2, from2)
check(3, from3)
check(4, from4)
check(5, from5)
check(6, from6)
check(7, from7)
check(8, from8)
check(9, from9)
check(10,from10)
check(11,from11)
check(12,from12)
check(13,from13)
check(14,from14)
I think the ideal would be breadth-first, i.e. checking all parents before children, also the storage of workers is breadth-first 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
so it would help the CPU branch predictor, but I'm not too sure yet on how to do breadth-first subtree traversal without also needing to allocate extra space.
Well that algo has a bug.
I was hunting my worker 0 never terminating and found out that
for i in traverseDepthFirst(0, maxID = 3):
echo i
outputs:
0
1
2
and
for i in traverseDepthFirst(0, maxID = 4):
echo i
0
1
3
4
2
The second case is OK, but in the first case, if the steal requests from 0 is stuck in worker 3, it never receives it's own steal request and keeps spinning.
I expect you will have similar problem due to your own bug where a worker can have its request stuck in a child queue but it does not check it.
2 fixes, testing leaves was wrong:
template isLeaf(node, maxID: int32): bool =
2*node + 1 > maxID
And you also need to skip if a node is bigger than maxID
iterator traverseDepthFirst*(start, maxID: int32): MetaNode =
## CPU and memory efficient depth first iteration of implicit binary trees:
## Stackless and traversal can start from any subtrees
# We use the integer bit representation as an encoding
# to the binary tree path and use shifts and count trailing zero
# to navigate downward and jump back to the parent
assert: start in 0 .. maxID
var depthStart = start.prevPowerofTwo() # Index+1 of the first node of the current depth.
# (2^depth - 1) is the index of starter node at each depth.
var relPos = start - depthStart + 1 # Relative position compared to the first node at that depth.
# A node has coordinates (Depth, Pos) in the tree
# with Pos: relative position to the start node of that depth
#
# node(Depth, Pos) = 2^Depth + Pos - 1
#
# In the actual code
# depthStart = 2^Depth and relPos = Pos
assert: start == depthStart + relPos - 1
let lastLeaf = lastLeafOfSubTree(start, maxID)
var node: int32
while true:
node = depthStart + relPos - 1
if node <= maxID:
yield (node, (depthStart, relPos, node.isLeaf(maxID)))
if node == lastLeaf:
break
if node.isLeaf(maxId):
relPos += 1
let jump = countTrailingZeroBits(relPos)
depthStart = depthStart shr jump
relPos = relPos shr jump
else:
depthStart = depthStart shl 1
relPos = relPos shl 1
Otherwise with maxID = 3 we still get node 4 before the iterator goes to 2 and with maxID = 7 we get 8.
I should really see if we can have something breadth first.
Turns out it was much easier to do breadth first from any subtree without needing a queue:
iterator traverseBreadthFirst*(start, maxID: int32): int32 =
assert start in 0 .. maxID
var
levelStart = start # Index of the node starting the current depth
levelEnd = start # Index of the node ending the current depth
pos = 0'i32 # Relative position compared to the current depth
var node: int32
while true:
node = levelStart + pos
if node > maxID:
break
yield node
if node == levelEnd:
levelStart = 2*levelStart + 1
levelEnd = 2*levelEnd + 2
pos = 0
else:
pos += 1
Nice! I came up with this:
def left_child(node):
n = 2*node + 1
return n if 0 <= n < max_nodes else -1
def check_subtree(root):
return check_level(root, 0, [])
def check_level(node, lvl, lst):
if node == -1:
return lst
for n in range(node, min(node + 2**lvl, max_nodes)):
lst.append(n)
return check_level(left_child(node), lvl+1, lst)
I just love (tail) recursion. :wink:
It seems to me like the backoff strategies have a bug when peeking, counting or receiving recursively from their child steal request channels:
https://github.com/aprell/tasking-2.0/blob/303ce3e09a7c118241b9d5482219e3e3b848212e/src/runtime.c#L541-L630
If we focus on peeking
with 14 workers:
We assume the branch from worker 4-9-10 is idle and every other branch is working, especially branch 2. Worker 1 wants to check its and its children steal requests:
tree.left_subtree_is_idle
, with tree being thread-local to 1, so what is actually checked is worker 3 state which is active and worker 1 fails to retrieves worker 9 pending requests.