AtticusKuhn / parallel-algorithms

Proving the correctness and performance of certain parallel algorithms
4 stars 0 forks

Latency tracking specification #1

Open conal opened 1 year ago

conal commented 1 year ago

What is a precise and compelling specification for parallel computations that track latency/depth? We need such a specification in order to understand what it would mean for a latency tracking implementation to be correct.

AtticusKuhn commented 1 year ago

That is a very good point.

These are just my initial ideas on the matter; I haven't researched the literature on this yet (perhaps I should).

Define an information dependency graph of an algorithm as a directed graph where each node is an event in the algorithm and draw an edge from A to B if B requires information from A to compute.

I have drawn an example of this as a mermaid diagram

There are two mathematical ways I can think of for modelling this graph: as a bounded lattice or as a category.

If we model this as a bounded lattice, the meet of A and B is the latest event which gives information to both A and B, and the join of A and B is the earliest event which depends on both A and B. The start of the algorithm is the bottom (no information) and the end of the algorithm is the top (all the information). The algorithm, in this case, is a monotonic increase in information.
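To make the lattice reading concrete, here is a small Python sketch. The graph and all names are my own illustration, assuming a diamond-shaped example (A feeds B and C, which both feed D) consistent with the partitions below:

```python
# Hypothetical diamond-shaped example graph (my reconstruction of the
# missing diagram): A feeds B and C, which both feed D.
prereqs = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

def below(n):
    """Events whose information (transitively) reaches n, including n."""
    out = {n}
    for p in prereqs[n]:
        out |= below(p)
    return out

def above(n):
    """Events that (transitively) depend on n, including n."""
    return {m for m in prereqs if n in below(m)}

def depth(n):
    """Length of the longest dependency chain ending at n."""
    return 1 + max(map(depth, prereqs[n])) if prereqs[n] else 0

def meet(a, b):
    """Latest event that gives information to both a and b."""
    return max(below(a) & below(b), key=depth)

def join(a, b):
    """Earliest event that depends on both a and b."""
    return min(above(a) & above(b), key=depth)
```

Here `meet("B", "C")` is `"A"` and `join("B", "C")` is `"D"`, with the start and end of the algorithm as bottom and top.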

If we model this as a category, the objects are events and the arrows are information dependencies between the events. The start of the algorithm is the initial object, and the end of the algorithm is the terminal object.

Now we define a partition on the graph. A partition is a set of sets of events, where each event belongs to exactly one subset, subject to the constraint that if there is a path from A to B, then A and B cannot be in the same subset. My intuition for this is that each subset collects events that can be computed at the same time: A and B may share a subset only if A does not depend on B and B does not depend on A. (Strictly speaking, this relation is not transitive in general, so it is not quite an equivalence relation, but it is the right intuition.)

In my example, the sequential partition would be {{A}, {B}, {C}, {D}} and the parallel partition would be {{A}, {B,C}, {D}}.

In this model, every execution strategy is a function from an information dependency graph to a valid partition of its events.

We define the cost of an execution strategy as the cardinality of the partition. In my example, the cost of the sequential execution strategy would be 4 and the cost of the parallel execution strategy would be 3.
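As a sanity check of the cost-as-cardinality idea, here is a Python sketch (a hypothetical encoding, assuming the same diamond example) that builds the level-by-level parallel partition and compares its cardinality with the sequential one:

```python
from collections import defaultdict

# Hypothetical encoding of the example: A feeds B and C, which both feed D.
deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

def level(n):
    """Length of the longest dependency chain leading to n."""
    return 1 + max(map(level, deps[n])) if deps[n] else 0

# Any edge strictly increases level, so grouping events by level never puts
# a dependent pair in the same subset: a valid parallel partition.
levels = defaultdict(set)
for n in deps:
    levels[level(n)].add(n)
parallel_partition = [levels[l] for l in sorted(levels)]

# The sequential strategy runs one event per step.
sequential_partition = [{n} for n in deps]
```

This recovers the example: the parallel partition is `[{"A"}, {"B", "C"}, {"D"}]` with cost 3, against the sequential cost of 4.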

This is just my initial thoughts on how to define latency.

AtticusKuhn commented 1 year ago

Perhaps I should note that such a parallel partition is not unique.

Consider this example

We could have the parallel partition {{A}, {B,C}, {D,E}, {F}}, or we could have the partition {{A}, {C}, {B,D,E}, {F}}. Both partitions have cardinality 4.

conal commented 1 year ago

> This is just my initial ideas on the matter,

I like these ideas, and I think you’ll learn a lot from formalizing your specifications (and thus correctness theorems) and proving implementation correctness. I’ll encourage you to look for opportunities to reduce complexity and increase naturalness of specifications, both being crucial to scalably correct engineering.

AtticusKuhn commented 1 year ago

I agree with you that my previous definition of runtime was insufficient, so I have been thinking more about a denotationally correct and elegant way of defining runtime.

I believe this definition is much more elegant. I am inspired by math. To me, if it is not inspired by math, then it is not elegant and therefore wrong. I was inspired by two concepts:

  1. metric spaces
  2. the derivative.

Here is my new definition. I will be writing in pseudo-Agda to get the ideas across. Define a cost function as any function with type signature

cost : (A ⇨ B) → A → ℝ

that satisfies the following axioms

-- identity
cost id x = 0
-- triangle inequality
cost (f ∘ g)  x ≤ cost f (g x) + cost g x
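For intuition, here is one concrete cost function sketched in Python (all names are mine, not from the definition above): instrumenting each function to return its own cost makes the identity axiom hold exactly, and makes the triangle inequality hold with equality under composition:

```python
# One concrete cost function: instrument each function to return, alongside
# its result, the cost it incurred.  (Illustrative encoding.)

def identity():
    """cost id x = 0: the identity does no work."""
    return lambda x: (x, 0.0)

def compose(f, g):
    """(f ∘ g): run g, then f on g's result, adding the two costs, so the
    triangle inequality holds with equality."""
    def h(x):
        y, cg = g(x)
        z, cf = f(y)
        return z, cf + cg
    return h

def cost(f, x):
    """Project out the cost of running f on x."""
    return f(x)[1]

# a primitive of unit cost, for illustration
step = lambda x: (x + 1, 1.0)
```

An implementation that sometimes shares work would have a strictly smaller composite cost, which is why the axiom is an inequality rather than an equation.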

A cost function is general enough to track memory cost or time cost, although I will mostly be focusing on time cost. Define the parallel cost as any cost function satisfying some additional axioms:

parallelCost : (A ⇨ B) → A → ℝ
parallelCost id x = 0
parallelCost (f ∘ g) x = parallelCost f (g x) + parallelCost g x
parallelCost exl x = 0
parallelCost exr x = 0
parallelCost (f ▵ g) x = parallelCost f x ⊔ parallelCost g x

And we also have the laws that


id ∘ f = f
f ∘ id = f
exl ▵ exr = id

Now if we define reduce as

reduce : (n : ℕ) → (tree n A ⇨ A)
reduce zero = id
reduce (suc n) = add ∘ twice (reduce n)

We can calculate the cost of any call, for example

parallelCost (reduce 2) n
 = parallelCost (add ∘ ((add ∘ exl) ▵ (add ∘ exr))) n
 = parallelCost add (((add ∘ exl) ▵ (add ∘ exr)) n) + (parallelCost (add ∘ exl) n ⊔ parallelCost (add ∘ exr) n)
 = parallelCost add (((add ∘ exl) ▵ (add ∘ exr)) n) + (parallelCost add (exl n) ⊔ parallelCost add (exr n))

Now we may assume that the cost of the add function is constant, i.e. it takes the same amount of time to add any two numbers regardless of their size, which would add the axiom

parallelCost add x = m

This allows us to get a closed form for the runtime.


parallelCost (reduce (suc n)) x
 = parallelCost (add ∘ twice (reduce n)) x
 = parallelCost add (twice (reduce n) x) + parallelCost (twice (reduce n)) x
 = m + (parallelCost (reduce n) (exl x) ⊔ parallelCost (reduce n) (exr x))
 = m + parallelCost (reduce n) x  -- both halves cost the same when add is constant-cost

Solving this recurrence relation gives us

parallelCost (reduce n) x = m * n
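Assuming a constant-cost add (`M` below) and encoding `twice f` as `(f ∘ exl) ▵ (f ∘ exr)`, this closed form can be checked numerically with a small Python interpreter of the parallelCost axioms (the encoding is illustrative):

```python
# Interpreter for the parallelCost axioms over a tiny expression type,
# assuming a constant-cost add (M) and twice f = (f ∘ exl) ▵ (f ∘ exr).
ID, EXL, EXR = ("id",), ("exl",), ("exr",)

def comp(f, g):    return ("comp", f, g)
def fork(f, g):    return ("fork", f, g)
def prim(name, c): return ("prim", name, c)

def parallel_cost(e):
    tag = e[0]
    if tag in ("id", "exl", "exr"):
        return 0.0
    if tag == "comp":
        return parallel_cost(e[1]) + parallel_cost(e[2])
    if tag == "fork":
        return max(parallel_cost(e[1]), parallel_cost(e[2]))
    return e[2]

M = 1.0  # assumed constant cost of add

def twice(f):
    return fork(comp(f, EXL), comp(f, EXR))

def reduce(n):
    return ID if n == 0 else comp(prim("add", M), twice(reduce(n - 1)))

# parallel_cost(reduce(n)) == n * M for every n
```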

We can also do this same process with sequentialCost, in which case we need an alternate axiom.

sequentialCost (f ▵ g) x = sequentialCost f x  +  sequentialCost g x

This gives the equation

sequentialCost (reduce (suc n)) x
 = sequentialCost (add ∘ twice (reduce n)) x
 = sequentialCost add (twice (reduce n) x) + sequentialCost (reduce n) (exl x) + sequentialCost (reduce n) (exr x)
 = m + 2 * sequentialCost (reduce n) x  -- again using constant-cost add

Solving this recursive equation gives us

sequentialCost (reduce n) x = (2 ^ n - 1) * m
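The sequential closed form can be checked the same way, swapping in the alternate ▵ axiom (same illustrative encoding and constant-cost add assumption as before):

```python
# Interpreter for the sequentialCost axioms: identical to parallelCost
# except that a fork runs its branches one after the other.
ID, EXL, EXR = ("id",), ("exl",), ("exr",)

def comp(f, g):    return ("comp", f, g)
def fork(f, g):    return ("fork", f, g)
def prim(name, c): return ("prim", name, c)

def sequential_cost(e):
    tag = e[0]
    if tag in ("id", "exl", "exr"):
        return 0.0
    if tag == "comp":
        return sequential_cost(e[1]) + sequential_cost(e[2])
    if tag == "fork":                  # both branches run, so the costs add
        return sequential_cost(e[1]) + sequential_cost(e[2])
    return e[2]

M = 1.0  # assumed constant cost of add

def twice(f):
    return fork(comp(f, EXL), comp(f, EXR))

def reduce(n):
    return ID if n == 0 else comp(prim("add", M), twice(reduce(n - 1)))

# sequential_cost(reduce(n)) == (2 ** n - 1) * M
```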

Anyway, I hope these ideas are more elegant, simple, and correct in your eyes.