Queues: Rethinking our PIFO

Our PIFO is implemented rather peculiarly, and here I propose we rethink it.

Present

A PIFO orchestrates two FIFOs. When an item is pushed into the PIFO, it runs a simple computation on the basis of which it decides which of its two FIFOs to push the item into. When pop is called, the PIFO runs a simple computation to decide which of its two FIFOs to pop from. By customizing these two computations, we can eke out some simple policies from our PIFO.

We could extend this model a little (see https://github.com/calyxir/calyx/issues/1810 for a taste), but we'd still be pretty limited. The PIFO was designed this way to gel nicely with the "bank of FIFOs" model of queueing that Intel Tofino provisions, but let us see what we can do with a fresh mind!

Future

A truly general PIFO would simply have two methods:

push(v, r): enqueue value v with rank r.
pop: dequeue and return the item with best rank. The PIFO would not have any policy; we would have some external logic (a ranking function) that would assign ranks to values in a way that ekes out a policy from the PIFO. Let us put this ranking function aside for now and just design a simple, general PIFO.

The proposal, which @rachitnigam and I sketched yesterday, is based around shift registers. For a PIFO of capacity c, maintain c registers. Use some central brain to maintain a list of these registers, ordered by rank.

In the figure below, the PIFO has capacity four, and already holds three values with some ranks.

0200A2F4-0570-475C-81B6-8E00494314D2

When a pop is requested, remove and emit the best-ranked item from the brain's ordered list, and use shift registers to move the remaining values one register to the left. This is an $O(1)$ operation.

F03D70F0-2572-440D-8349-529299142F3A

When a push is requested, walk through the the ordered list maintained in the brain to find the best-ranked value whose rank is worse than the incoming value's rank. For example, if a new value v4 with rank 5 is incoming, the best-ranked loser is v2 with rank 7. Put the incoming value in the place of the best-ranked loser, and use shift registers to move all losing values one register to the right. This is an $O(n)$ operation.

AF5A5C45-3A28-436A-AF65-98BAF8B2B60F

We could perhaps pipeline pushes for better performance. Either way, we have to deal with the case when pop is called immediately after a very favorably-ranked item is pushed. Naively popping the PIFO will not work, so we'll have to maintain a short queue of items that are still being pushed. The pop operation will now need to check the PIFO proper (at $O(1)$ cost) as well as this short queue (at $O(m)$ cost), but $m$ will be some small constant so we should be okay.

@sampsyo and I discussed an alternate strategy that the brain could employ to keep track of which item is to be popped.

The brain maintains an unordered set of (register, rank, valid) tuples. Blank registers are invalid.

At pifo.pop(), the brain runs a reduce operation over its tuples, with a reduction function like min_arg_2 with the stipulation that we'll only look at valid tuples. This gets the best-ranked populated register. We emit its value and then mark it as invalid.
At pifo.push(v, r), the brain runs a reduce operation with a reduction function like is_invalid. This gets us a vacant register. We put v into that register, clobber its rank with r, and mark it as valid.

Both operations are $O(log(n))$, since the reduce operations can be done tournament-style.

calyxir / calyx

Queues: Rethinking our PIFO #1999

Present

Future