calyxir / calyx

Intermediate Language (IL) for Hardware Accelerator Generators
https://calyxir.org
MIT License
450 stars 44 forks source link

Queues: Rethinking our PIFO #1999

Open anshumanmohan opened 1 month ago

anshumanmohan commented 1 month ago

Our PIFO is implemented rather peculiarly, and here I propose we rethink it.

Present

A PIFO orchestrates two FIFOs. When an item is pushed into the PIFO, it runs a simple computation on the basis of which it decides which of its two FIFOs to push the item into. When pop is called, the PIFO runs a simple computation to decide which of its two FIFOs to pop from. By customizing these two computations, we can eke out some simple policies from our PIFO.

We could extend this model a little (see https://github.com/calyxir/calyx/issues/1810 for a taste), but we'd still be pretty limited. The PIFO was designed this way to gel nicely with the "bank of FIFOs" model of queueing that Intel Tofino provisions, but let us see what we can do with a fresh mind!

Future

A truly general PIFO would simply have two methods:

The proposal, which @rachitnigam and I sketched yesterday, is based around shift registers. For a PIFO of capacity c, maintain c registers. Use some central brain to maintain a list of these registers, ordered by rank.

In the figure below, the PIFO has capacity four, and already holds three values with some ranks.

0200A2F4-0570-475C-81B6-8E00494314D2

When a pop is requested, remove and emit the best-ranked item from the brain's ordered list, and use shift registers to move the remaining values one register to the left. This is an $O(1)$ operation.

F03D70F0-2572-440D-8349-529299142F3A

When a push is requested, walk through the the ordered list maintained in the brain to find the best-ranked value whose rank is worse than the incoming value's rank. For example, if a new value v4 with rank 5 is incoming, the best-ranked loser is v2 with rank 7. Put the incoming value in the place of the best-ranked loser, and use shift registers to move all losing values one register to the right. This is an $O(n)$ operation.

AF5A5C45-3A28-436A-AF65-98BAF8B2B60F

We could perhaps pipeline pushes for better performance. Either way, we have to deal with the case when pop is called immediately after a very favorably-ranked item is pushed. Naively popping the PIFO will not work, so we'll have to maintain a short queue of items that are still being pushed. The pop operation will now need to check the PIFO proper (at $O(1)$ cost) as well as this short queue (at $O(m)$ cost), but $m$ will be some small constant so we should be okay.

anshumanmohan commented 1 month ago

@sampsyo and I discussed an alternate strategy that the brain could employ to keep track of which item is to be popped.

The brain maintains an unordered set of (register, rank, valid) tuples. Blank registers are invalid.

Both operations are $O(log(n))$, since the reduce operations can be done tournament-style.