There's a disproportionately low number of people who know how to write a LUT mapper to people who depend on them for a living.
So, in an effort to help with that, here's a description of what is - to my knowledge - the state of the art algorithm for it: Combinational and Sequential Mapping with Priority Cuts by Mishchenko et al.
I'm going to be writing it in Rust, using the itertools crate, because I enjoy Rust, but it should be implementable in any language.
The basics
The standard way to store logic networks is with an AND-Inverter Graph (AIG): AND gates with configurable inverted inputs.
This graph has "primary inputs" (the inputs to the logic network) and "primary outputs" (the outputs of the logic network).
use itertools::Itertools;
/// An AND-Inverter Graph node.
#[derive(Copy, Clone)]
struct AigNode {
/// The index of the left-hand-side child.
/// The least significant bit indicates if the value is inverted.
lhs: u32,
/// The index of the right-hand-side child.
/// The least significant bit indicates if the value is inverted.
rhs: u32,
}
impl AigNode {
/// Create a new `AigNode`.
pub fn new(lhs_index: u32, lhs_is_inverted: bool, rhs_index: u32, rhs_is_inverted: bool) -> Option<Self> {
// Since the indexes use one bit to store inversion (since having over 2^31 nodes is already impractically big),
// fail if an index is above that 2^31-index threshold.
if lhs_index >= 1 << 30 || rhs_index >= 1 << 30 {
return None;
}
let lhs = lhs_index << 1 | lhs_is_inverted as u32;
let rhs = rhs_index << 1 | rhs_is_inverted as u32;
Some(Self { lhs, rhs })
}
/// Return the index of the left-hand-side node.
pub fn lhs_index(self) -> u32 {
self.lhs >> 1
}
/// Return whether the left-hand-side node is inverted.
pub fn lhs_is_inverted(self) -> bool {
self.lhs & 1 == 1
}
/// Return the index of the right-hand-side node.
pub fn rhs_index(self) -> u32 {
self.rhs >> 1
}
/// Return whether the right-hand-side node is inverted.
pub fn rhs_is_inverted(self) -> bool {
self.rhs & 1 == 1
}
}
/// An AND-Inverter Graph.
/// Populating this graph is out of scope for this article, but see the AIGER format if you want more.
struct Aig {
/// The primary inputs of the graph.
inputs: Vec<u32>,
/// The primary outputs of the graph.
outputs: Vec<u32>,
/// The graph nodes.
nodes: Vec<AigNode>,
}
Cut enumeration
If you take an AIG node and recursively follow its inputs all the way down to the primary inputs of the graph, you have that node's "cone".
You can partition the nodes of this cone to produce a smaller contiguous cone called a "cut". A cut has at least one input and exactly one output.
/// A partition of the graph, rooted at a node.
#[derive(Clone)]
struct Cut {
inputs: Vec<u32>,
output: u32,
}
impl Cut {
/// Create a new cut. For efficiency, inputs must be in sorted order.
pub fn new(inputs: Vec<u32>, output: u32) -> Self {
// Vec::is_sorted is currently unstable :(
assert!((0..inputs.len() - 1).all(|i| inputs[i] <= inputs[i + 1]))
Self { inputs, output }
}
}
A cut with no more than K inputs is a "K-feasible cut". To map to K-input LUTs, all cuts must be K-feasible. Since an AIG node has two inputs, K must be at least two.
impl Cut {
/// Returns the number of inputs to this cut.
pub fn input_count(&self) -> usize {
self.inputs.len()
}
}
You can make a "trivial cut" from a node by using the node's inputs as the cut inputs and the node's output as the cut output.
A cut on a node's left-hand-side can be merged with another cut on the node's right-hand-side to create a new cut rooted at that node.
impl Cut {
/// Merge two cuts together at `root`, returning the new cut.
pub fn merge(&self, rhs: &Self, root: u32) -> Self {
let inputs = self.inputs.iter()
// Having the inputs sorted lets us merge the two cuts together in linear time...
.merge(&rhs.inputs)
// ...and deduplicate it with no additional space or allocations.
.dedup()
.copied()
.collect::<Vec<u32>>();
let output = root;
Self { inputs, output }
}
}
We can then explore all cuts in a graph by starting with the trivial cuts of the primary inputs, and then walking the graph in a topological order (a node's inputs before the node), building cuts by merging together the cuts of the node's children, adding the trivial cut of the node, and filtering out all cuts which are not K-feasible.
impl Aig {
/// Return cuts in the graph that have at most `max_inputs` inputs.
pub fn enumerate_cuts(&self, max_inputs: usize) -> Vec<Vec<Cut>> {
// Mapping to 1-input LUTs doesn't make any sense.
assert!(max_inputs >= 2);
let mut cuts = vec![vec![]; self.nodes.len()];
// Every primary input has exactly one cut: the trivial cut of that input.
for input in &self.inputs {
cuts[*input as usize] = vec![Cut::new(vec![*input], *input)];
}
// Writing a topological ordering is also out of scope, but see the Wikipedia article on it.
let topological_ordering = self.topological_ordering();
for node in topological_ordering {
// Skip primary inputs.
if self.inputs.contains(&node) {
continue;
}
// Convenience variables for the children nodes.
let lhs = self.nodes[node as usize].lhs_index();
let rhs = self.nodes[node as usize].rhs_index();
// Compute the trivial cut of this node.
// Its inputs are the (sorted) inputs of this node.
let trivial_cut_inputs = vec![lhs, rhs];
trivial_cut_inputs.sort();
// Its output is the output of this node.
let trivial_cut_output = node;
let trivial_cut = Cut::new(trivial_cut_inputs, trivial_cut_output);
// Now for some iterator spam.
let node_cuts = cuts[lhs as usize].iter()
// Combine all child cuts together.
.cartesian_product(&cuts[rhs as usize])
// Merge the child cuts, using the current node as a root.
.map(|(lhs, rhs)| lhs.merge(rhs, node))
// Include the trivial cut of this node too.
.chain(std::iter::once(trivial_cut))
// Keep all `max_inputs`-feasible cuts.
.filter(|cut| cut.input_count() <= max_inputs)
// And then collect them into a vector.
.collect::<Vec<Cut>>();
// Set these cuts as the cuts at this node.
cuts[node as usize] = node_cuts;
}
// Return the set of cuts for the graph.
cuts
}
}
This performs "exhaustive cut enumeration"; that is, it keeps all cuts for a node.
To produce a final mapping, we need to select cuts to form our final LUT network.
Cut selection
Before we worked from the primary inputs and built cuts from the bottom up. To produce the final mapping, we start at the primary outputs and work down to the inputs.
impl Aig {
/// Select the final mapping of LUTs.
pub fn select_cuts(&self, max_inputs: usize) -> Vec<Cut> {
// Calculate the cuts.
let cuts = self.enumerate_cuts(max_inputs);
// The mapping frontier is all the nodes which need a cut selected that do not presently have one.
// We start with the primary outputs of the graph and let the algorithm do its job from there.
let mut frontier = self.outputs.clone();
// The selected mappings to return.
let mut mapping = Vec::new();
// The nodes which have a selected mapping, to avoid duplicate work for a node.
let mut mapped_nodes = Vec::new();
// Pick the next node from the frontier.
while let Some(node) = frontier.pop() {
// Pick the first cut as the one to use.
let cut = cuts[node as usize][0];
// Add the cut's inputs to the mapping frontier if they are not primary inputs and not already mapped.
for input in &cut.inputs {
if !self.inputs.contains(input) && !mapped_nodes.contains(input) {
frontier.push(*input);
}
}
// Mark this node as mapped.
mapped_nodes.push(node);
// Add the cut to the mapping.
mapping.push(cut);
}
// Return the final mapping.
mapping
}
}
These cuts can then be turned into LUTs (with some difficulty).
I'm going to divide this series into two parts; this mapper produces a valid mapping, but the result is pretty terrible, and the algorithm doesn't scale well beyond 4 inputs per LUT.
We can make some reasonably small tweaks to drastically improve the quality of the mapping, and that will be the next page.
LUT Mapping with Priority Cuts
There's a disproportionately low number of people who know how to write a LUT mapper to people who depend on them for a living. So, in an effort to help with that, here's a description of what is - to my knowledge - the state of the art algorithm for it: Combinational and Sequential Mapping with Priority Cuts by Mishchenko et al. I'm going to be writing it in Rust, using the
itertools
crate, because I enjoy Rust, but it should be implementable in any language.The basics
The standard way to store logic networks is with an AND-Inverter Graph (AIG): AND gates with configurable inverted inputs. This graph has "primary inputs" (the inputs to the logic network) and "primary outputs" (the outputs of the logic network).
Cut enumeration
If you take an AIG node and recursively follow its inputs all the way down to the primary inputs of the graph, you have that node's "cone".
You can partition the nodes of this cone to produce a smaller contiguous cone called a "cut". A cut has at least one input and exactly one output.
A cut with no more than K inputs is a "K-feasible cut". To map to K-input LUTs, all cuts must be K-feasible. Since an AIG node has two inputs, K must be at least two.
You can make a "trivial cut" from a node by using the node's inputs as the cut inputs and the node's output as the cut output.
A cut on a node's left-hand-side can be merged with another cut on the node's right-hand-side to create a new cut rooted at that node.
We can then explore all cuts in a graph by starting with the trivial cuts of the primary inputs, and then walking the graph in a topological order (a node's inputs before the node), building cuts by merging together the cuts of the node's children, adding the trivial cut of the node, and filtering out all cuts which are not K-feasible.
This performs "exhaustive cut enumeration"; that is, it keeps all cuts for a node. To produce a final mapping, we need to select cuts to form our final LUT network.
Cut selection
Before we worked from the primary inputs and built cuts from the bottom up. To produce the final mapping, we start at the primary outputs and work down to the inputs.
These cuts can then be turned into LUTs (with some difficulty).
I'm going to divide this series into two parts; this mapper produces a valid mapping, but the result is pretty terrible, and the algorithm doesn't scale well beyond 4 inputs per LUT.
We can make some reasonably small tweaks to drastically improve the quality of the mapping, and that will be the next page.