Open eddiehung opened 6 years ago
The first step here would not be generic clustering, but to use the LutCascade functionality of the iCE40, where the LUT output (LO, not the LUTFF output O) has a dedicated path to input 2 of the LUT above (as far as we know this doesn't work between tiles).
The main things to think about when adding this in the packer are:
There is also RAM cascade, where adjacent block RAMs can share address signals in order to reduce routing congestion around large memories.
Previous experimentation with icecube showed that removing LUT cascade reduces the Fmax of picorv32 from 73 MHz to 68 MHz, ignoring any additional costs due to increased congestion.
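As a rough illustration of the LUT cascade idea above, here is a minimal sketch of how a packer might find cascade candidates: LUTs whose raw output (LO) drives input 2 of another LUT, grouped into chains no longer than one tile. The `Lut` class, `find_cascade_chains`, and the netlist representation are all hypothetical and are not nextpnr's data structures. The eight-cell limit assumes one iCE40 logic tile, since the cascade is not known to work between tiles.

```python
from dataclasses import dataclass

CASCADE_INPUT = 2   # LUT input fed by the dedicated cascade path (per the thread)
CELLS_PER_TILE = 8  # assumption: one iCE40 logic tile holds eight logic cells


@dataclass
class Lut:
    """Hypothetical LUT cell: not nextpnr's CellInfo."""
    name: str
    inputs: list  # net names for I0..I3, None where unconnected
    output: str   # net driven by the raw LUT output (LO)


def find_cascade_chains(luts):
    """Group LUTs linked through input 2 into chains of at most one tile."""
    driver_of = {lut.output: lut.name for lut in luts}
    next_of = {}      # next_of[a] == b: a's LO feeds b's cascade input
    has_pred = set()  # LUTs that already sit below another LUT in a chain
    for lut in luts:
        src = driver_of.get(lut.inputs[CASCADE_INPUT])
        # Each driver can cascade into only one LUT; keep the first user found.
        if src is None or src == lut.name or src in next_of:
            continue
        next_of[src] = lut.name
        has_pred.add(lut.name)

    chains = []
    for lut in luts:
        # Start only from chain heads: they drive a cascade but are not fed by one.
        if lut.name in has_pred or lut.name not in next_of:
            continue
        full = [lut.name]
        while full[-1] in next_of:
            full.append(next_of[full[-1]])
        # Split long chains at tile boundaries; drop leftover single cells.
        for i in range(0, len(full), CELLS_PER_TILE):
            piece = full[i:i + CELLS_PER_TILE]
            if len(piece) > 1:
                chains.append(piece)
    return chains
```

A real packer would also have to decide whether using the cascade path is worth giving up freedom in input ordering and placement, which this sketch does not model.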
nextpnr-ice40 operates at the logic cell (LUT+FF) level. Investigate the effect of clustering related logic together to take advantage of any local feedback paths (are there any?) and to reduce the placement search space.
Not sure about other architectures, but a general solution would be great.
Once related logic is determined, one possible way of encoding affinity could be placement constraints?
What's interesting is that there is enough flexibility to capture relationships that range from "put these two cells next to each other" to "put these five cells somewhere in the same tile" to "put these eight cells in exactly these eight locations in one tile".
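To make that range of relationships a bit more concrete, here is a minimal sketch of one way such affinities could be encoded and checked against a candidate placement. The `Affinity` class and `satisfied` function are hypothetical names and are not nextpnr's constraint API. Locations are assumed to be `(x, y, z)` tuples, where `z` is the cell slot within a tile, and "next to each other" is interpreted here as a bound on Manhattan distance between tiles.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Affinity:
    """Hypothetical placement affinity for a group of cells."""
    cells: list                          # names of the related cells
    max_distance: Optional[int] = None   # bound on tile-to-tile Manhattan distance
    same_tile: bool = False              # all cells must share one (x, y)
    slots: dict = field(default_factory=dict)  # cell name -> required z slot


def satisfied(aff, placement):
    """Return True if placement (cell -> (x, y, z)) meets the affinity."""
    locs = [placement[c] for c in aff.cells]
    if aff.same_tile and len({(x, y) for x, y, _ in locs}) > 1:
        return False
    if aff.max_distance is not None:
        for x0, y0, _ in locs:
            for x1, y1, _ in locs:
                if abs(x0 - x1) + abs(y0 - y1) > aff.max_distance:
                    return False
    # Exact-location groups pin each listed cell to a slot within the tile.
    return all(placement[c][2] == z for c, z in aff.slots.items())


placement = {"a": (1, 1, 0), "b": (1, 1, 1), "c": (2, 1, 0)}
near = Affinity(["a", "c"], max_distance=1)             # "next to each other"
tile = Affinity(["a", "b"], same_tile=True)             # "somewhere in the same tile"
exact = Affinity(["a", "b"], same_tile=True, slots={"a": 0, "b": 1})
print(satisfied(near, placement), satisfied(tile, placement), satisfied(exact, placement))
```

One record type with optional fields keeps the three cases on a single scale of strictness, so a placer could treat looser affinities as soft costs and the exact-slot form as a hard constraint.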
placer1 will also need to be enhanced to cope with swapping constrained cells with other constrained cells...