Closed rowanG077 closed 2 years ago
So, with the binary counter, you have a 4-bit input and 4-bit output. Since the ECP5 is natively LUT4, this means each output bit needs exactly one LUT4 to implement it, and then the result flops can be packed together with those LUTs, and the input flops can be put in the same PLB. Propagation delay is minimal since it never leaves the PLB.
Meanwhile, the one-hot representation requires at least four PLBs, because there are only 8 FFs in a PLB. This means some amount of global routing is needed, which is already difficult. But the real problem is that this isn't truly one-hot: it's a priority decoder. That means bit 1 must check the value of bit 0, bit 2 must check bits 0 and 1, bit 3 must check bits 0, 1 and 2, and so on. This goes all the way to bit 15, which must first confirm that the 15 previous bits are zero. That check is implemented through multiple layers of logic and routing, which is slow.
In other words, this result is to be expected.
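To make the fan-in argument concrete, here is a rough Python sketch that counts how many state bits each next-state bit depends on. It is not the code from the gist: `priority_next` is my guess at what a priority-decoded "one-hot" counter looks like, `ring_next` is a plain rotate included only for contrast, and all function names are made up for illustration. The LUT4 level count is only a lower bound on logic depth (one LUT4 sees at most 4 inputs, two levels at most 16), not a timing estimate.

```python
def binary_next(state, width):
    # Plain binary counter: increment and wrap.
    return (state + 1) & ((1 << width) - 1)

def priority_next(state, width):
    # Hypothetical priority-decoded "one-hot" step (an assumption, not the gist):
    # the lowest set bit wins and the state advances to the next position.
    for i in range(width):
        if (state >> i) & 1:
            return 1 << ((i + 1) % width)
    return 1  # no bit set: fall back to the first state

def ring_next(state, width):
    # True one-hot ring counter for contrast: a rotate-left by one.
    mask = (1 << width) - 1
    return ((state << 1) | (state >> (width - 1))) & mask

def fan_in(next_fn, width):
    # For each next-state bit, count the state bits that can change it.
    table = [next_fn(s, width) for s in range(1 << width)]
    counts = [0] * width
    for in_bit in range(width):
        changed = 0
        for s in range(1 << width):
            changed |= table[s] ^ table[s ^ (1 << in_bit)]
        for out_bit in range(width):
            counts[out_bit] += (changed >> out_bit) & 1
    return counts

def min_lut4_levels(n_inputs):
    # Lower bound: d levels of 4-input LUTs can see at most 4**d inputs.
    levels, reach = 1, 4
    while reach < n_inputs:
        levels += 1
        reach *= 4
    return levels

if __name__ == "__main__":
    for name, fn, width in [("binary", binary_next, 4),
                            ("priority", priority_next, 16),
                            ("ring", ring_next, 16)]:
        worst = max(fan_in(fn, width))
        print(name, "worst fan-in:", worst,
              "min LUT4 levels:", min_lut4_levels(worst))
```

Under these assumptions the 4-bit binary counter never exceeds a fan-in of 4, so every bit fits a single LUT4, while the priority version reaches a fan-in of 15 and needs at least two LUT4 levels before routing is even considered. Real designs also have enable and reset inputs, which this sketch ignores.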
I have been trying to optimize some parts of a design to reach a higher fmax. One of the things I tried was encoding states, inputs, etc. as one-hot so that the decoding logic would be shorter. But when I do this I see the opposite effect: a much lower fmax.
See this gist for a simple 16-element counter that is implemented using either a one-hot or a binary encoding. The full yosys+nextpnr output is also available in the gist.
Using the binary counter I reach an fmax of ~400-500 MHz, but using the one-hot encoding it only reaches ~130-180 MHz. I would have expected the inverse. What is the reason for this?
I route the inputs and outputs to random pins and use these commands to synthesize:
So I guess this is just completely the opposite of what I would have expected. Is this something nextpnr doesn't support well?