Closed pepijndevos closed 5 years ago
So with some more help from David I got the enable part working. It wasn't working because you've added dff2dffe -unmap-mince 8
which converts the DFFE back to DFF+MUX. I'm not sure what the reason for this was, as removing that line made almost every single benchmark use less cells and chips.
So if there is one thing I'd consider merging from this PR it's removing the unmap. Unless you have a good reason for it of course, which you very well might.
Meanwhile this change actually causes the shift register to be used in VexRiscv, but makes chip count slightly worse for some reason. Looking into this I noticed there is a lowercase $dff
in the stat output, so something is borked somewhere.
On Wed, 10 Jul 2019, 20:38 Pepijn de Vos, notifications@github.com wrote:
So with some more help from David I got the enable part working. It wasn't working because you've added dff2dffe -unmap-mince 8 which converts the DFFE back to DFF+MUX.
A '377 has a single clock enable for 8 DFFEs, so dff2dffe -unmap-mince 8
unmaps DFFEs that don't use a full chip. You could possibly lower the
utilisation threshold to around 6, but it would result in increasing
inefficiency.
I'm not sure what the reason for this was, as removing that line made almost every single benchmark use less cells and chips.
The statistics lie to you here: in the name of simplicity the chip counter doesn't understand the netlist and is incorrectly merging DFFEs with different clock enable inputs which would actually require separate chips. The cell count would be lower with your approach, but the chip count would likely be significantly higher because the DFFEs are being underutilised (compare one 16-bit DFF and (up to) 16 muxes to 16 DFFEs, and note that the DFFEs are barely a win even in the worst-case scenario).
So if there is one thing I'd consider merging from this PR it's removing the unmap. Unless you have a good reason for it of course, which you very well might.
Meanwhile this change actually causes the shift register to be used in VexRiscv, but makes chip count slightly worse for some reason. Looking into this I noticed there is a lowercase $dff in the stat output, so something is borked somewhere.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/ZirconiumX/74xx-liberty/pull/12?email_source=notifications&email_token=AALPDW7A6QMWUGKJM7QH2MLP6Y3E7A5CNFSM4H7RBNHKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODZUQMTA#issuecomment-510199372, or mute the thread https://github.com/notifications/unsubscribe-auth/AALPDWZ6YHBHXCBXS6OZZRLP6Y3E7ANCNFSM4H7RBNHA .
Fair enough. The KiCad code does split them up, so eventually that would give better stats. That would be a good moment to see if e.g. mince=6 would give better results.
Muxes are only 4 per chip though, so for 16 DFFE it'd be two chips, or one DFF chip and 4 mux chips. But I agree with the general point.
I don't think this will go anywhere, so closing.
I was thinking that it is very common in HDL (especially in bit-serial design ;-) to use shift registers. So it seemed like a good idea to add them so I did.
The problem is that 99% of the use-cases I can think of for shift registers require parallel reads and/or writes and most likely init/reset. None of that is supported now. David said if we patch Yosys with a new
-tech
option it might be possible to add some things like that.So the chip supports parallel output, global clear, and enable. The techpass could in theory support enable with
-enpol
, but I could not make it work.It is quite telling that in all of the benchmarks, this pass does not get used even once. And even in the synthetic example below, it only transforms 8 flip-flops to one shift register, which is does not help much anyway.
Conclusion: Pretty useless, probably not worth it without more work. I'll leave it here, but feel free to close the PR.