clash-lang / clash-compiler

Haskell to VHDL/Verilog/SystemVerilog compiler
https://clash-lang.org/
Other
1.4k stars 148 forks source link

Generated VHDL/Verilog crashes Xilinx ISE 14.7 #356

Open gergoerdi opened 5 years ago

gergoerdi commented 5 years ago

I am getting the following error from the Xilinx ISE consuming CLaSH-generated VHDL:

Line 6134: Signal inst_chip8/clash_explicit_mealy_mealy_cpuout/_ is defined more than once.

I don't have any idea on how to even start cutting this down to a minimal repro, but my full repro is at https://github.com/gergoerdi/chip8-clash/tree/6702fa2c6979647e295712691cf45d0e686b2209.

gergoerdi commented 5 years ago

I've put all the VHDL (hand-written and CLaSH-generated) to https://github.com/gergoerdi/clash-issue-356/tree/d3ea7a5ef9d96ab0314d942603c6d1bae47de567

Funnily, none of the VHDL files even has 6134 lines..

gergoerdi commented 5 years ago

I don't have a way of minimizing the code yet, but I did notice that commenting out some opcode implementations in CHIP8/CPU.hs makes synthesis go through. In particular, commenting out the following opcode implementations is enough:

christiaanb commented 5 years ago

I’ll start investigating everything Monday. Sorry that Clash seems to be so buggy for you! It’s been tested on quite large code bases, but apparently not large enough.

gergoerdi commented 5 years ago

It’s been tested on quite large code bases, but apparently not large enough.

I guess the big difference is that those were large code bases written by people who knew what they were doing :)

gergoerdi commented 5 years ago

According to the people of StackOverflow, the error could be coming from around clash_explicit_mealy_mealy.vhdl:3116, simply overloading the synthesizer and causing it to run out of stack or heap.

If that's correct, then the relevant CLaSH code would be something around the handling of the CHIP-8 register file as a Vec 16 Word8 and indexing into it with an input-dependent index.

gergoerdi commented 5 years ago

A very helpful StackOverflow user has managed to simulate the CLaSH-generated VHDL, so maybe this is more a performance problem (as in, CLaSH generates unnecessarily complex and inefficient HDL) than a bug, which turns into a bug when combined with the ISE scalability problem that causes it to get confused on the over-complicated HDL.

There are already several comments gone, so I'm copying all his comments from the SO question to here:

some deleted comments from @user1155120 about how inefficient the generated VHDL is, and how all those unnecessary conversions to/from Int64 are putting extra pressure on the synthesizer

@user1155120 about MCV: well my basic problem is that I can't start to make it minimal until I know where to even start, since the error message is not pointing me to any meaningful location. The extended identifiers haven't been a problem with previous output from the same VHDL generator (fed into the same XST version), so I'll look into the block statements, thanks! – Cactus 23 hours ago

@user1155120 yeah it seems CLaSH converts everything to 64-bit Ints for some reason, but as long as everything gets converted back to its original small width at its consumption, it shouldn't be a problem should it? – Cactus 23 hours ago

@user1155120: I can't try it out right now, but will give it a shot if it can help: are you saying the unguarded block statements don't make any semantic difference and just flattening their contents into the main begin..end scope would/could help? – Cactus 20 hours ago

I analyzed, elaborated and simulated CHIP8 using a rudimentary testbench providing 25 MHz clock and reset event, static '0's or '1's for other inputs. It produced a large number of metavalue warnings from package numeric_std indicating a lack of initial values and updates (to_integer warnings per clock) for signed and/or unsigned objects. Running the simulation for 200 us shows hsync. Your model likely doesn't have a VHDL problem (and is related to a synthesis tool as well as lacking a Minimal, Complete, and Verifiable example). The error message provides no insight and may indicate running out of memory. – user1155120 3 hours ago

The simulation settled down to one metavalue report every 500 us or so after it started displaying visible pixels. The waveform dump (of every signal) is only about 150 MBytes. I'll get an execution time on my Macbook after it quits at 20 ms. Seems like it's still going through initialization. – user1155120 19 mins ago

Thank you so much for this analysis. I think the metavalue warnings are coming from not including a reset spike; there is a top-level VHDL wrapper file in the repo that adds the reset. What simulator are you using? – Cactus 14 mins ago edit

So would it be a fair summary to say (and I would definitely accept this as an answer!) that the error comes from the synergy of two problems: 1, CLaSH generates horribly inefficient VHDL, and 2, the Xilinx ISE falls over on too complex VHDL? (since I don't believe it is actually running out of memory on my 24G machine) – Cactus 12 mins ago edit

Zipped through the last ms, must have gone idle. Over 2 1/2 hours real time (late 2008 2.4 GHz 8GB DDR3 Macbook doing other things). The testbench does a reset. Makefile. I used an assert to kill the clock at 20 ms. ghdl 0.36-dev v0.35-271-g5cd3de4 - the commit, 24 Aug, llvm backend) and gtkwave. Not a complex hardware design, hard to synthesize, slow to simulate, unnecessary intermediary signals, block statements. – user1155120 3 mins ago

christiaanb commented 5 years ago

I'll run it through quartus, and see if I can get a more sensible error message than complaining about a line that doesn't exist.

expipiplus1 commented 5 years ago

Gergő and I ran it through Yosys without problems yesterday, albeit with Verilog output; I think ISE failed with an error on Verilog too though.

On Wed, 19 Sep 2018, 04:45 Christiaan Baaij, notifications@github.com wrote:

I'll run it through quartus, and see if I can get a more sensible error message than complaining about a line that doesn't exist.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/clash-lang/clash-compiler/issues/356#issuecomment-422547630, or mute the thread https://github.com/notifications/unsubscribe-auth/AA0U3BDETjl5Xo3Lxmp9py12xoE0v3pNks5ucVtjgaJpZM4WqMzf .

gergoerdi commented 5 years ago

Yes, I've tried running it in ISE via Verilog instead of VHDL; that gave a similar error message ("signal is defined more than once) but with a completely different location/instantiation path.

christiaanb commented 5 years ago

Running the clash-generated files from https://github.com/gergoerdi/clash-issue-356/tree/d3ea7a5ef9d96ab0314d942603c6d1bae47de567 through both Quartus 18.0 and Vivado 2018.1 doesn't give any of the issues you described.

Might be that ISE isn't handling the Extended / Escaped identifiers produced by Clash very well; which is sad because VHDL has had extended identifiers since at least VHDL-1993.

christiaanb commented 5 years ago

Flagging it up as a request for an "enhancement" to produce code accepted by ISE.

gergoerdi commented 5 years ago

I don't think it's the extended identifiers; if it was, then nothing produced by CLaSH would have worked with the ISE for me. For example, an earlier version of my code without the full-featured CPU synthesized with ISE without issues.

gergoerdi commented 5 years ago

I've tried some things yesterday.

First off, I've hacked the emitted VHDL not to contain any blocks, by lifting all block-local signal declarations into the component declaration. This merely changed slightly the bogus location where ISE reports "duplicate signal definition".

Then I rewrote the CPU in the most direct style possible, as a CPUState -> CPUIn -> (CPUState, CPUOut) function (no RWS, no accumulation of Endomorphisms, ...), since I was worried that maybe CLaSH has a deficiency in turning these abstractions into efficient VHDLS, which would be sad and surprising, considering how good a job GHC usually does at that. This simplified version still either crashes XST (with the bogus "duplicate signal definition" error showing up at various places) or causes it to loop: I've let it run for 10 hours, it is using one core 100% and not consuming any memory (~300M) with no result. This looping also happens in the 'low-level synthesis' step, same step that would crash otherwise.

I'm running out of ideas here. One remaining thing I can think of is to investigate the component usage of the generated VHDL -- I noticed XST reports 600+ multiplexers which seems a lot for such a simple CPU.

postoroniy commented 5 years ago

I would use VHDL on ISE 14.7 since it's known to be crap with verilog or having quite limited support of it. And if you can try quartus or vivado to confirm that clash has issue. But I don't think issue in clash but ISE

bgamari commented 5 years ago

For what it's worth, I use VHDL with Vivado as well. I have encountered blatantly incorrect synthesis using Vivado given Verilog.