clash-lang / clash-compiler

Haskell to VHDL/Verilog/SystemVerilog compiler
https://clash-lang.org/

clash 1.6.4 compiled against ghc 9.0 synthesises worse than compiled against 8.8 #2323

Open pbreuer opened 2 years ago

pbreuer commented 2 years ago

Clash 1.6.4 (and 1.6.3) compiled against ghc 8.8.4 on debian unstable synthesizes verilog fine for my code, but 1.6.4 compiled against ghc 9.0.2 fails on the same synthesis. Same machine, same everything, different sandboxes.

clash  -XCPP -fconstraint-solver-iterations=0 -package silently -fclash-spec-limit=80 -fclash-inline-limit=80  \
-fclash-no-render-enums \
-fclash-clear           \
--verilog               \
Test/Trace.hs
GHC: Setting up GHC took: 8.517s

KPU/TMM.hs:125:33: error:
    * Reduction stack overflow; size = 1001
      When simplifying the following type: BitSize (f m0)
      Use -freduction-depth=0 to disable this check
      (any upper bound you could choose might fail unpredictably with
       minor updates to GHC, so disabling the check is recommended if
       you're sure that type checking should terminate)
    * In the first argument of `fmap', namely `f'
      In the first argument of `fmap', namely `(fmap f)'
      In the expression: fmap (fmap f) mr'
    |
125 |               mre' = fmap (fmap f) mr'
    |                                 ^

The clash 1.6.4 compiled against 8.8.4 instead does this:

clash  -XCPP -fconstraint-solver-iterations=0 -package silently -fclash-spec-limit=80 -fclash-inline-limit=80  \
-fclash-no-render-enums \
-fclash-clear           \
--verilog               \
Test/Trace.hs
GHC: Setting up GHC took: 3.072s
GHC: Compiling and loading modules took: 15m27s
Clash: Parsing and compiling primitives took 1.515s
GHC+Clash: Loading modules cumulatively took 1h9m11s
Clash: Compiling Test.Trace.kpu_test32
Clash: Normalization took 53m44s
[WARNING] Dubious primitive instantiation for
Clash.Signal.Internal.clockGen: Clash.Signal.Internal.clockGen is not
synthesizable! (disable with -fclash-no-prim-warn)
[WARNING] Dubious primitive instantiation for
GHC.Integer.Type.integerToInt: GHC.Integer.Type.integerToInt: Integers
are dynamically sized in simulation, but fixed-length after synthesis.
Use carefully. (disable with -fclash-no-prim-warn)
[WARNING] Dubious primitive instantiation for
GHC.Integer.Type.eqInteger#: GHC.Integer.Type.eqInteger#: Integers are
dynamically sized in simulation, but fixed-length after synthesis. Use
carefully. (disable with -fclash-no-prim-warn)
[WARNING] Dubious primitive instantiation for
GHC.Integer.Type.integerToWord: GHC.Integer.Type.integerToWord:
Integers are dynamically sized in simulation, but fixed-length after
synthesis. Use carefully. (disable with -fclash-no-prim-warn)
[WARNING] Dubious primitive instantiation for
GHC.Integer.Type.wordToInteger: GHC.Integer.Type.wordToInteger:
Integers are dynamically sized in simulation, but fixed-length after
synthesis. Use carefully. (disable with -fclash-no-prim-warn)
[WARNING] Dubious primitive instantiation for
GHC.Integer.Type.smallInteger: GHC.Integer.Type.smallInteger: Integers
are dynamically sized in simulation, but fixed-length after synthesis.
Use carefully. (disable with -fclash-no-prim-warn)
Clash: Netlist generation took 21.829s
Clash: Compiling Test.Trace.kpu_test32 took 54m41s
Clash: Total compilation took 2h3m53s

I would say "could be anything", so I hope you have more ideas! If it were me debugging that I'd work through every release of clash compiled against 8.8.4 vs compiled against 9.0.2 to find when this arose, and have no better idea than that.

BTW, what can I do to turn verilog 2001 into something for fpga nowadays? I am yet to get iverilog to produce the same output from clash's verilog as ghdl does from clash's vhdl, have never yet got verilator to terminate at anything, while iverilog seems to have given up on synthesis around v0.8, though the manual claims it does it. yosys seems possible, but it takes verilog 2005, not 2001, and when I try it on clash output, it cries about "-=" or something similar to that. Also there are no comprehensible instructions for yosys. There are rumours that there is now an "experimental" --synth switch on ghdl for converting vhdl (to what?), but no details that I can find. My experiments with yosys produce .json or .blif files via example codes and I don't know what I should be aiming at.

Is there a documented path from clash synthesis output to something an fpga will like? A generic output format would be the thing, which could then be further transformed as need be.

Regards

PTB

rowanG077 commented 2 years ago

> I would say "could be anything", so I hope you have more ideas! If it were me debugging that I'd work through every release of clash compiled against 8.8.4 vs compiled against 9.0.2 to find when this arose, and have no better idea than that.

As the error message says, GHC most likely behaves differently between versions, so it is not unexpected that code that worked with a specific setting stops working in a different version. Does it all finish if you set the reduction depth to 0 (-freduction-depth=0)?

> BTW, what can I do to turn verilog 2001 into something for fpga nowadays? I am yet to get iverilog to produce the same output from clash's verilog as ghdl does from clash's vhdl, have never yet got verilator to terminate at anything, while iverilog seems to have given up on synthesis around v0.8, though the manual claims it does it. yosys seems possible, but it takes verilog 2005, not 2001, and when I try it on clash output, it cries about "-=" or something similar to that. Also there are no comprehensible instructions for yosys. There are rumours that there is now an "experimental" --synth switch on ghdl for converting vhdl (to what?), but no details that I can find. My experiments with yosys produce .json or .blif files via example codes and I don't know what I should be aiming at.

> Is there a documented path from clash synthesis output to something an fpga will like? A generic output format would be the thing, which could then be further transformed as need be.

That depends on which FPGA you want to target. Yosys can produce a netlist in a few formats, which can then be further processed using a variety of tools. However, if you have a specific FPGA in mind I would recommend using the vendor's tooling; Quartus for Intel and Vivado for Xilinx are examples. The Verilog produced by Clash should work with Yosys (it's what I use ATM). But essentially, what you do with this HDL is out of scope for Clash, so you should look at the documentation of the vendor you want to target.

However here is a full end-to-end blog post which illustrates the process for Intel Quartus.

pbreuer commented 2 years ago

Thanks for the very informative reply.

Applying yosys to verilog from Clash always seems to produce something like: Lexer warning: The SystemVerilog keyword `-=' (at IMM.v:681) is not recognized unless read_verilog is called with -sv! IMM.v:681: ERROR: syntax error, unexpected TOK_ID.

Should it be applied to systemverilog output instead? (using -sv gets it further but no joy: IMM.v:4984: ERROR: syntax error, unexpected TOK_BEGIN). The yosys manual seems pretty non-confident of parsing much systemverilog: "When read_verilog is called with -sv, it accepts some language features from SystemVerilog: ..."

(The -= seems to be my fault for introducing it in a primitive, but replacing it doesn't help, the parser error moves on to complain about the #3000000 in the start-up delay of 300 cycles before the clock starts with tb_clock_gen).

(System verilog instead of verilog from Clash generates this error when yosys is applied (with -sv): IMM.sv:40: ERROR: syntax error, unexpected TOK_PACKAGESEP, expecting TOK_ID or '#'. That's a declaration of a variable using a type defined in another file: IMM_types::Tup4 result_0;).

As to the specific vendor software needed/wanted, I want to prove the code will synthesize to a finite state machine, not have it actually run as a particular one on a particular support platform. The finiteness is what counts. Somebody else can do the practical part! I have to be sure they're not going to be on a wild goose chase (i.e., inevitably unsuccessful task).

The problem with ghc 9.0.2 is failure to solve a type equation involving Nat, and if it is not stopping before 1001 rounds I'm sure it won't ever stop. ghc 8.8.4 solves it immediately. The type calculation is in the decidable portion of Peano arithmetic, so I know it should finish (and does, with ghc 8.8.4), so it's a bug that it doesn't finish. The decision procedure is going wrong there. It may help if I give it more equations as helpful extra information (e.g. Min (Div (n + 3) 4 + 2) 80 <= 80) or it may not, but it is a bug, just not yours! It must be that instead of applying a substitution that simplifies the complexity, it is choosing one that does not, and then round and round ... The message about minor changes to ghc changing how long it takes is right if it terminates. Yes, that can happen. Going from terminates to does not terminate is wrong and not covered by the message.

That blog post looks great! But there's a gui involved so not something I would use or understand :-(. Don't I want to get a 'netlist' and that's that?

leonschoorl commented 2 years ago

The clock and reset generators (clockGen, tbClockGen and resetGen) are written using verilog delays or vhdl wait statements. They're intended for HDL simulation, not synthesis, and in general won't synthesize.

pbreuer commented 2 years ago

I understand that point but the issue here is the "reduction stack overflow" on synthesis when clash is compiled against ghc 9.0.2. The informative warnings on successful synthesis with clash compiled against ghc 8.8.4 are harmless, since I only want tbgenerate whatnots for simulation in iverilog of a testrig+unit under test (= testbench).

About your point, you will perhaps tell me that I should just produce only the verilog for the unit with clash and figure out some way to generate signals for it under simulation and read the outputs that does not involve clash?

Maybe, but I don't know how. I think one can do that with iverilog. Do please tell me if it's easy!

Using clash to produce verilog for a testrig that combines with the unit under test to form a testbench (all parts NOINLINEd) was what I have been doing. It has worked till now.

I have since been struggling with a too-much-inlining problem for synthesis (the one I think I posted about, where the error message was being repeated infinitely). But I've got that beaten down and will see if clash164+ghc902 does better now.

Thanks and regards

PTB

pbreuer commented 2 years ago

So I retried and I am getting further errors with clash 1.6.4 + ghc 9.0.2 that do not occur with clash 1.6.4 + ghc 8.8.4. These may be more clueful:

GHC: Setting up GHC took: 1.428s

Test/Format.hs:584:16: error:
    * Expected kind `k0 -> k1',
        but `(+) (4
                  - (Mod p 4 + (4 * Div (4 - Mod p 4) 4))) p' has kind `Nat'
    * In the type signature:
        renderDSI :: forall (p :: Nat) (q :: Nat).
                     (KnownNat p,
                      KnownNat q,
                      ((4 - (Mod p 4 + (4 * Div (4 - Mod p 4) 4))) + p) (~((Div p 4
                                                                            + Min (Mod p 4) 1)
                                                                           * 4)),
                      ((4 - (Mod q 4 + (4 * Div (4 - Mod q 4) 4))) + q) (~((Div q 4
                                                                            + Min (Mod q 4) 1)
                                                                           * 4)),
                      Min (Div (p + 3) 4 + 2) 80 <= 80,
                      Min (Div (q + 3) 4 + 2) 80 <= 80) =>
                     (LineId p, Line q) -> Str80
    |
584 |              , ((4-(Mod p 4+(4*Div(4-Mod p 4)4)))+p) ~((Div p 4+Min(Mod p 4)1)*4)
    |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Test/Format.hs:584:54: error:
    * Unexpected laziness annotation: ~((Div p 4 + Min (Mod p 4) 1)
                                        * 4)
      laziness annotation cannot appear nested inside a type
    * In the third argument of `(+)', namely
        `(~((Div p 4 + Min (Mod p 4) 1) * 4))'
      In the type signature: ...

It's something to do with type calculations on Nat. That's in the decidable portion ('Dedekind') of Peano arithmetic.

The same code compiles fine under clash 1.6.4 + ghc 8.8.4 as follows:

clash -XCPP -fconstraint-solver-iterations=0 -package silently -fclash-spec-limit=80 -fclash-inline-limit=80 -fclash-no-render-enums -fclash-old-inline-strategy --systemverilog Test/IMMU.hs 2>&1 | less
GHC: Setting up GHC took: 1.081s
GHC: Compiling and loading modules took: 4m57s
Clash: Parsing and compiling primitives took 0.553s
GHC+Clash: Loading modules cumulatively took 26m54s
Clash: Compiling Test.IMMU.imm32
Clash: Compiling Test.IMMU.immu32
Clash: Compiling Test.IMMU.immu32 took 9.434s
Clash: Compiling Test.IMMU.imm32 took 11.498s
Clash: Total compilation took 27m5s

pbreuer commented 2 years ago

PS. That particular error above might be some sort of syntactic parenthesis/precedence/grouping error via ghc 9. The constraint it is complaining about is: ((4-(Mod p 4+(4*Div(4-Mod p 4)4)))+p) ~((Div p 4+Min(Mod p 4)1)*4) What "third argument of (+)"!!! [That is informing it of an equality it ought to know if it were really perfect at Dedekind arithmetic, but that it has never known till now. The 4*Div(4-Mod p 4)4 is 4*[(4-p%4)/4] which is 0 or 4 according to whether p is not precisely a multiple of 4, or is, respectively. I would guess the whole thing is saying there are two ways of expressing the multiple of 4 just above or just below p, or something like that. Trying at most the 4 different values of p mod 4 would tell us.]
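
The "try the values of p mod 4" idea can be sketched at the value level. This is only an illustrative check, not the code from Test/Format.hs: the names lhs, rhs and roundUp4 are made up here, and ordinary Integer arithmetic stands in for the type-level Nat operations (with a guard for the one subtraction that could go negative on Nat).

```haskell
-- Value-level check of the constraint
--   ((4 - (Mod p 4 + (4 * Div (4 - Mod p 4) 4))) + p) ~ ((Div p 4 + Min (Mod p 4) 1) * 4)
-- All names below are hypothetical; Integer models Nat.

-- Left-hand side of the equality.
lhs :: Integer -> Integer
lhs p
  | inner < 0 = error "subtraction would be negative on Nat"
  | otherwise = inner + p
  where
    r     = p `mod` 4
    -- 4 * Div (4 - r) 4 is 4 when r == 0 and 0 otherwise
    inner = 4 - (r + 4 * ((4 - r) `div` 4))

-- Right-hand side of the equality.
rhs :: Integer -> Integer
rhs p = (p `div` 4 + min (p `mod` 4) 1) * 4

-- p rounded up to a multiple of 4, written the conventional way.
roundUp4 :: Integer -> Integer
roundUp4 p = 4 * ((p + 3) `div` 4)

main :: IO ()
main = do
  let ps = [0 .. 1000]
  print (all (\p -> lhs p == rhs p) ps)      -- both sides agree
  print (all (\p -> lhs p == roundUp4 p) ps) -- and equal p rounded up to a multiple of 4
```

Both lines print True. A finite range is not a proof, but each side is 4 * Div p 4 plus a term that depends only on p mod 4, so covering the four residues is what matters. It suggests both sides are spelling out the multiple of 4 at or just above p.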

leonschoorl commented 2 years ago

Yes, starting with version 9.0, GHC is more sensitive to whitespace, or the lack thereof. That's what's causing the Unexpected laziness annotation error: a ~ not followed by whitespace is read by GHC as a laziness annotation. Try putting spaces around all ~ in your types (assuming they're meant to be type equality).

See also: https://gitlab.haskell.org/ghc/ghc/-/wikis/migration/9.0#whitespace-sensitive-and-
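
A minimal sketch of the spacing rule, assuming GHC 9.0 with base's GHC.TypeLits. The function succWidth is a hypothetical stand-in, not anything from the code in this issue; it only shows a type-equality constraint written with a space on both sides of ~.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}          -- enables the (~) equality constraint
{-# LANGUAGE TypeOperators #-}  -- enables the type-level (+)

import Data.Proxy (Proxy (..))
import GHC.TypeLits

-- Hypothetical helper: returns m, given evidence that n + 1 equals m.
-- Note the space on BOTH sides of (~). Written as `(n + 1) ~m`, GHC 9.0
-- reads `~m` as a laziness annotation and rejects the signature.
succWidth :: (KnownNat m, (n + 1) ~ m) => Proxy n -> Proxy m -> Integer
succWidth _ pm = natVal pm

main :: IO ()
main = print (succWidth (Proxy :: Proxy 3) (Proxy :: Proxy 4))
```

This prints 4. The spaced form is also accepted by older compilers such as 8.8, so the same source should build against both.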

pbreuer commented 2 years ago

I love the idea of "whitespace sensitivity"! Thank you.

But correcting that does not make the original error message (first post above) go away :-(. I'd better look at what other ghc 9.0 changes might be the cause ... maybe "GHC now consistently does eager instantiation during type inference".

pbreuer commented 2 years ago

To sum up:

1) Yes, "ghc now is eager" on type calculations was the reason for the evaluation loop on some complicated type constraints when clash is compiled against ghc 9.0 instead of 8.8. Instantiating the types further, and earlier, in order to simplify those constraints fixed it. Less generic, but it flies. So it's a ghc thing, not yours.

2) Yes, Rowan is correct that yosys (v0.21 is what I am using) parses the verilog 2001 produced by clash fine, even though the yosys parser is 2005. The error I saw came from a spurious tbClockGen (which does not and should not synthesize) left in my code. The reason is that it expands to an always forever and the yosys parser doesn't know the keyword forever. I'm a bit worried because the and/shift precedence swap from 2001 to 2005 illustrates that 2005 does not conserve 2001 language semantics. If forever has been lost in 2005 that would be an instance of a syntax incompatibility. Where there is one there will be more. I'm impressed that Clash is producing verilog that seems to be in a common subset syntax-wise. Is it intentional? And semantics-wise too?

(you could produce synthesisable code for tbClockGen but I suppose you don't want to.)

Regards and thanks

PTB