clash-lang / clash-compiler

Haskell to VHDL/Verilog/SystemVerilog compiler
https://clash-lang.org/
Other
1.42k stars 150 forks source link

`asyncRom` output incompatible with Verilator #1572

Closed gergoerdi closed 3 years ago

gergoerdi commented 3 years ago

With the video branch of this repo, doing a stack build invokes Clashilator to pass the generated Clash code to Verilator. In my Clash code, I am using an asyncRom to store the Intel 8080 microcode. Clash seems to generate Verilog from that asyncRom that trips up Verilator:

 .stack-work/dist/x86_64-linux/Cabal-3.0.1.0/build/tinybasic-intel8080-video/_clashilator/clash-syn/verilog/Intel8080/topEntity/topEntity.v:8044:30391:
 Too many preprocessor tokens on a line (>20000); perhaps recursive `define
 %Error:
 .stack-work/dist/x86_64-linux/Cabal-3.0.1.0/build/tinybasic-intel8080-video/_clashilator/clash-syn/verilog/Intel8080/topEntity/topEntity.v:8044:30390:
 syntax error, unexpected end of file, expecting ',' or '}'
 %Error: Cannot continue

The line in question is 150,243 characters long, and starts like this:

  // asyncRom begin
  wire [206:0] ROM [0:256-1];

  wire [52991:0] romflat;
  assign romflat = {{{1'b0,2'bxx},{{{5'b01000,{1'b0,4'bxxxx},3'bxxx},{2'b00,2'bxx}},{{5'b01000,{1'b0,4'bxxxx},3'bxxx},{2'b00,2'bxx}}...};
  genvar i_17;
  generate
  for (i_17=0; i_17 < 256; i_17=i_17+1) begin : mk_array_3
    assign ROM[(256-1)-i_17] = romflat[i_17*207+:207];
  end
  endgenerate

The original Clash code in clash-intel8080/src/Hardware/Intel8080/CPU.hs:

microcodeFor :: Value -> Microcode
microcodeFor = asyncRom $(TH.lift $ map (microcode . decodeInstr . bitCoerce) $ indicesI @256)
christiaanb commented 3 years ago

So the first error seems to indicate that the Verilog that we generate exceeds Verilator's line-buffer... and perhaps the second error is a follow on. How many elements is the vector? do you still get the error when you reduce it to say 10 elements?

gergoerdi commented 3 years ago

How many elements is the vector?

The vector is quite big: it is a 256-length vector of 8-length vectors.

do you still get the error when you reduce it to say 10 elements?

I can't quickly try that out; however, what I could try quickly was postprocessing the resulting Verilog file so that the lines are not too long; in my case, I just split lines on },{ but of course inside Clash we could do much better. Wit this change, Verilator now succeeds, albeit with other (probably unrelated) warnings that I will try to report separately after separating them a bit:

%Warning-WIDTH: topEntity.v:4716:32: Bit extraction of var[7:0] requires 3 bit index, not 64 bits.
                                                                                                                                                                  : ... In instance topEntity
 4716 |     c$app_arg_app_arg_app_arg_0[(64'sd2)] = c$din;
      |                                ^
                ... Use "/* verilator lint_off WIDTH */" and lint_on around source to disable this message.
%Warning-WIDTH: topEntity.v:18583:31: Operator ASSIGNW expects 32 bits on the Assign RHS, but Assign RHS's UNSIGNED generates 64 bits.
                                                                                                                                                                   : ... In instance topEntity
18583 |   assign c$ds_app_arg_app_arg = $unsigned(c$ds_app_arg_app_arg_0);
      |                               ^
%Warning-UNOPTFLAT: topEntity.v:2295:15: Signal unoptimizable: Feedback to clock or circular logic: 'topEntity.result_156'
 2295 |   wire [17:0] result_156;
      |               ^~~~~~~~~~
                    topEntity.v:2295:15:      Example path: topEntity.result_156
                    topEntity.v:18981:21:      Example path: ASSIGNW
                    topEntity.v:2369:15:      Example path: topEntity.result_173
                    topEntity.v:18423:16:      Example path: ASSIGNW
                    topEntity.v:2174:15:      Example path: topEntity.ds1_9
                    topEntity.v:18511:28:      Example path: ASSIGNW
                    topEntity.v:2203:15:      Example path: topEntity.c$ds1_case_alt_23
                    topEntity.v:18472:21:      Example path: ASSIGNW
                    topEntity.v:2192:15:      Example path: topEntity.result_143
                    topEntity.v:18769:21:      Example path: ASSIGNW
                    topEntity.v:2291:14:      Example path: topEntity.result_153
                    topEntity.v:18777:21:      Example path: ASSIGNW
                    topEntity.v:2295:15:      Example path: topEntity.result_156
christiaanb commented 3 years ago

How many elements is the vector?

The vector is quite big: it is a 256-length vector of 8-length vectors.

do you still get the error when you reduce it to say 10 elements?

I can't quickly try that out; however, what I could try quickly was postprocessing the resulting Verilog file so that the lines are not too long; in my case, I just split lines on },{ but of course inside Clash we could do much better. Wit this change, Verilator now succeeds, albeit with other (probably unrelated) warnings that I will try to report separately after separating them a bit:

Thanks for confirming my suspicions, now we know that we just need to pretty-print vector constants differently.

%Warning-WIDTH: topEntity.v:4716:32: Bit extraction of var[7:0] requires 3 bit index, not 64 bits.
                                                                                                                                                                  : ... In instance topEntity
 4716 |     c$app_arg_app_arg_app_arg_0[(64'sd2)] = c$din;
      |                                ^
                ... Use "/* verilator lint_off WIDTH */" and lint_on around source to disable this message.

Right, this happens because the primitive for vector indexing uses a Haskell Int as the index. We can probably change it so it takes a Unsigned (CLog 2 n) as the index (where n is the length of the vector), we'll need to think on how to properly report overflow behaviour though, because user-level indexing operator, !!, takes an arbitrary Enum.

%Warning-WIDTH: topEntity.v:18583:31: Operator ASSIGNW expects 32 bits on the Assign RHS, but Assign RHS's UNSIGNED generates 64 bits.
                                                                                                                                                                   : ... In instance topEntity
18583 |   assign c$ds_app_arg_app_arg = $unsigned(c$ds_app_arg_app_arg_0);
      |                               ^

Interesting... I thought we eliminated implicit resizing from our generated code.

%Warning-UNOPTFLAT: topEntity.v:2295:15: Signal unoptimizable: Feedback to clock or circular logic: 'topEntity.result_156'
 2295 |   wire [17:0] result_156;
      |               ^~~~~~~~~~
                    topEntity.v:2295:15:      Example path: topEntity.result_156
                    topEntity.v:18981:21:      Example path: ASSIGNW
                    topEntity.v:2369:15:      Example path: topEntity.result_173
                    topEntity.v:18423:16:      Example path: ASSIGNW
                    topEntity.v:2174:15:      Example path: topEntity.ds1_9
                    topEntity.v:18511:28:      Example path: ASSIGNW
                    topEntity.v:2203:15:      Example path: topEntity.c$ds1_case_alt_23
                    topEntity.v:18472:21:      Example path: ASSIGNW
                    topEntity.v:2192:15:      Example path: topEntity.result_143
                    topEntity.v:18769:21:      Example path: ASSIGNW
                    topEntity.v:2291:14:      Example path: topEntity.result_153
                    topEntity.v:18777:21:      Example path: ASSIGNW
                    topEntity.v:2295:15:      Example path: topEntity.result_156

I wonder whether this is a truly circular logic, or simply an artifact of having to flatten everything to a bitvector, including product types. If you would then look at the individual bits, you'd see there isn't an actual combinational loop. Do the FPGA synthesis tools report a combinational loop for that Verilog?

gergoerdi commented 3 years ago

FWIW I just tried with fcd3c63bff3fd475b99616b5b39cffbfd0b795fd and the ASSIGNW warning persists.

I'm happy to check the combinational loop warning in Vivado if you tell me how.

christiaanb commented 3 years ago

Just run the timing analysis, it usually warns/complains about combinational loops.

gergoerdi commented 3 years ago

OK this is driving me crazy.

I wanted to change this locally quickly so that every list item is on a new line. Should be easy by just changing clash-lib's Clash.Backend.Verilog:

listBraces :: Monad m => m [Doc] -> m Doc
listBraces xs = lbrace <+> align (vsep (punctuate (comma <+> line) xs)) <+> rbrace
  where
    punctuate = A.liftA2 intersperse

You can see several redundant ways of making each element be on its own line, in increasing order of desperation:

And yet, with all these changes (and using Debug.Trace.trace to make sure the right code path is hit), the resulting Verilog code still puts all asyncRom init elements on one line!

christiaanb commented 3 years ago

Ah, I wonder if we're using renderOneLine somewhere. Let me get back to you on that.

christiaanb commented 3 years ago

In the meantime, could you try hardline instead of line.

christiaanb commented 3 years ago

Could you change: https://github.com/clash-lang/clash-compiler/blob/5a11a2d84f314812a9d2fda57a23b638c48ab8c0/clash-lib/src/Data/Text/Prettyprint/Doc/Extra.hs#L49

to

renderOneLine = renderLazy . layoutCompact

and check whether that "solves" it.

gergoerdi commented 3 years ago
renderOneLine = renderLazy . layoutCompact

and check whether that "solves" it.

Yep, that seems to do the trick. Thanks!

It's still not perfect, though: ideally, we'd set the max (not just the desired) line length, and then the list items would be broken as needed. But that's just to make it "nice"; I think the above change will be enough to get my stuff through Verilator.

gergoerdi commented 3 years ago

It's still not perfect, though: ideally, we'd set the max (not just the desired) line length, and then the list items would be broken as needed. But that's just to make it "nice"; I think the above change will be enough to get my stuff through Verilator.

Shouldn't

renderOneLine = renderLazy . layoutPretty defaultLayoutOptions

give us exactly that? I will look into this because it seems layoutPretty still puts

every element on a new line

regardless of length, which is annoying.

gergoerdi commented 3 years ago

Shouldn't

renderOneLine = renderLazy . layoutPretty defaultLayoutOptions

give us exactly that?

Ah, that's because we're using encloseSep for these large lists, which is all-or-nothing. I will try to find some other, fillSep-based solution for listBraces and that should solve this.