clash-lang / clash-compiler

Haskell to VHDL/Verilog/SystemVerilog compiler
https://clash-lang.org/
Other
1.42k stars 150 forks source link

Clash Sampling (v1.6.3) and Synthesized Hardware disagree on redundant bit replacement #2384

Closed IamfromSpace closed 1 year ago

IamfromSpace commented 1 year ago

Hello again! Still greatly enjoying my adventures with Clash :) I've recently stumbled across an issue that may simply be user error.

I have a simple function that takes in an ASCII BitVector and returns the index of lower and uppercase letters (a = 0, b = 1, A = 26, ..., Z = 51), dropping two unneeded bits.

When defined like this, it works just fine in the simulator, but in actual hardware (Alchitry Cu with ice40 HX8K via the IceStorm toolchain) only 0 has defined behavior.

parse :: BitVector 8 -> Unsigned 6
parse x = fromIntegral (if x >= 97 then x - 97 else x - 65 + 26)

If I instead define it like this, it works correctly in both contexts:

parse :: BitVector 8 -> Unsigned 6
  let y = fromIntegral x :: Unsigned 8
  in truncateB (if y >= 97 then y - 97 else y - 65 + 26)

I'm not totally sure what the expected result should be. I do see that this note on the BitVector module should have caught my attention:

as it is not safe to coerce between different size BitVector. To change the size, use the functions in the Resize class.

Should fromIntegral be a safe way to resize? If not, should (/could) the simulator also return undefined?

rowanG077 commented 1 year ago

I assume you mean Clash simulation and not (System)Verilog/VHDL simulation.

Could you post your generated HDL here with a minimal test case? If I synthesize your non-working parse with clash 1.6.3 using this topEntity:

topEntity :: Signal System (BitVector 8) -> Signal System (Unsigned 6)
topEntity a = fmap parse a

I get this verilog:

/* AUTOMATICALLY GENERATED VERILOG-2001 SOURCE CODE.
** GENERATED BY CLASH 1.6.3. DO NOT MODIFY.
*/
`timescale 100fs/100fs
module topEntity
    ( // Inputs
      input [7:0] a

      // Outputs
    , output wire [5:0] result
    );
  wire signed [63:0] c$app_arg;
  wire signed [63:0] c$case_alt;
  wire signed [63:0] c$case_alt_0;
  wire [7:0] c$bv;
  wire [7:0] c$bv_0;

  assign c$app_arg = (a >= 8'b01100001) ? c$case_alt : c$case_alt_0;

  assign c$bv = (a - 8'b01100001);

  assign c$case_alt = $unsigned({{(64-8) {1'b0}},c$bv});

  assign c$bv_0 = ((a - 8'b01000001) + 8'b00011010);

  assign c$case_alt_0 = $unsigned({{(64-8) {1'b0}},c$bv_0});

  assign result = $unsigned(c$app_arg[0+:6]);

endmodule

Which looks fine to me. I tried simulation just to be sure using this online tool and it shows the expected output. Did you get any warnings from your toolchain?

DigitalBrains1 commented 1 year ago

In Clash, it's better to avoid fromIntegral completely, and use bitCoerce. So it could look like the following:

parse :: BitVector 8 -> Unsigned 6
parse x =
  let y = bitCoerce x :: Unsigned 8
  in truncateB (if y >= 97 then y - 97 else y - 65 + 26)

Should fromIntegral be a safe way to resize? If not, should (/could) the simulator also return undefined?

You shouldn't use fromIntegral in a Clash design. However, sometimes you're synthesising library code that contains it and you cannot change it. This is generally a real problem when your word size is larger than the system word size (64 bits for most systems), but for smaller word sizes it has fewer issues. Furthermore, Clash simulation also runs designs that aren't translatable to HDL. You might still be in the process of converting your model to synthesisable code, or you might be including in the simulation a model of the environment, or even some GUI.

Note in the generated Verilog above that it is doing a few steps as 64 bit signed numbers. That's caused by the fromIntegral; the corresponding Haskell types were probably Integer, but we can't represent that in HDL, they are of unbounded length. Clash could however determine that the computation should have been safe. Try changing BitVector 8 to BitVector 80 and it will complain loudly:

Warnings ``` T2384.hs:8:1: warning: BitVector.toInteger: Integer width, 64, is smaller than BitVector width, 80. Dropping MSBs. Are you using 'fromIntegral' to convert between types? Use 'bitCoerce' instead. NB: The source location of the error is not exact, only indicative, as it is acquired after optimisations. The actual location of the error can be in a function that is inlined. To prevent inlining of those functions, annotate them with a NOINLINE pragma. T2384.hs:8:1: warning: BitVector.toInteger: Integer width, 64, is smaller than BitVector width, 80. Dropping MSBs. Are you using 'fromIntegral' to convert between types? Use 'bitCoerce' instead. NB: The source location of the error is not exact, only indicative, as it is acquired after optimisations. The actual location of the error can be in a function that is inlined. To prevent inlining of those functions, annotate them with a NOINLINE pragma. T2384.hs:8:1: warning: BitVector.toInteger: Integer width, 64, is smaller than BitVector width, 80. Dropping MSBs. Are you using 'fromIntegral' to convert between types? Use 'bitCoerce' instead. NB: The source location of the error is not exact, only indicative, as it is acquired after optimisations. The actual location of the error can be in a function that is inlined. To prevent inlining of those functions, annotate them with a NOINLINE pragma. T2384.hs:8:1: warning: BitVector.toInteger: Integer width, 64, is smaller than BitVector width, 80. Dropping MSBs. Are you using 'fromIntegral' to convert between types? Use 'bitCoerce' instead. NB: The source location of the error is not exact, only indicative, as it is acquired after optimisations. The actual location of the error can be in a function that is inlined. To prevent inlining of those functions, annotate them with a NOINLINE pragma. ```

Finally, a useful hint for bug hunting: if your whole design is combinatorial, you don't need Signal. So to synthesise parse, you could do either topEntity = parse or {-# ANN parse (defSyn "parse") #-}, there's no need for fmap here.

It would be interesting to determine why the IceStorm toolchain has issues with this HDL. We probably need to deal with that in Clash, since we decided to quell the fromIntegral warnings here, meaning something that looks translatable and issue free still causes issues. We don't want that.

IamfromSpace commented 1 year ago

I assume you mean Clash simulation and not (System)Verilog/VHDL simulation.

Correct. I was using clash-wavedrom and sampleN against a test bench. I can give it a go with Icarus Verilog as well.

Here's the meaningful diff (slightly fudged) from my whole impl here non-working to working, looks like what you got @rowanG077 :


-  assign c$app_arg_6 = (a1_1 >= 8'b01100001) ? c$case_alt_4 : c$case_alt_5;
-
-  assign c$bv_3 = (a1_1 - 8'b01100001);
+  assign y = $unsigned(c$y_app_arg[0+:8]);

-  assign c$case_alt_4 = $unsigned({{(64-8) {1'b0}},c$bv_3});
+  assign result_15 = (y >= 8'd97) ? (y - 8'd97) : ((y - 8'd65) + 8'd26);

-  assign c$bv_4 = ((a1_1 - 8'b01000001) + 8'b00011010);
-
-  assign c$case_alt_5 = $unsigned({{(64-8) {1'b0}},c$bv_4});
+  assign c$y_app_arg = $unsigned({{(64-8) {1'b0}},a1_1});

   always @(*) begin
-    default : result_16 = $unsigned(c$app_arg_6[0+:6]);
+    default : result_16 = result_15[0+:6];
   end

Did you get any warnings from your toolchain?

Just ran it again to double check, but didn't see anything notable.

it's better to avoid fromIntegral completely, and use bitCoerce. ...You shouldn't use fromIntegral in a Clash design. However, sometimes you're synthesising library code that contains it and you cannot change it.

Ah, this makes a ton of sense. So fromIntegral is a necessary evil to support, since so many useful libraries use it, but it's actually quite dangerous, since you might actually have an unbounded value behind it.

I might take a stab at a docs PR to call this out in sized types that support Integral, unless there's some reason not to!

a useful hint for bug hunting: if your whole design is combinatorial, you don't need Signal.

That's pretty handy!

It would be interesting to determine why the IceStorm toolchain has issues with this HDL.

It may possibly be a bug in Yosys? What finally got me to notice this code (it was very low on the suspect list!) was that I hard coded its output down the chain and my max clock speed of the design got substantially slower. With the original implementation it seemed to have optimized away a lot of logic, believing the value was undefined for any non-0 value anyway.

rowanG077 commented 1 year ago

@IamfromSpace

From your example in the first post I ran pre- and post-synthesis simulation and they match using yosys. The generated netlist/resource usage for both implementations is different. After running nextpnr the device utilization is the same and max frequency is the same.

I unfortunately don't have time right now to try and minimize the full design you posted to try and get to the bottom of this.

IamfromSpace commented 1 year ago

Hey all, sorry for a bit of a goose chase on this one. I went back and really dug through this and it had my head spinning a bit. Luckily, I kept my commits pretty orderly and I realized I misdiagnosed the issue. The problematic code that caused this is even more innocuous:

      bvOut = replaceBit charEntry 1 bv
      bvStored = if thisHalfDone then 0 else replaceBit charEntry 1 bvOut

vs (this one works fine):

      bvOut = replaceBit charEntry 1 bv
      bvStored = if thisHalfDone then 0 else bvOut

Since we're just replacing the same bit with the same value twice, this just should have been redundant and had no effect--a statement both Clash and Icarus Verilog simulation agree with!

Here's some highlights from the Verilog diff:

--- a/good.v
+++ b/bad.v
@@ -148,19 +148,21 @@ module topEntity
   wire [7:0] \rlp' ;
   wire [7:0] \rcp' ;
   wire [51:0] c$app_arg_22;
+  reg [51:0] c$case_alt_11;
   reg [51:0] bvOut;
   wire  c$app_arg_23;
-  wire  c$case_alt_11;
+  wire  c$case_alt_12;
   wire [5:0] c$app_arg_24;
   wire [7:0] \c$rlp'_app_arg ;
   wire [7:0] \c$rlp'_case_alt ;
   wire  thisHalfDone;
   wire [5:0] halfLineLen;
   wire [5:0] c$thisHalfDone_app_arg;
+  wire signed [63:0] c$bvOut_app_arg;
   wire [5:0] charEntry;
-  wire [60:0] c$case_alt_12;
+  wire [60:0] c$case_alt_13;
   wire [7:0] ipv;
-  reg [60:0] c$case_alt_13;
+  reg [60:0] c$case_alt_14;
   wire [5:0] c;
   wire [7:0] \slp' ;
   reg  c$app_arg_25;
@@ -1003,20 +1006,29 @@ module topEntity

   assign \rcp'  = c$ds4_app_arg[66:59] + 8'd1;

-  assign c$app_arg_22 = thisHalfDone ? 52'b0000000000000000000000000000000000000000000000000000 : bvOut;
+  assign c$app_arg_22 = thisHalfDone ? 52'b0000000000000000000000000000000000000000000000000000 : c$case_alt_11;

   assign c$din = 1'b1;

+  // replaceBit start
+  always @(*) begin
+    c$case_alt_11 = (bvOut);
+    c$case_alt_11[c$bvOut_app_arg] = c$din;
+  end
+  // replaceBit end
+
+  assign c$din_0 = 1'b1;
+
   // replaceBit start
   always @(*) begin
     bvOut = (c$ds4_app_arg[51:0]);
-    bvOut[($unsigned({{(64-6) {1'b0}},charEntry}))] = c$din;
+    bvOut[c$bvOut_app_arg] = c$din_0;
   end
   // replaceBit end

-  assign c$app_arg_23 = thisHalfDone ? c$case_alt_11 : c$ds4_app_arg[52:52];
+  assign c$app_arg_23 = thisHalfDone ? c$case_alt_12 : c$ds4_app_arg[52:52];

-  assign c$case_alt_11 = c$ds4_app_arg[52:52] ? 1'b0 : 1'b1;
+  assign c$case_alt_12 = c$ds4_app_arg[52:52] ? 1'b0 : 1'b1;

   assign c$app_arg_24 = thisHalfDone ? 6'd0 : c$thisHalfDone_app_arg;

@@ -1030,9 +1042,11 @@ module topEntity

   assign c$thisHalfDone_app_arg = c$ds4_app_arg[58:53] + 6'd1;

+  assign c$bvOut_app_arg = $unsigned({{(64-6) {1'b0}},charEntry});
+
   assign charEntry = result_17[5:0];

There does seem to be some sort of 64-bit Integer artifact from my use of fromIntegral, but it seems harmless here.

I think at any rate, this just doesn't look like a Clash issue. Both Icarus Verilog and Clash agree on the simulated outcome. As soon as the redundancy is cleared up, then synthesis is correct as well.

I unfortunately don't have time right now to try and minimize the full design you posted to try and get to the bottom of this.

No expectation of any such thing! I appreciate all the efforts and insight on this already, and I apologize that my initial detective work wasn't better.

I plan to see if I can get a minimal design that captures the issue. If we want to track this through to a root cause here or close this because it's not a Clash issue either makes sense to me!

IamfromSpace commented 1 year ago

Minimizing seems to definitely zero in on the double replaceBit.

Here is the smallest example I've been able to demonstrate on hardware. It still uses UART so I can actually send and receive data. Hardware only works if I remove the double replaceBit, Clash and Icarus Verilog get the right answer regardless.

And here is the absolute minimum example, but I haven't actually been able to validate it on hardware (but other examples make it quite unlikely that there is interplay with the other code). The Yosys output is definitely quite different when you remove the double replaceBit, but I'm not able to decipher it myself.

rowanG077 commented 1 year ago

Yeah looking at the JSON netlist output is hard to do. But yosys supports writing back a synthesized design to verilog. This will show exactly to what cells it has mapped your circuit.

For example in your design I wrote this to generate both a json netlist and a verilog netlist: yosys good.v -p "synth_ice40; write_verilog good_syn.v; write_json good.json". You can use netlistsvg to view a schematic of the generated netlist. And you can use a verilog simulator to simulate the post-synthesis design if you include the cell library which is at path yosys-config --datdir/ice40/cells_sim.v. I needed to modify the cell library slightly to make iverilog accept it but I hope that shouldn't be necessary for everyone.

I think your best bet is to post on the yosys issuetracker or IRC.

IamfromSpace commented 1 year ago

@rowanG077 Thanks for all the tips!

The schematic view of the netlist is a clear smoking gun. The double replaceBit design shows that the top bits of the output are always set to 0, no matter the input.

I'll close this issue here and open one with Yosys, since this squarely identifies the source of the issue.

Thank you everyone for all the help and insights here!

IamfromSpace commented 1 year ago

I know this issue is closed :laughing: but just to completely close the loop on it, this issue is resolved in current versions of yosys. The version bundled as part of apio is quite old (v0.9). This was fixed in v0.15, as part of this commit 15eb66b99dc40901895d4e4a512d3c49dde3a9b4. I may submit a test case for this example to prevent regression, but ultimately this is resolved as can be :heavy_check_mark:

rowanG077 commented 1 year ago

@IamfromSpace Great! Thanks for the feedback. I'm always paranoid about the potential of such bugs existing. So it's good to know it really wasn't clash at fault here :).