clash-lang / clash-compiler

Haskell to VHDL/Verilog/SystemVerilog compiler
https://clash-lang.org/
Other
1.42k stars 150 forks source link

Incorrect handling of diffferential clock pairs #2518

Closed lmbollen closed 1 year ago

lmbollen commented 1 year ago

Currently we use two separate Clock dom arguments to represent differential clock inputs (see clockWizardDifferential) However, this leads to Clash generating two create_clock constraints for this clock signal, which is incorrect.

It will at least mess up timing for Vivado.

We should either find a way to detect this situation and only generate a single constraint on the Pchannel of the pair, or make a separate datatype (maybe DiffClock dom) that represents a differential clock pair.

christiaanb commented 1 year ago

The easiest work-around/solution is to make one part of the pair Signal domIn Bit instead of Clock domIn.

leonschoorl commented 1 year ago

For reference Xilinx UG903 on page 83 says:

[...] the primary clock must only be created on the positive input of the differential buffer. Creating a primary clock on each of the positive/negative inputs of the buffer would result in unrealistic CDC paths.

So having only create_clock for the positive half of a differential clock pair is apparently what you're supposed to do.

martijnbastiaan commented 1 year ago

DiffClock dom

Perhaps to make Clash's life easier we could split it up in ClockP and ClockN. Still doesn't feel all that great though.

DigitalBrains1 commented 1 year ago

Is adding types ClockP and ClockN really simpler than having a 2-bit type DiffClock?

So I'm proposing where Clock currently does clk : in std_logic / input clk / input logic clk, DiffClock does clk : in std_logic_vector(1 downto 0) / input [1:0] clk / input logic [1:0] clk. clockWizardDifferential can pick it apart and feed it to the Xilinx primitive.

I'm glossing over our VHDL subtype.

[edit] This also maintains the property that the two signals are really tied. It makes no sense at all to have ClockP dom1 and ClockN dom2 where dom1 /= dom2. Our two clock inputs was, as I said earlier, a little white lie that has now come to bite us: it really is a single signal, carried on two wires. [/edit]

martijnbastiaan commented 1 year ago

Not to derail the discussion too much, but when I was trying to get Clash to work on Amazon EC2 F1 I noticed Vivado really didn't like it when you'd put clocks in an array/vector. It's also somewhat sensitive to delta delay problems - so I'd prefer it if we could have Clash render two wires instead. (This doesn't invalidate the rest of your comment :-))

Edit: though we've had success over at Bittide with clocks in arrays! So it's not a given.

christiaanb commented 1 year ago

Using Signal domIn Bit for the negative input still seems like the most hassle-free way forward.

DigitalBrains1 commented 1 year ago

Not to derail the discussion too much

I think that's actually very valuable input and ugh delta delays, it slipped my mind.

Using Signal domIn Bit for the negative input still seems like the most hassle-free way forward.

Hassle-free, and we could almost implement it in a delta delay's time even, but do you intend this as anything other than a temporary workaround? Because it really feels like heaping on the nastiness. First we lie a bit about having two clocks, and now we even disguise "one of them" as a signal. Even if we would view that signal as alternating 0's and 1's it would still be a factor two slow for what it really is: a clock. The only way in which it is correct is that Bit happens to eventually map to the same HDL type. HDL doesn't care about the domain so that's magicked away... meanwhile the VHDL would look like this:

entity topEntity is
  port(-- clock
       clk_p  : in Foo_topEntity_types.clk_System;
       clk_n  : in std_logic;
       -- reset
       rst    : in Foo_topEntity_types.rst_System;
       [...]

Quartus handles this issue a lot better, IMO. Too bad we also need to get Vivado to work ;-P.

[edit] It also opens up the way for the user to mishandle the clock. We specifically pull a Clock out of a tuple, and we don't offer any operations in Clash on them, but such a Signal dom Bit is free to mess with. Delta delays are suddenly vividly in my mind. [/edit]

DigitalBrains1 commented 1 year ago

Right, we already split off clocks from tuples. That gave me an idea. I wonder if it would already be enough to just provide the following user type without a user-visible constructor (and provide {tbD,d}iffClockGen or tweak seClockToDiffClock obviously). Might be nice to tweak the name generation, though.

DiffClock.hs ```haskell {-# LANGUAGE BangPatterns #-} {-# OPTIONS_GHC -fno-do-lambda-eta-expansion #-} module DiffClock where import Clash.Explicit.Prelude data DiffClock dom = DiffClock (Clock dom) (Clock dom) topEntity :: DiffClock System -> Reset System -> Signal System (Unsigned 8) -> Signal System (Unsigned 8) topEntity clk rst i = f clk rst i f :: DiffClock System -> Reset System -> Signal System (Unsigned 8) -> Signal System (Unsigned 8) f !_clk !_rst = id {-# NOINLINE f #-} ```
VHDL `topEntity.vhdl` ```vhdl -- Automatically generated VHDL-93 library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.NUMERIC_STD.ALL; use IEEE.MATH_REAL.ALL; use std.textio.all; use work.all; use work.DiffClock_topEntity_types.all; entity topEntity is port(-- clock cclk : in DiffClock_topEntity_types.clk_System; -- clock cclk_0 : in DiffClock_topEntity_types.clk_System; -- reset rst : in DiffClock_topEntity_types.rst_System; i : in unsigned(7 downto 0); result : out unsigned(7 downto 0)); end; architecture structural of topEntity is begin f_result : entity f port map ( result => result , \c$_clk\ => cclk , \c$_clk_0\ => cclk_0 , \_rst\ => rst , \c$arg\ => i ); end; ``` `f.vhdl` ```vhdl -- Automatically generated VHDL-93 library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.NUMERIC_STD.ALL; use IEEE.MATH_REAL.ALL; use std.textio.all; use work.all; use work.DiffClock_topEntity_types.all; entity f is port(-- clock \c$_clk\ : in DiffClock_topEntity_types.clk_System; -- clock \c$_clk_0\ : in DiffClock_topEntity_types.clk_System; -- reset \_rst\ : in DiffClock_topEntity_types.rst_System; \c$arg\ : in unsigned(7 downto 0); result : out unsigned(7 downto 0)); end; architecture structural of f is begin result <= \c$arg\; end; ```
Verilog `topEntity.v` ```verilog /* AUTOMATICALLY GENERATED VERILOG-2001 SOURCE CODE. ** GENERATED BY CLASH 1.6.3. DO NOT MODIFY. */ `timescale 100fs/100fs module topEntity ( // Inputs input c$clk // clock , input c$clk_0 // clock , input rst // reset , input [7:0] i // Outputs , output wire [7:0] result ); f f_result ( .result (result) , .c$_clk (c$clk) , .c$_clk_0 (c$clk_0) , ._rst (rst) , .c$arg (i) ); endmodule ``` `f.v` ```verilog /* AUTOMATICALLY GENERATED VERILOG-2001 SOURCE CODE. ** GENERATED BY CLASH 1.6.3. DO NOT MODIFY. */ `timescale 100fs/100fs module f ( // Inputs input c$_clk // clock , input c$_clk_0 // clock , input _rst // reset , input [7:0] c$arg // Outputs , output wire [7:0] result ); assign result = c$arg; endmodule ```
SystemVerilog `topEntity.sv` ```systemverilog /* AUTOMATICALLY GENERATED SYSTEMVERILOG-2005 SOURCE CODE. ** GENERATED BY CLASH 1.6.3. DO NOT MODIFY. */ `timescale 100fs/100fs module topEntity ( // Inputs input logic c$clk // clock , input logic c$clk_0 // clock , input logic rst // reset , input logic [7:0] i // Outputs , output logic [7:0] result ); f f_result ( .result (result) , .c$_clk (c$clk) , .c$_clk_0 (c$clk_0) , ._rst (rst) , .c$arg (i) ); endmodule ``` `f.sv` ```systemverilog /* AUTOMATICALLY GENERATED SYSTEMVERILOG-2005 SOURCE CODE. ** GENERATED BY CLASH 1.6.3. DO NOT MODIFY. */ `timescale 100fs/100fs module f ( // Inputs input logic c$_clk // clock , input logic c$_clk_0 // clock , input logic _rst // reset , input logic [7:0] c$arg // Outputs , output logic [7:0] result ); assign result = c$arg; endmodule ```

PS: Huh? I'm using Clash 1.6.3? That's funny! I should update that.

martijnbastiaan commented 1 year ago

That looks very nice @DigitalBrains1 !

christiaanb commented 1 year ago

@DigitalBrains1 suggestions plus the following patch would be sufficient then:

diff --git a/clash-lib/src/Clash/Driver.hs b/clash-lib/src/Clash/Driver.hs
index c26de37a6..10e3f9fd5 100644
--- a/clash-lib/src/Clash/Driver.hs
+++ b/clash-lib/src/Clash/Driver.hs
@@ -42,6 +42,7 @@ import qualified Data.ByteString.Lazy             as ByteStringLazy
 import qualified Data.ByteString.Lazy.Char8       as ByteStringLazyChar8
 import           Data.Char                        (isAscii, isAlphaNum)
 import           Data.Default
+import           Data.Function                    (on)
 import           Data.Hashable                    (hash)
 import           Data.HashMap.Strict              (HashMap)
 import qualified Data.HashMap.Strict              as HashMap
@@ -787,7 +788,7 @@ createHDL backend opts modName seen components domainConfs top topName = flip ev
     hdlNmDocs1 = map (first (<.> Clash.Backend.extension backend)) hdlNmDocs0
     topFiles = concat incs ++ typesPkg1 ++ hdlNmDocs1

-    topClks = findClocks top
+    topClks = List.nubBy ((==) `on` snd) (findClocks top)
     sdcInfo = fmap findDomainConfig <$> topClks
     sdcFile = Data.Text.unpack topName <.> "sdc"
     sdcDoc  = (sdcFile, pprSDC (SdcInfo sdcInfo))

the generated topEntity.sdc file looks like this then:

create_clock -name {cclk} -period 10.000 -waveform {0.000 5.000} [get_ports {cclk}]
DigitalBrains1 commented 1 year ago

@DigitalBrains1 suggestions plus the following patch would be sufficient then:

Thanks! But I think that is too aggressive. For instance, our own test for the two Xilinx clockWizards is defined as:

topEntity ::
  Clock DomIn ->
  DiffClock DomIn ->
  Reset DomIn ->
  Signal DomOut (Index 10, Index 10)
topEntity clkInSE clkInDiff rstIn =

(well, clearly I edited the test to have the type I present here)

So there are two clocks that happen to have the same domain. I think this is a valid design, even though one can wonder why one has two oscillators with the same frequency; a single oscillator usually suffices. Maybe clock A is shared with other chips on the PCB and might be turned off to save power, while clock B keeps running so the FPGA can still do its low-power-mode tasks? If we view this as a valid design, then your patch clearly weeds out too much:

create_clock -name {clkInSE} -period 41.666 -waveform {0.000 20.833} [get_ports {clkInSE}]

As it sees the second and third clock inputs as duplicate of the first.

I think the solution lies in somehow matching on Clash.Signal.Internal.DiffClock explicitly and dropping the second clock?

christiaanb commented 1 year ago

Having two external inputs with exactly matching waveforms seems highly improbably. The example that you give with two independent oscillators with the same frequency will not generate matching waveforms and you should thus model them as separate domains in order to avoid meta-stability.

christiaanb commented 1 year ago

The other example with traces coming from the same oscillator source, but one having a path with some switching element in between will not have exactly matching waveforms.

christiaanb commented 1 year ago

Sadly, we cannot generate a warning when we see a Synthesize boundary component with two clocks with the same domain, since we do not know whether it's just a sub-component or the top-level component. Because for sub-components, having multiple clocks from the same domain would be valid.

e.g. the example that you gave with one clock for safety critical, and one gated for low-power, would be something that can be checked by the static timing analysis tools.

DigitalBrains1 commented 1 year ago

I think we should document it well then. Because we are already pretty loose with our domains: we give examples where we assign some domain to an asynchronous signal, like the reset input of a top entity. In fact, I don't think we give any examples where inputs to the top entity are actually synchronous to the domain we define for that input. So as a user, I wouldn't bat an eye at defining some domain and using that twice to input to two different PLL's. Sure, the two signals are not the same clock domain, but they match on all the things we define in a Domain, what's the problem?

(Note that in my example with two domains with the same frequency, I did assume that the user was knowledgeable enough to assign different domains to the outputs of the two PLL's. So there would be no meta-stability issues. It would just be a little white lie about the inputs of the top entity. Something we do all the time.)

DigitalBrains1 commented 1 year ago

I feel uncomfortable deciding for the user that it makes no sense to have multiple clock inputs with the same domain.

Still, maybe this is such an exotic configuration people should just write their own create_clock statements. I'll go along with the List.nubBy solution, but I would like to document it. I don't really see what is the best way to do that.

DigitalBrains1 commented 1 year ago

How about adding a comment line to the SDC whenever List.nubBy did actually do something?

# NOTE: There are multiple clock inputs with the same domain. Only a single
# 'create_clock' statement is issued for each domain. Please use multiple
# domains if you need multiple 'create_clock' statements.
create_clock -name {cclk} -period 10.000 -waveform {0.000 5.000} [get_ports {cclk}]

That way the solution is in the same place as the problem; an investigating user should be able to find it.

Come to think of it, I suppose it would already be okay if we just always included the header

# NOTE: Only a single 'create_clock' statement is issued for each domain. Please
# use multiple domains if you need multiple 'create_clock' statements.
leonschoorl commented 1 year ago

Maybe we should combine the DiffClock with the ClockN approach. Add a newtype ClockN (dom :: Domain) = ClockN { clockNTag :: SSymbol dom } For which clash does all the same things it does now for Clock, like pulling it out of product type etc. Except that for ClockN it won't generate the create_clock for in the SDC file.

And also have the data DiffClock (dom :: Domain) = DiffClock (Clock dom) (ClockN dom) Because I think that is a nice interface for things like clockWizardDifferential.

Just discussed with @DigitalBrains1 and he like it.

martijnbastiaan commented 1 year ago

Nubbing the clock list smells too much like guessing in the face of ambiguity: if we're convinced it is wrong we should throw an error, if we're not we should just go ahead and do the stupid thing (write two constraints).

One vote in favor of Clock+ClockN(+DiffClock). It's unsurprising and simple to implement.