GlasgowEmbedded / glasgow

Scots Army Knife for electronics
BSD Zero Clause License
1.93k stars 189 forks source link

Off-by-one in `glasgow.gateware.clockgen.ClockGen` #667

Open mbikovitsky opened 3 months ago

mbikovitsky commented 3 months ago

The ClockGen module has an off-by-one error when deriving a clock using .derive or .calculate. The output clock period is one input clock cycle shorter than necessary. This happens only when the ratio between the input and output frequencies is >= 3.

Amaranth playground link.

Here, the input clock is 4Hz and the output clock is 1Hz. Since the input is evenly divisible by the output, we should get an output clock that is exactly 4 input clock cycles long. And, since the ratio is divisible by 2, we should get a 2 clock high pulse and a 2 clock low pulse on the output. Instead, we get 3 cycles - 1 high, 2 low.

The issue is that calculate computes cyc = round(input_hz // output_hz) - 1, and then the implementation uses this value for the counter if it's >= 2. If cyc == 0 or cyc == 1, then we get into one of the two special cases (same frequency or half frequency), and there everything works fine.

I think the fix should be something like:

--- 
+++ 
@@ -41,33 +41,33 @@
         self.stb_r = Signal()
         self.stb_f = Signal()

     def elaborate(self, platform):
         m = Module()

-        if self.cyc == 0:
+        if self.cyc == 1:
             # Special case: output frequency equal to input frequency.
             # Implementation: wire.
             m.d.comb += [
                 self.clk.eq(ClockSignal()),
                 self.stb_r.eq(1),
                 self.stb_f.eq(1),
             ]

-        if self.cyc == 1:
+        if self.cyc == 2:
             # Special case: output frequency half of input frequency.
             # Implementation: flip-flop.
             m.d.sync += [
                 self.clk.eq(~self.clk),
             ]
             m.d.comb += [
                 self.stb_r.eq(~self.clk),
                 self.stb_f.eq(self.clk),
             ]

-        if self.cyc >= 2:
+        if self.cyc >= 3:
             # General case.
             # Implementation: counter.
             counter = Signal(range(self.cyc))
             clk_r   = Signal()

             m.d.sync += counter.eq(counter - 1)
@@ -110,14 +110,14 @@
                              .format(output_hz / 1000, input_hz / 1000))
         if min_cyc is not None and output_hz * min_cyc > input_hz:
             raise ValueError("output frequency {:.3f} kHz requires a period smaller than {:d} "
                              "cycles at input frequency {:.3f} kHz"
                              .format(output_hz / 1000, min_cyc, input_hz / 1000))

-        cyc = round(input_hz // output_hz) - 1
-        actual_output_hz = input_hz / (cyc + 1)
+        cyc = round(input_hz // output_hz)
+        actual_output_hz = input_hz / cyc
         deviation_ppm = round(1000000 * (actual_output_hz - output_hz) // output_hz)

         if max_deviation_ppm is not None and deviation_ppm > max_deviation_ppm:
             raise ValueError("output frequency {:.3f} kHz deviates from requested frequency "
                              "{:.3f} kHz by {:d} ppm, which is higher than {:d} ppm"
                              .format(actual_output_hz / 1000, output_hz / 1000,
@@ -139,15 +139,15 @@

         if logger is not None:
             if clock_name is None:
                 clock = "clock"
             else:
                 clock = f"clock {clock_name}"
-            if cyc in (0, 1):
+            if cyc in (1, 2):
                 duty = 50
             else:
                 duty = (cyc // 2) / cyc * 100
             logger.debug("%s in=%.3f req=%.3f out=%.3f [kHz] error=%d [ppm] duty=%.1f%%",
                          clock, input_hz / 1000, output_hz / 1000, actual_output_hz / 1000,
                          deviation_ppm, duty)

         return cyc

Simulation with the fix.

On a related note, the special case where the output frequency is half the input frequency has a strobe behaviour different from the normal case. In the special case stb_r and stb_f rise one cycle before their respective clock edges. In the normal case they rise on the same cycle. Not sure which behaviour is the correct one, but I think they should at least be consistent.

In terms of actual impact the whole thing may not be a big deal. Assuming most applets derive clocks that are much slower than the system clock, the off-by-one deviation should be negligible.

whitequark commented 3 months ago

Thanks! The whole idea with ClockGen was a little misguided (I can say as the person who introduced it, four years later), so I'm not sure if it even should be used after all. The QSPI controller applet shows what I would consider the best current practice.

whitequark commented 3 months ago

I think one big concern here is that a bunch of applets use ClockGen.derive without using ClockGen itself. In that case fixing an off-by-1 here could introduce an off-by-1 there, which is why I hesitate to just apply the fix.

mbikovitsky commented 3 months ago

I could search the codebase for any uses of derive that don't pass it to ClockGen and - 1 them to keep the original behaviour :) It's not exactly pretty, and if you say ClockGen shouldn't be used at all then maybe it's better to just leave it as-is?

whitequark commented 3 months ago

It looks like these are the direct uses of ClockGen itself:

software/glasgow/applet/audio/dac/__init__.py:66:        m.submodules.clkgen = clkgen = ClockGen(self.pulse_cyc)
software/glasgow/applet/audio/yamaha_opx/__init__.py:181:        m.submodules.clkgen = clkgen = EnableInserter(self.clkgen_ce)(ClockGen(self.master_cyc))
software/glasgow/applet/interface/spi_controller/__init__.py:106:        m.submodules.clkgen = clkgen = ResetInserter(clkgen_reset)(ClockGen(self.period_cyc))
software/glasgow/applet/sensor/hx711/__init__.py:50:            m.submodules.clkgen = clkgen = ClockGen(self.osc_cyc)

and these are the indirect uses of ClockGen.derive:

software/glasgow/applet/audio/dac/__init__.py:179:            pulse_cyc = self.derive_clock(clock_name="modulation",
software/glasgow/applet/audio/dac/__init__.py:181:        sample_cyc = self.derive_clock(clock_name="sampling",
software/glasgow/applet/audio/yamaha_opx/__init__.py:1129:            master_cyc=self.derive_clock(
software/glasgow/applet/interface/spi_controller/__init__.py:345:            period_cyc=self.derive_clock(input_hz=target.sys_clk_freq,
software/glasgow/applet/interface/spi_controller/__init__.py:351:            delay_cyc=self.derive_clock(input_hz=target.sys_clk_freq,
software/glasgow/applet/interface/uart/__init__.py:175:        max_bit_cyc = self.derive_clock(
software/glasgow/applet/interface/uart/__init__.py:216:        manual_cyc = self.derive_clock(
software/glasgow/applet/program/m16c/__init__.py:335:            baud: self.derive_clock(input_hz=target.sys_clk_freq, output_hz=baud)
software/glasgow/applet/sensor/hx711/__init__.py:158:            osc_cyc = self.derive_clock(clock_name="osc",
whitequark commented 3 months ago

All in all it's just five applets total, of which only one (UART) uses ClockGen.derive directly. So I guess it's probably fine to go either way.

mbikovitsky commented 3 months ago

I'll create a PR then.