Support for PLL primitives

GuzTech commented 4 years ago

The goal of this issue is to document the kinds of PLL primitives of the various FPGA manufacturers, so that we can support them in nmigen.

Lattice

Lattice PLLs are different for the different FPGA families.

iCE40 (LP/LM/HX/Ultra/UltraLite/UltraPlus)

The iCE40 family has five types of PLLs. They are all capable of shifting the output clock by 0 and 90 degrees, and allow fine delay adjustments of up to 2.5 ns (typical) in 150 ps increments (typical).

SB_PLL40_CORE

Can be used if the source clock of the PLL originates on the FPGA or is driven by an input pad that is not in the bottom IO bank (bank 2).

SB_PLL40_PAD

Can be used if the source clock of the PLL is driven by an input pad that is located in the bottom or top IO bank (banks 2 and 0 respectively). When this PLL is used, the source clock itself is no longer available to the rest of the design.

SB_PLL40_2_PAD

Can be used if the source clock of the PLL is driven by an input pad that is located in the bottom or top IO bank (banks 2 and 0 respectively). This PLL outputs the requested clock as well as the source clock.

SB_PLL40_2F_CORE

Can generate two different output frequencies. Can be used if the source clock of the PLL originates on the FPGA.

SB_PLL40_2F_PAD

Can generate two different output frequencies. Can be used if the source clock of the PLL is driven by an input pad that is located in the bottom or top IO bank (banks 2 and 0 respectively).

ECP5

The ECP5 family has a bunch of clocking elements, but only one type of PLL.

EHXPLLL

It has four outputs, but if the user wants to hook up the PLL feedback, then one of the outputs cannot be used in the fabric. It supports dynamic clock selection and control, dynamic phase adjustment, etc.

Xilinx

Xilinx has two clock synthesis IPs as far as I know: PLL and Mixed-Mode Clock Manager (MMCM). The different FPGA families have different versions of these IPs.

Spartan-6

The Spartan-6 family has Clock Management Tiles (CMTs), each of which contains one PLL and two Digital Clock Managers (DCMs). The latter can be used to implement Delay Locked Loops, digital frequency synthesizers, digital phase shifters, or digital spread spectrum.

There are two PLL primitives, each of which has six clock outputs and a dedicated feedback output.

PLL_BASE

PLL_BASE provides access to the most commonly used features, such as clock deskewing, frequency synthesis, coarse phase shifting, and duty cycle programming.

PLL_ADV

PLL_ADV provides access to all PLL_BASE features plus additional ones, such as dynamically reconfiguring the PLL.

7 Series

The 7 series also has CMTs, each of which contains an MMCM and a PLL. The PLL contains a subset of the functions of the MMCM. The PLL has six clock outputs, whereas the MMCM has seven clock outputs. Both have a dedicated feedback output.

MMCME2_BASE and PLLE2_BASE

Both BASE IPs provide access to the most commonly used features, such as clock deskewing, frequency synthesis, coarse phase shifting, and duty cycle programming.

MMCME2_ADV and PLLE2_ADV

The MMCME2_ADV IP provides access to all BASE features plus additional ones, such as additional ports for clock switching, access to the Dynamic Reconfiguration Port (DRP), and dynamic fine-phase shifting. The PLLE2_ADV IP provides the same features, except for dynamic fine-phase shifting.

UltraScale / UltraScale+

The CMTs in the UltraScale family contain an MMCM and two PLLs. Just like with the 7 series FPGAs, there are _BASE and _ADV IPs, so their differences won't be repeated here.

MMCM

The MMCMs have seven clock outputs and a dedicated feedback output. The MMCM IPs for the UltraScale are MMCME3_BASE and MMCME3_ADV, whereas for the UltraScale+ they are MMCME4_BASE and MMCME4_ADV.

PLL

The PLLs have two clock outputs each. Similarly to the MMCMs, the PLL IPs are PLLE3_BASE and PLLE3_ADV for the UltraScale family, and PLLE4_BASE and PLLE4_ADV for the UltraScale+ family.

Intel

From what I can gather, all Intel FPGA PLLs are instantiated using the ALTPLL IP, and all PLLs have five clock outputs.

Common Remarks

PLLs multiply and divide clocks to achieve the desired output clock frequencies. But they are limited to the allowed minimum and maximum multiplier, divider, and intermediate clock frequencies. These values can also depend on the speed grade of the FPGA, which would mean that we should be able to supply the speed grade of the FPGA.
The calculation of how to achieve the desired clock frequencies given the input clock frequencies is different for each PLL.
Most PLLs have additional functionalities that we could initially not support, and slowly introduce one by one. Also, there are other IPs that are often used in conjunction with PLLs, so they could be added too.
For Xilinx clocking resources, we could just always use the _ADV IPs and supply default values for additional features that are not used.
The information above can of course contain errors :)

Approach

Maybe it would be a good idea to just start with the iCE40 family (since it's the simplest, and I use it in my current design). We could implement the PLLs similarly to how LiteX does it.

jeanthom commented 4 years ago

I'd rather think that the ECP5 would be a better starting point since there is only one PLL type to take care of. I'm not sure if it has completely been reverse engineered though.

GuzTech commented 4 years ago

Fair enough, I'll see if I can whip up something for the ECP5 and share it here.

GuzTech commented 4 years ago

I have an initial implementation in #426.

GuzTech commented 4 years ago

Here are some of my thoughts on how a PLL could be used:

PLL Creation

As I see it, there are three ways of creating a PLL.

Create a specific PLL (such as an SB_PLL40_2F_CORE).
Create a "family" PLL and supply the specific PLL primitive as a parameter when creating it (such as an iCE40PLL(SB_PLL40_PAD)).
Create a generic PLL that derives from the platform which types of PLLs exist, picking the "correct" one.

The first one is basically a class that instantiates the specific PLL primitive, with some logic that calculates the PLL parameters. I think this leads to repeated code, since most primitives differ slightly among themselves.

The second one somewhat abstracts away the specifics of each PLL primitive and leads to code reuse.

The third one looks like it would require quite a bit of code. Who decides what the "correct" PLL primitive is? For example, for the iCE40 family, the usage of the _CORE and _PAD primitives is limited by which bank the input clock originates from. We would have to be able to check this, and I don't think this is the responsibility of nMigen.

Creating Output Clocks

In LiteX, you create a PLL object and call register_clkin on it, passing the input clock signal and its frequency. For each output clock, you first create a ClockDomain, then call create_clkout on the PLL object and supply the ClockDomain. Finally, you add a period constraint to the Platform.

I like how this works, but of course I'm open to suggestions. It might not make sense for the iCE40 PLL primitives with one clock output, but it would be consistent.

Also, maybe we should supply the Platform when creating the PLL object, so that when we create an output clock, it would automatically add the corresponding period constraint so that the user doesn't forget it. But maybe this is not necessary.
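As a rough illustration of the LiteX-style flow described above, the sketch below mocks up the user-facing calls in plain Python. SketchPLL and this ClockDomain are hypothetical stand-ins written for illustration, not LiteX or nMigen classes. Only the method names register_clkin and create_clkout come from the description above; their signatures here are assumed.

```python
class ClockDomain:
    """Hypothetical stand-in for a clock domain; it only carries a name."""
    def __init__(self, name):
        self.name = name


class SketchPLL:
    """Hypothetical PLL front end that only records what the user asked for."""
    def __init__(self):
        self.clkin = None
        self.clkouts = []

    def register_clkin(self, signal, freq_hz):
        # Remember the input clock signal and its frequency.
        self.clkin = (signal, freq_hz)

    def create_clkout(self, domain, freq_hz):
        # An output can only be requested once the input clock is known.
        if self.clkin is None:
            raise RuntimeError("call register_clkin() first")
        self.clkouts.append((domain, freq_hz))

    def period_constraints_ns(self):
        # One period constraint per requested output, keyed by domain name.
        return {domain.name: 1e9 / freq_hz for domain, freq_hz in self.clkouts}


pll = SketchPLL()
pll.register_clkin("clk12", 12e6)
pll.create_clkout(ClockDomain("sync"), 48e6)
pll.create_clkout(ClockDomain("pixel"), 24e6)
print(pll.period_constraints_ns())
```

Deriving the period constraints from the recorded outputs is what would let a PLL object add them automatically, if it were handed the Platform.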

rroohhh commented 4 years ago

One more thing that might be interesting to think about (but not sure if this actually fits this issue) is where the PLL gets stored. Given there is usually only a small number of them, it almost feels like putting them into the platform could be a good idea. It would make it easier to generate the required clocks for different parts of a design without manually passing a PLL instance around or having to create all clocks toplevel. Of course it also has downsides, like making phase relations less obvious.

Another thing (at least on Xilinx 7 series, not sure if there are similar things on other platforms) is the placement of the PLL. If it is in the same clock region as, say, the pin where the input clock enters the FPGA, one only needs a regional clock buffer for the input clock; if it is not, one needs a global buffer. This is probably far out (and maybe out of scope), but some support from nmigen to help with choosing the correct clock buffer, or with automatically finding a good combination, would be pretty interesting.

whitequark commented 4 years ago

Okay, so I think we have two basic problems here that are essentially separate:

Given a platform PLL instance (following the vendor conventions), perform frequency calculations, both in forward and in reverse. Basically, the PLL should be possible to describe by a set of Diophantine equations that relate the input frequencies, the parameters, and the output frequencies together, and these equations should be possible to solve for any desirable unknowns. I'm going to call this the frequency problem.
Given a set of requirements (clock input, clock output, reset or lock), choose the right platform PLL instance and the right ports to connect. This is both harder (because rather than being a well-understood math problem, it's a messy coordination problem without sufficient information) and easier (because we can and should punt on the hardest decisions, requiring the user to specify them explicitly). It also depends on the previous problem. I'm going to call this the routing problem.

It seems to me that it's worthwhile to start with the frequency problem. Do you think you can start collecting the info about PLLs and expressing it in the form of Diophantine equations? Then we'll need to figure out some way to solve them other than brute force, I'm sure there should be libraries that help with that.

rroohhh commented 4 years ago

Also, maybe we should supply the Platform when creating the PLL object, so that when we create an output clock, it would automatically add the corresponding period constraint so that the user doesn't forget it. But maybe this is not necessary.

I think the PLL should definitely create the clock constraints and given that the platform is already passed to elaborate it should be fairly easy to do that there.

whitequark commented 4 years ago

I think the PLL should definitely create the clock constraints

This isn't usually necessary because the backend toolchain already knows the output frequency if you constrained the inputs. But in any case, the platform is the factory of PLLs, so a PLL naturally knows what the platform is.

rroohhh commented 4 years ago

Oh whoops somehow that slipped my mind.

GuzTech commented 4 years ago

It seems to me that it's worthwhile to start with the frequency problem. Do you think you can start collecting the info about PLLs and expressing it in the form of Diophantine equations?

These are the polynomial equations that describe the relationship between the input and output frequencies, and the multiplication and division (integer) values, right?

Then we'll need to figure out some way to solve them other than brute force, I'm sure there should be libraries that help with that.

Is brute forcing an actual issue? It might just be me, but it's not like Python chokes on calculating the parameters every time I build my design. Maybe I'm overlooking something?

whitequark commented 4 years ago

These are the polynomial equations that describe the relationship between the input and output frequencies, and the multiplication and division (integer) values, right?

Yes, with integer solutions, and some inequalities too, since none of the parameters have an arbitrary range.

Is brute forcing an actual issue?

We can start with brute forcing and improve it later, assuming that our data representation doesn't prevent us from doing so. (E.g. expressing the equations with arbitrary Python code obviously won't work.)
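One way to keep the equations as data rather than arbitrary Python is a tiny expression tree that a solver can inspect. The sketch below is only an assumption about what such a representation might look like; the tuple format, OPS, evaluate, and forward are made-up names. The example relations describe a PLL with a reference divider, a feedback multiplier, and a power-of-two output divider, with parameter names borrowed from the iCE40 primitives.

```python
import operator

# Each equation is a nested tuple (op, lhs, rhs); strings name variables.
OPS = {
    "add": operator.add,
    "mul": operator.mul,
    "div": operator.truediv,
    "pow": operator.pow,
}

# Order matters: later equations may refer to earlier results.
EQUATIONS = {
    "f_pfd": ("div", "f_in", ("add", "DIVR", 1)),
    "f_vco": ("mul", ("mul", "f_pfd", ("add", "DIVF", 1)), ("pow", 2, "DIVQ")),
    "f_out": ("div", "f_vco", ("pow", 2, "DIVQ")),
}


def evaluate(expr, env):
    """Evaluate one expression tree against a dict of known values."""
    if isinstance(expr, str):
        return env[expr]
    if isinstance(expr, (int, float)):
        return expr
    op, lhs, rhs = expr
    return OPS[op](evaluate(lhs, env), evaluate(rhs, env))


def forward(equations, inputs):
    """Forward calculation: parameters in, frequencies out."""
    env = dict(inputs)
    for name, expr in equations.items():
        env[name] = evaluate(expr, env)
    return env


result = forward(EQUATIONS, {"f_in": 12, "DIVR": 0, "DIVF": 3, "DIVQ": 4})
print(result["f_pfd"], result["f_vco"], result["f_out"])
```

Because the equations are plain data, a brute-force search can call forward() today, and a smarter solver could later walk the same trees symbolically.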

GuzTech commented 4 years ago

Ok, I'll look up the formulas from the datasheets for each PLL. The data representation should express the equations symbolically using a symbolic expression library I assume.

whitequark commented 4 years ago

The data representation should express the equations symbolically using a symbolic expression library I assume.

Something like that. It's not clear if we can use an existing library or if we should make yet another tiny sub-language for this--I'm leaning towards the latter since historically nMigen has been very conservative with the dependencies and lately it became more conservative, not less (126f0be731ab008657ca1c4fc05ce7cc6e355169).

whitequark commented 4 years ago

For now I actually think it makes sense to collect PLL equations in this issue in human-readable form while the rest of the design is being worked on; the data entry part is the most laborious one anyhow, and converting them to the format accepted by whatever CAS we end up using is easy.

GuzTech commented 4 years ago

Yes, I agree. To a user, it shouldn't matter how the parameters are being resolved, as long as the specification of input/output frequencies/phases is designed correctly.

Fatsie commented 4 years ago

Is the final API meant to be for deriving different digital clocks, or does one want to be more general? In the latter case, constraints other than frequency, like jitter and duty ratio, may become important. For example, high-speed, high-precision ADCs need clocks with very low jitter. Another example is, I think, the LiteX DDR interface, which needs two clocks of the same (high) frequency but with a 90 degree phase offset (if I remember correctly).

GuzTech commented 4 years ago

I would want it to be as general as makes sense. I think most of the issues you mentioned can be solved if we can specify the location of the PLLs. In the DDR example, the user can use the same PLL to generate the two clocks, and use the two different clock domains as needed. Maybe there is a way to specify the phase difference as a constraint?

whitequark commented 4 years ago

I'm not sure how specifying the location would help? Having two clocks with a phase offset however should be easily expressible via the same approach of using equations, just for phase and not frequency.
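To make the phase case concrete: a phase offset corresponds to a delay of phase/360 of the output period, so it can be written as one more relation next to the frequency ones. The helper name below is made up for illustration, and how a given PLL quantizes the delay is vendor specific and not modelled here.

```python
def phase_to_delay_ns(f_out_mhz, phase_deg):
    """Delay in ns that corresponds to a phase offset at the given frequency."""
    period_ns = 1e3 / f_out_mhz
    return period_ns * (phase_deg / 360.0)


# A 90 degree offset between two 100 MHz clocks is a quarter of a 10 ns period.
print(phase_to_delay_ns(100.0, 90.0))
```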

GuzTech commented 4 years ago

Yeah, you're right. I got confused :)

Output clocks are routed through dedicated clock networks, so it doesn't really matter. It would matter if they were to be routed through the fabric, but that's not recommended and you have to specify that by hand for some vendors.

But I don't think it would be bad if you could specify its location.

GuzTech commented 4 years ago

iCE40 Family

Both the LP and HX parts have the same constraints when it comes to PLL parameters. The parameters are:

f_IN - Input clock frequency. Related signals: REFERENCECLK, EXTFEEDBACK.
f_OUT - Output clock frequency. Related signals: PLLOUT
f_VCO - PLL VCO frequency.
f_PFD - Phase Detector input frequency.

Now, depending on the chosen feedback path (FEEDBACK_PATH), there are two sets of equations, each with its own constraints. All frequencies are in MHz.

FEEDBACK_PATH other than SIMPLE

$\\ \textup{f}_{\textup{PFD}} = \frac{\textup{f}_{\textup{IN}}}{\textup{DIVR} + 1} \\ \\ \textup{f}_{\textup{VCO}} = \textup{f}_{\textup{PFD}} \times (\textup{DIVF} + 1) \times 2^{\textup{DIVQ}} \\ \textup{f}_{\textup{OUT}} = \frac{\textup{f}_{\textup{VCO}}}{2^{\textup{DIVQ}}} = \frac{\textup{f}_{\textup{IN}} \times (\textup{DIVF} + 1)}{\textup{DIVR} + 1} \\ \\ 0 \le \textup{DIVF} \le 63 \\ 0 \le \textup{DIVR} \le 15 \\ 1 \le \textup{DIVQ} \le 6 \\ 10 \le \textup{f}_{\textup{IN}} \le 133 \\ 16 \le \textup{f}_{\textup{OUT}} \le 275 \\ 533 \le \textup{f}_{\textup{VCO}} \le 1066 \\ 10 \le \textup{f}_{\textup{PFD}} \le 133$
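As a sanity check of these relations, here is a minimal brute-force sketch for the non-SIMPLE case. It uses the definitions of f_PFD and f_VCO above together with f_OUT = f_VCO / 2^DIVQ, and applies the listed limits with frequencies in MHz. The function name and the returned dictionary are made up for illustration; this is not icepll's code.

```python
from itertools import product


def ice40_pll_search(f_in, f_target):
    """Brute-force DIVR/DIVF/DIVQ for FEEDBACK_PATH other than SIMPLE (MHz)."""
    if not 10 <= f_in <= 133:
        raise ValueError("f_IN must be within 10..133 MHz")
    best = None
    for divr, divf, divq in product(range(16), range(64), range(1, 7)):
        f_pfd = f_in / (divr + 1)
        f_vco = f_pfd * (divf + 1) * 2**divq
        f_out = f_vco / 2**divq
        # Reject combinations that violate any of the listed limits.
        if not (10 <= f_pfd <= 133 and 533 <= f_vco <= 1066 and 16 <= f_out <= 275):
            continue
        error = abs(f_out - f_target)
        if best is None or error < best["error"]:
            best = {"DIVR": divr, "DIVF": divf, "DIVQ": divq,
                    "f_vco": f_vco, "f_out": f_out, "error": error}
    return best  # None if no legal combination exists


print(ice40_pll_search(12, 48))
```

In this feedback mode DIVQ does not change f_OUT; it only moves f_VCO into its legal range.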

FEEDBACK_PATH = SIMPLE

These equations come from the DS1040 - iCE40 LP/HX Family Data Sheet, TN1251 - iCE40 sysCLOCK PLL Design and Usage Guide, and the icepll source code.

GuzTech commented 4 years ago

ECP5 Family

The parameters are:

f_IN - Input clock frequency. Related signals: CLKI, CLKFB.
f_OUT - Output clock frequency. Related signals: CLKOP, CLKOS.
f_VCO - PLL VCO frequency.
f_PFD - Phase Detector input frequency.

$\\ \textup{f}_{\textup{PFD}} = \frac{\textup{f}_{\textup{IN}}}{\textup{CLKI\_DIV}} \\ \\ \textup{f}_{\textup{VCO}} = \textup{f}_{\textup{PFD}} \times \textup{CLKFB\_DIV} \times \textup{CLKO\{P,S,S2,S3\}\_DIV} \\ \textup{f}_{\textup{OUT}} = \frac{\textup{f}_{\textup{VCO}}}{\textup{CLKO\{P,S,S2,S3\}\_DIV}} \\ \\ 1 \le \textup{CLKI\_DIV} \le 128 \\ 1 \le \textup{CLKFB\_DIV} \le 80 \\ 1 \le \textup{CLKOP\_DIV} \le 128 \\ 1 \le \textup{CLKOS\_DIV} \le 128 \\ 1 \le \textup{CLKOS2\_DIV} \le 128 \\ 1 \le \textup{CLKOS3\_DIV} \le 128 \\ 8 \le \textup{f}_{\textup{IN}} \le 400 \\ 3.125 \le \textup{f}_{\textup{OUT}} \le 400 \\ 400 \le \textup{f}_{\textup{VCO}} \le 800 \\ 10 \le \textup{f}_{\textup{PFD}} \le 400$

These equations come from the ECP5 and ECP5-5G Family Data Sheet, TN1263 - ECP5 and ECP5-5G sysCLOCK PLL/DLL Design and Usage Guide, and the ecppll source code.
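A minimal search sketch for the primary output, under the assumption that the feedback is taken from CLKOP, so that the output divider in the f_VCO equation is CLKOP_DIV. The function name and result format are made up for illustration; this is not ecppll's code. Frequencies are in MHz.

```python
def ecp5_pll_search(f_in, f_target):
    """Search CLKI_DIV, CLKFB_DIV and CLKOP_DIV for the CLKOP output (MHz)."""
    if not 8 <= f_in <= 400:
        raise ValueError("f_IN must be within 8..400 MHz")
    best = None
    for clki_div in range(1, 129):
        f_pfd = f_in / clki_div
        if not 10 <= f_pfd <= 400:
            continue
        for clkfb_div in range(1, 81):
            # Assumed CLKOP feedback: f_OUT = f_VCO / CLKOP_DIV = f_PFD * CLKFB_DIV.
            f_out = f_pfd * clkfb_div
            if not 3.125 <= f_out <= 400:
                continue
            for clkop_div in range(1, 129):
                f_vco = f_out * clkop_div
                if f_vco > 800:
                    break  # larger dividers only push the VCO higher
                if f_vco < 400:
                    continue
                error = abs(f_out - f_target)
                if best is None or error < best["error"]:
                    best = {"CLKI_DIV": clki_div, "CLKFB_DIV": clkfb_div,
                            "CLKOP_DIV": clkop_div, "f_vco": f_vco,
                            "f_out": f_out, "error": error}
                break  # f_OUT does not depend on CLKOP_DIV; first legal one is enough
    return best  # None if no legal combination exists


print(ecp5_pll_search(25, 100))
```

Secondary outputs would then be derived from the chosen f_VCO with their own dividers, which this sketch leaves out.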

alanvgreen commented 4 years ago

Looking at the frequency problem, here's a first attempt to document a generalized model of PLLs. I believe (based on the docs from @GuzTech above, and a passing familiarity with the code) that a solver based on this model would handle all of the Lattice PLLs in LiteX's clock.py.

Generalized PLL Model

Xilinx clocking seems to be a little simpler - I'll need to add that to the model. Have not looked hard at other vendors yet.

Please point out problems!

H-S-S-11 commented 4 years ago

Intel (ALTPLL)

Older Altera parts (and Cyclone 10LP for some reason) use the ALTPLL (or ALTPLL_RECONFIG) IP to generate a PLL instantiating altpll. The key parameters are:

pll_type (string) - should probably be "AUTO" in general so Quartus can select which type is used (only relevant for Stratix and Arria parts).
operation_mode (string) - selects the feedback mode (the default in the wizard is "NORMAL", which compensates for the clock network).
inclk0_input_frequency (integer) - appears to actually be the period of the input clock in ps. The range varies by device.
compensate_clock (string) - for normal mode operation, where the feedback comes from, e.g. CLK0.

On a per-clock basis:

clkn_divide_by (integer) - must be greater than 0 and less than 256.
clkn_duty_cycle (integer)
clkn_multiply_by (integer) - must be greater than 0 and less than or equal to 256.
clkn_phase_shift (string)

The IP wizard can either take these values directly or generate them from the input frequency and the desired output frequency. Phase shift and duty cycle have a set of allowed values which depends on the multiplication factors.
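A small sketch of how the numeric parameters could be derived from two frequencies, assuming (as noted above) that inclk0_input_frequency is the input period in ps. The function name is hypothetical, the values the IP wizard actually picks may differ, and VCO limits are ignored here.

```python
from fractions import Fraction


def altpll_style_params(f_in_hz, f_out_hz):
    """Derive an input period in ps and a reduced multiply/divide pair."""
    ratio = Fraction(f_out_hz, f_in_hz)  # Fraction reduces to lowest terms
    # Limits as stated above: 0 < multiply <= 256 and 0 < divide < 256.
    if not (0 < ratio.numerator <= 256 and 0 < ratio.denominator < 256):
        raise ValueError("ratio is not representable within the stated limits")
    return {
        "inclk0_input_frequency": round(1e12 / f_in_hz),  # period in ps
        "clk0_multiply_by": ratio.numerator,
        "clk0_divide_by": ratio.denominator,
    }


# 50 MHz in, 125 MHz out.
print(altpll_style_params(50_000_000, 125_000_000))
```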

V series instantiates an altera_pll with a slightly simpler set of parameters for a basic pll:

fractional_vco_multiplier (string) - "true" or "false". The example was done with an integer-N PLL.
reference_clock_frequency (string) - e.g. "50.0 MHz"
operation_mode (string) - e.g. "direct", "normal"
number_of_clocks (integer)
output_clock_frequencyn (string) - e.g. "100.000000 MHz"
phase_shiftn (string) - e.g. "0 ps"
duty_cyclen (integer)
pll_type (string) - the default seems to be "General", which hopefully means auto.
pll_subtype (string) - see above.

I think there is a balance deciding how much to make user controllable and how much to default to basic settings to avoid creating something that can't be synthesized, but I've tried to show here the parameters that would be relevant to the majority of use cases.

alanvgreen commented 4 years ago

Thanks @H-S-S-11 for the Altera info. In terms of the basic PLL architecture and frequency calculations, it lines up well with Lattice and Xilinx parts.

I think there is a balance deciding how much to make user controllable and how much to default to basic settings to avoid creating something that can't be synthesized, but I've tried to show here the parameters that would be relevant to the majority of use cases.

Agreed! I think the following should cover most designs, and be included in the initial PLL API.

Multiple clock outputs per PLL
Phase shift (where supported)
Enable/Disable PLL

For a basic PLL API, I don't think we need features such as

Duty cycle
Dynamic control of frequency
Dynamic control of phase

Thoughts?

whitequark commented 4 years ago

I think this is a reasonable plan.

rroohhh commented 4 years ago

This is probably obvious, but access to the locked signal would be quite useful as well.

pbsds commented 4 years ago

Should the interface require you to specify the tolerance for error? The integer solutions for the PLL parameters will cause some amount of error.

whitequark commented 4 years ago

Should the interface require you to specify the tolerance for error?

Yes, since the constraint solver would have to take this into account anyway.
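A tolerance could be expressed as a maximum relative frequency error that the solver checks for each candidate. The helper below is a made-up illustration of that check, not part of any proposed API.

```python
def within_tolerance(f_achieved, f_target, tolerance):
    """True if the relative frequency error does not exceed `tolerance`."""
    return abs(f_achieved - f_target) / f_target <= tolerance


# 25.125 MHz achieved for a 25.175 MHz request is about 0.2 % low.
print(within_tolerance(25.125, 25.175, 0.005))
print(within_tolerance(25.125, 25.175, 0.001))
```

With a 0.5 % tolerance this candidate is accepted, while a 0.1 % tolerance rejects it.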

alanvgreen commented 4 years ago

Here's an outline of a solver: https://github.com/alanvgreen/nmigen/blob/pll/experiments/PLLSolver_pynb.ipynb

It's in Jupyter notebook form so, please mess around with it.

At this stage, I'd like feedback on the FrequencySolver API.

The shape of the solver is based on algorithms found in migen's clock.py. In clock.py, though, the solutions to the frequency problem and the routing problem are closely tied.

whitequark commented 4 years ago

The shape of the solver is based on algorithms found in migen's clock.py.

That's not migen, that's litex.

whitequark commented 4 years ago

It's in Jupyter notebook form so, please mess around with it.

Just reading the first two lines: this uses a different license from nMigen, is that intentional?

alanvgreen commented 4 years ago

Err, no it wasn't intentional. Fixed now.

whitequark commented 4 years ago

I took a look. This is just a brute-force solver, right? If that works sufficiently well for most applications I'm perfectly happy with it.

pbsds commented 4 years ago

If a brute-force solution proves to be too costly, we could optionally use an LP solver like PuLP or scipy.optimize.linprog, or perhaps implement a simple LP solver like the Simplex algorithm.

alanvgreen commented 4 years ago

The "brute-force" solver is probably fast enough for most situations. Due to the allowed ranges of input and VCO frequencies, large parts of the search space are trimmed quite quickly. And it seems to be working for litex, too.
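To give a feel for how much the range checks trim, the sketch below counts combinations for the iCE40 relations and limits posted earlier in this thread (FEEDBACK_PATH other than SIMPLE, frequencies in MHz). The function name is made up, and this is an illustration rather than the notebook's solver.

```python
def count_ice40_candidates(f_in):
    """Return (visited, legal) combination counts for a given input clock."""
    visited = legal = 0
    for divr in range(16):
        f_pfd = f_in / (divr + 1)
        if not 10 <= f_pfd <= 133:
            continue  # prunes all 64 * 6 combinations below this DIVR
        for divf in range(64):
            for divq in range(1, 7):
                visited += 1
                f_vco = f_pfd * (divf + 1) * 2**divq
                if 533 <= f_vco <= 1066:
                    legal += 1
    return visited, legal


full_space = 16 * 64 * 6
visited, legal = count_ice40_candidates(12)
print(full_space, visited, legal)
```

For a 12 MHz input only DIVR = 0 passes the phase detector limit, so most of the full space is never visited at all.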

I'll keep going with this, then.

whitequark commented 4 years ago

If it works for LiteX I think it's safe to assume it's good enough.

ECP5-PCIe commented 2 years ago

We should also consider additional parameters of the PLL which don't directly relate to the PLL frequency, for example filter settings, since they affect the output jitter and settling time. Though I am not sure how to approach it, maybe based on what the manufacturer toolchain outputs or based on empirical experiments or only user specified.

Additionally some mechanism to change the constraints for a specific design could be useful, for example allowing higher VCO frequencies than specified in the datasheet.
