Open GuzTech opened 4 years ago
I'd rather think that the ECP5 would be a better starting point since there is only one PLL type to take care of. I'm not sure if it has completely been reverse engineered though.
Fair enough, I'll see if I can whip up something for the ECP5 and share it here.
I have an initial implementation in #426.
Here are some of my thoughts on how a PLL could be used:
As I see it, there are three ways of creating a PLL.
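1. A class for each specific PLL primitive (e.g. `SB_PLL40_2F_CORE`).
2. A class for each FPGA family that takes the primitive as a parameter (e.g. `iCE40PLL(SB_PLL40_PAD)`).
3. A generic PLL class that picks the correct primitive itself.
The first one is basically a class that instantiates the specific PLL primitive, with some logic that calculates the PLL parameters. I think this leads to repeated code, since most primitives differ only slightly among themselves.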
The second one somewhat abstracts away the specifics of each PLL primitive and leads to code reuse.
The third one looks like it would require quite a bit of code. Who decides what the "correct" PLL primitive is? For example, for the iCE40 family, the usage of the `_CORE` and `_PAD` primitives is limited by which bank the input clock originates from. We would have to be able to check this, and I don't think this is the responsibility of nMigen.
In LiteX, you create a PLL object and call `register_clkin` on it, where you give the input clock signal and its frequency. For each output clock, you first create a ClockDomain, then call `create_clkout` on the PLL object and supply the ClockDomain. Finally, you add a period constraint to the Platform.
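The flow described above can be sketched with a small stand-in class. This is not LiteX's actual implementation: `SketchPLL`, this `ClockDomain` and `period_constraints_ns` are hypothetical names used only to show the order of calls (register the input clock, request one output per clock domain, then derive the period constraints).

```python
from dataclasses import dataclass, field


@dataclass
class ClockDomain:
    """Hypothetical stand-in for a clock domain; it only carries a name."""
    name: str


@dataclass
class SketchPLL:
    """Hypothetical PLL wrapper mirroring the call order described above."""
    clkin_name: str = None
    clkin_freq: float = None
    clkouts: dict = field(default_factory=dict)

    def register_clkin(self, clkin_name, freq):
        # Remember which signal feeds the PLL and its frequency in Hz.
        self.clkin_name = clkin_name
        self.clkin_freq = freq

    def create_clkout(self, domain, freq):
        # An output can only be requested once the input clock is known.
        if self.clkin_freq is None:
            raise ValueError("call register_clkin() before create_clkout()")
        self.clkouts[domain.name] = freq

    def period_constraints_ns(self):
        # One period constraint (in ns) per requested output clock.
        return {name: 1e9 / freq for name, freq in self.clkouts.items()}


pll = SketchPLL()
pll.register_clkin("clk12", 12e6)
pll.create_clkout(ClockDomain("sync"), 48e6)
constraints = pll.period_constraints_ns()
print(constraints)
```

Keeping the requested outputs inside the PLL object is what would let it emit the period constraints itself, so the user cannot forget them.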
I like how this works, but of course I'm open to suggestions. It might not make sense for the iCE40 PLL primitives with one clock output, but it would be consistent.
Also, maybe we should supply the Platform when creating the PLL object, so that when we create an output clock, it would automatically add the corresponding period constraint so that the user doesn't forget it. But maybe this is not necessary.
One more thing that might be interesting to think about (but not sure if this actually fits this issue) is where the PLL gets stored. Given there is usually only a small number of them, it almost feels like putting them into the platform could be a good idea. It would make it easier to generate the required clocks for different parts of a design without manually passing a PLL instance around or having to create all clocks at the top level. Of course it also has downsides, like making phase relations less obvious.
Another thing (at least on Xilinx 7 series; not sure if there are similar things on other platforms) is the placement of the PLL. If it is in the same clock region as, say, the pin the input clock enters the FPGA on, one only needs a regional clock buffer for the input clock; if it is not, one needs a global buffer. This is probably far out (and maybe out of scope), but some support from nMigen to help with choosing the correct clock buffer, or automatically finding a good combination, would be pretty interesting.
Okay, so I think we have two basic problems here that are essentially separate:
It seems to me that it's worthwhile to start with the frequency problem. Do you think you can start collecting the info about PLLs and expressing it in the form of Diophantine equations? Then we'll need to figure out some way to solve them other than brute force, I'm sure there should be libraries that help with that.
> Also, maybe we should supply the Platform when creating the PLL object, so that when we create an output clock, it would automatically add the corresponding period constraint so that the user doesn't forget it. But maybe this is not necessary.
I think the PLL should definitely create the clock constraints, and given that the platform is already passed to `elaborate`, it should be fairly easy to do that there.
> I think the PLL should definitely create the clock constraints
This isn't usually necessary because the backend toolchain already knows the output frequency if you constrained the inputs. But in any case, the platform is the factory of PLLs, so a PLL naturally knows what the platform is.
Oh whoops somehow that slipped my mind.
> It seems to me that it's worthwhile to start with the frequency problem. Do you think you can start collecting the info about PLLs and expressing it in the form of Diophantine equations?
These are the polynomial equations that describe the relationship between the input and output frequencies, and the multiplication and division (integer) values, right?
> Then we'll need to figure out some way to solve them other than brute force, I'm sure there should be libraries that help with that.
Is brute forcing an actual issue? It might just be me, but it's not like Python chokes on calculating the parameters every time I build my design. Maybe I'm overlooking something?
> These are the polynomial equations that describe the relationship between the input and output frequencies, and the multiplication and division (integer) values, right?
Yes, with integer solutions, and some inequalities too, since none of the parameters have an arbitrary range.
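As a rough illustration of what such a system looks like, a generic integer-N PLL with a reference divider, a feedback multiplier and an output divider can be written as below. The symbols $N$, $M$, $D$ and the names of the bounds are placeholders chosen here, not taken from any particular datasheet.

```latex
\begin{aligned}
f_{\mathrm{pfd}} &= \frac{f_{\mathrm{in}}}{N}, \qquad
f_{\mathrm{vco}} = M \, f_{\mathrm{pfd}}, \qquad
f_{\mathrm{out}} = \frac{f_{\mathrm{vco}}}{D}
\\
f_{\mathrm{out}} \, N \, D &= f_{\mathrm{in}} \, M,
\qquad N, M, D \in \mathbb{Z}_{>0}
\\
N_{\min} \le N \le N_{\max}, &\quad
M_{\min} \le M \le M_{\max}, \quad
D_{\min} \le D \le D_{\max}
\\
f_{\mathrm{pfd,min}} \le f_{\mathrm{pfd}} \le f_{\mathrm{pfd,max}}, &\quad
f_{\mathrm{vco,min}} \le f_{\mathrm{vco}} \le f_{\mathrm{vco,max}}
\end{aligned}
```

With the frequencies expressed as integers (for example in Hz), the second line is the Diophantine part and the last two lines are the inequalities.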
> Is brute forcing an actual issue?
We can start with brute forcing and improve it later, assuming that our data representation doesn't prevent us from doing so. (E.g. expressing the equations with arbitrary Python code obviously won't work.)
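One possible shape for such a representation, purely as a sketch: parameters are declared with integer ranges, and frequencies are small expression trees rather than Python callables, so that a smarter solver could later inspect them instead of only evaluating them. All class names, field names and numeric limits below are made up for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Mul:
    lhs: object
    rhs: object


@dataclass(frozen=True)
class Div:
    lhs: object
    rhs: object


def evaluate(expr, env):
    """Evaluate an expression tree given a dict of variable values."""
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, Mul):
        return evaluate(expr.lhs, env) * evaluate(expr.rhs, env)
    if isinstance(expr, Div):
        return evaluate(expr.lhs, env) / evaluate(expr.rhs, env)
    raise TypeError(f"unknown node: {expr!r}")


# Hypothetical model: f_vco = f_in * M / N and f_out = f_vco / D.
VCO = Div(Mul(Var("f_in"), Var("M")), Var("N"))
MODEL = {
    "params": {"N": range(1, 5), "M": range(1, 65), "D": range(1, 9)},
    "vco": VCO,
    "out": Div(VCO, Var("D")),
    "vco_range": (400e6, 800e6),  # made-up limits, not from a datasheet
}


def brute_force(model, f_in, f_target):
    """Try every parameter combination; return (error, params) of the best."""
    names = list(model["params"])
    lo, hi = model["vco_range"]
    best = None
    for values in product(*(model["params"][n] for n in names)):
        env = dict(zip(names, values), f_in=f_in)
        if not lo <= evaluate(model["vco"], env) <= hi:
            continue  # violates the VCO inequality
        err = abs(evaluate(model["out"], env) - f_target)
        if best is None or err < best[0]:
            best = (err, dict(zip(names, values)))
    return best


result = brute_force(MODEL, f_in=25e6, f_target=100e6)
print(result)
```

Because `MODEL` is plain data, the brute-force loop could later be swapped for something cleverer without touching the per-device descriptions.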
Ok, I'll look up the formulas from the datasheets for each PLL. The data representation should express the equations symbolically using a symbolic expression library I assume.
> The data representation should express the equations symbolically using a symbolic expression library I assume.
Something like that. It's not clear if we can use an existing library or if we should make yet another tiny sub-language for this--I'm leaning towards the latter since historically nMigen has been very conservative with the dependencies and lately it became more conservative, not less (126f0be731ab008657ca1c4fc05ce7cc6e355169).
For now I actually think it makes sense to collect PLL equations in this issue in human-readable form while the rest of the design is being worked on; the data entry part is the most laborious one anyhow, and converting them to the format accepted by whatever CAS we end up using is easy.
Yes, I agree. To a user, it shouldn't matter how the parameters are being resolved, as long as the specification of input/output frequencies/phases is designed correctly.
Is the final API meant for deriving different digital clocks, or does one want to be more general? In the latter case, constraints other than frequency, like jitter and duty ratio, may become important. For example, high-speed, high-precision ADCs need clocks with very low jitter. Another example is, I think, the LiteX DDR interface, which needs two clocks of the same (high) frequency but with a 90 degree phase offset (if I remember correctly).
I would want it to be as general as makes sense. I think most of the issues you mentioned can be solved if we can specify the location of the PLLs. In the DDR example, the user can use the same PLL to generate the two clocks, and use the two different clock domains as needed. Maybe there is a way to specify the phase difference as a constraint?
I'm not sure how specifying the location would help? Having two clocks with a phase offset however should be easily expressible via the same approach of using equations, just for phase and not frequency.
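As a sketch of what a phase equation might look like, assume a simplified model in which an output can be delayed by a whole number of VCO periods. With an output divider D, the reachable phase offsets are then multiples of 360/D degrees. Both the model and the function names are assumptions for illustration; real parts often offer finer steps.

```python
from fractions import Fraction


def achievable_phases(output_divider):
    """Phase offsets (degrees) reachable by delaying k whole VCO periods."""
    return [Fraction(360 * k, output_divider) for k in range(output_divider)]


def delay_for_phase(output_divider, phase_deg):
    """Return k such that 360 * k / D == phase_deg, or None if unreachable."""
    k = Fraction(phase_deg) * output_divider / 360
    if k.denominator == 1 and 0 <= k < output_divider:
        return int(k)
    return None


# Two clocks of the same frequency, 90 degrees apart (the DDR case above).
print(delay_for_phase(8, 90))  # divider 8: an integer delay exists
print(delay_for_phase(6, 90))  # divider 6: 90 degrees is not reachable
```

The phase requirement therefore becomes one more integer constraint on the output divider, solved together with the frequency equations.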
Yeah, you're right. I got confused :)
Output clocks are routed through dedicated clock networks, so it doesn't really matter. It would matter if they were to be routed through the fabric, but that's not recommended and you have to specify that by hand for some vendors.
But I don't think it would be bad if you could specify its location.
Both the LP and HX parts have the same constraints when it comes to PLL parameters. The parameters are:
`REFERENCECLK`, `EXTFEEDBACK`.
`PLLOUT`
Now, depending on the chosen feedback path (`FEEDBACK_PATH`), there are two equations with their respective constraints.
These equations come from the DS1040 - iCE40 LP/HX Family Data Sheet, TN1251 - iCE40 sysCLOCK PLL Design and Usage Guide, and the icepll source code.
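A minimal brute-force sketch for the `SIMPLE` feedback path, modelled loosely on how icepll searches. The formula, the divider ranges and the frequency limits in the code are assumptions that should be checked against DS1040 and TN1251 before relying on them; the function name is made up.

```python
def solve_ice40_simple(f_in_mhz, f_out_mhz):
    """Brute-force DIVR/DIVF/DIVQ for FEEDBACK_PATH = "SIMPLE".

    Assumed model (verify against the datasheet):
        f_pfd = f_in / (DIVR + 1)
        f_vco = f_pfd * (DIVF + 1)
        f_out = f_vco / 2**DIVQ
    """
    pfd_range = (10.0, 133.0)    # MHz, assumed limits
    vco_range = (533.0, 1066.0)  # MHz, assumed limits
    best = None
    for divr in range(16):
        f_pfd = f_in_mhz / (divr + 1)
        if not pfd_range[0] <= f_pfd <= pfd_range[1]:
            continue  # skips every DIVF/DIVQ combination for this DIVR
        for divf in range(128):
            f_vco = f_pfd * (divf + 1)
            if not vco_range[0] <= f_vco <= vco_range[1]:
                continue  # skips every DIVQ value for this DIVF
            for divq in range(1, 7):
                f_out = f_vco / 2**divq
                err = abs(f_out - f_out_mhz)
                if best is None or err < best["error_mhz"]:
                    best = dict(DIVR=divr, DIVF=divf, DIVQ=divq,
                                f_out_mhz=f_out, error_mhz=err)
    return best


solution = solve_ice40_simple(12.0, 48.0)
print(solution)
```

The two range checks reject most of the search space before the innermost loop runs, which is why a plain triple loop stays cheap.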
The parameters are:
`CLKI`, `CLKFB`.
`CLKOP`, `CLKOS`.
These equations come from the ECP5 and ECP5-5G Family Data Sheet, TN1263 - ECP5 and ECP5-5G sysCLOCK PLL/DLL Design and Usage Guide, and the ecppll source code.
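For orientation, the relationship that ecppll appears to solve when `CLKOP` is used as the feedback source can be written as follows. The divider names follow the `EHXPLLL` parameters `CLKI_DIV`, `CLKFB_DIV` and `CLKOP_DIV`; treat the formula as an assumption to be checked against TN1263, and take the allowed ranges from there as well.

```latex
\begin{aligned}
f_{\mathrm{pfd}} &= \frac{f_{\mathrm{CLKI}}}{\mathrm{CLKI\_DIV}} \\
f_{\mathrm{CLKOP}} &= f_{\mathrm{pfd}} \cdot \mathrm{CLKFB\_DIV} \\
f_{\mathrm{vco}} &= f_{\mathrm{CLKOP}} \cdot \mathrm{CLKOP\_DIV}
\end{aligned}
```

All three dividers are positive integers, and $f_{\mathrm{pfd}}$ and $f_{\mathrm{vco}}$ must each stay inside a device-specific range.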
Looking at the frequency problem, here's a first attempt to document a generalized model of PLLs. I believe (based on the docs from @GuzTech above, and a passing familiarity with the code) that a solver based on this model would handle all of the Lattice PLLs in LiteX's clock.py.
Xilinx clocking seems to be a little simpler - I'll need to add that to the model. Have not looked hard at other vendors yet.
Please point out problems!
Intel (ALTPLL)
Older Altera parts (and Cyclone 10LP for some reason) use the ALTPLL (or ALTPLL_RECONFIG) IP to generate a PLL instantiating altpll. The key parameters are:
- pll_type (string): should probably be "AUTO" in general so Quartus can select which is used (only relevant for Stratix and Arria parts)
- operation_mode (string): selects the feedback mode (default in the wizard is "NORMAL", which compensates for the clock network)
- inclk0_input_frequency (integer): appears to actually be the period in ps of the input clock. Range varies by device
- compensate_clock (string): (for normal mode operation) where the feedback comes from, e.g. CLK0
On a per-clock basis:
- clkn_divide_by (integer): must be greater than 0, less than 256
- clkn_duty_cycle (integer)
- clkn_multiply_by (integer): must be greater than 0, less than or equal to 256
- clkn_phase_shift (string)
The IP wizard can either take these values directly or generate them from the input frequency and desired output frequency. Phase shift and duty cycle have a set of allowed values which depends on the multiplication factors.
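Assuming `inclk0_input_frequency` really is the input period in picoseconds, as observed above, the conversion and the per-clock ratio can be sketched like this. The helper names are made up; only the parameter names and ranges come from the list above.

```python
from fractions import Fraction


def inclk_period_ps(f_in_hz):
    """Input clock period in picoseconds, rounded to an integer."""
    return round(1e12 / f_in_hz)


def altpll_output_hz(f_in_hz, multiply_by, divide_by):
    """Output frequency for one clock: f_in * multiply_by / divide_by."""
    # Ranges as listed above: 0 < divide_by < 256, 0 < multiply_by <= 256.
    if not 0 < divide_by < 256:
        raise ValueError("clkn_divide_by out of range")
    if not 0 < multiply_by <= 256:
        raise ValueError("clkn_multiply_by out of range")
    return Fraction(f_in_hz) * multiply_by / divide_by


print(inclk_period_ps(50_000_000))         # period of a 50 MHz input, in ps
print(altpll_output_hz(50_000_000, 2, 1))  # output of a doubling configuration
```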
V series instantiates an altera_pll with a slightly simpler set of parameters for a basic pll:
- fractional_vco_multiplier (string): "true" or "false". The example was done with an integer-N PLL
- reference_clock_frequency (string): e.g. "50.0 MHz"
- operation_mode (string): e.g. "direct", "normal"
- number_of_clocks (integer)
- output_clock_frequencyn (string): e.g. "100.000000 MHz"
- phase_shiftn (string): e.g. "0 ps"
- duty_cyclen (integer)
- pll_type (string): default seems to be "General", which hopefully means auto
- pll_subtype (string): see above
I think there is a balance deciding how much to make user controllable and how much to default to basic settings to avoid creating something that can't be synthesized, but I've tried to show here the parameters that would be relevant to the majority of use cases.
Thanks @H-S-S-11 for the Altera info. In terms of the basic PLL architecture and frequency calculations, it lines up well with Lattice and Xilinx parts.
> I think there is a balance deciding how much to make user controllable and how much to default to basic settings to avoid creating something that can't be synthesized, but I've tried to show here the parameters that would be relevant to the majority of use cases.
Agreed! I think the following should cover most designs, and be included in the initial PLL API.
For a basic PLL API, I don't think we need features such as
Thoughts?
I think this is a reasonable plan.
This is probably obvious, but access to the `locked` signal would be quite useful as well.
Should the interface require you to specify the tolerance for error? The integer solutions for the PLL parameters will cause some amount of error.
> Should the interface require you to specify the tolerance for error?
Yes, since the constraint solver would have to take this into account anyway.
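One way the tolerance could enter the interface, sketched with hypothetical names: the caller gives a target frequency and a tolerance in parts per million, and a candidate solution is accepted only if its relative error fits.

```python
def error_ppm(target_hz, actual_hz):
    """Relative frequency error in parts per million."""
    return abs(actual_hz - target_hz) / target_hz * 1e6


def within_tolerance(target_hz, actual_hz, tolerance_ppm):
    """True if the achieved frequency is close enough to the target."""
    return error_ppm(target_hz, actual_hz) <= tolerance_ppm


# Example: 12 MHz * 67 / 32 = 25.125 MHz as an approximation of 25.175 MHz.
actual = 12e6 * 67 / 32
print(error_ppm(25.175e6, actual))
print(within_tolerance(25.175e6, actual, tolerance_ppm=5000))
print(within_tolerance(25.175e6, actual, tolerance_ppm=1000))
```

A solver can apply the same check to reject candidates early, or to report that no parameter set meets the requested tolerance.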
Here's an outline of a solver: https://github.com/alanvgreen/nmigen/blob/pll/experiments/PLLSolver_pynb.ipynb
It's in Jupyter notebook form so, please mess around with it.
At this stage, I'd like feedback on the FrequencySolver API.
The shape of the solver is based on algorithms found in migen's clock.py. In clock.py, though, the frequency problem and routing problem solutions are closely tied.
> The shape of the solver is based on algorithms found in migen's clock.py.
That's not migen, that's litex.
> It's in Jupyter notebook form so, please mess around with it.
Just reading the first two lines: this uses a different license from nMigen, is that intentional?
Err, no it wasn't intentional. Fixed now.
I took a look. This is just a brute-force solver, right? If that works sufficiently well for most applications I'm perfectly happy with it.
If a brute-force solution proves to be too costly, we could optionally use an LP solver like PuLP or scipy.optimize.linprog, or perhaps implement a simple LP solver like the Simplex algorithm.
The "brute-force" solver is probably fast enough for most situations. Due to the allowed ranges of input and VCO frequencies, large parts of the search space are trimmed quite quickly. And it seems to be working for litex, too.
I'll keep going with this, then.
If it works for LiteX I think it's safe to assume it's good enough.
We should also consider additional PLL parameters that don't directly relate to the output frequency, for example filter settings, since they affect the output jitter and settling time. I am not sure how to approach this, though: maybe based on what the manufacturer toolchain outputs, on empirical experiments, or only on user-specified values.
Additionally, some mechanism to change the constraints for a specific design could be useful, for example allowing higher VCO frequencies than specified in the datasheet.
The goal of this issue is to document the kinds of PLL primitives of the various FPGA manufacturers, so that we can support them in nmigen.
Lattice
Lattice PLLs are different for the different FPGA families.
iCE40 (LP/LM/HX/Ultra/UltraLite/UltraPlus)
The iCE40 family has five types of PLLs. They are all capable of shifting the output clock by 0 and 90 degrees, and allow fine delay adjustments of up to 2.5 ns (typical) in 150 ps increments (typical).
SB_PLL40_CORE
Can be used if the source clock of the PLL originates on the FPGA or is driven by an input pad that is not in the bottom IO bank (bank 2).
SB_PLL40_PAD
Can be used if the source clock of the PLL is driven by an input pad that is located in the bottom or top IO bank (banks 2 and 0 respectively). When using this PLL, the source clock cannot be used anymore.
SB_PLL40_2_PAD
Can be used if the source clock of the PLL is driven by an input pad that is located in the bottom or top IO bank (banks 2 and 0 respectively). This PLL outputs the requested clock as well as the source clock.
SB_PLL40_2F_CORE
Can generate two different output frequencies. Can be used if the source clock of the PLL originates on the FPGA.
SB_PLL40_2F_PAD
Can generate two different output frequencies. Can be used if the source clock of the PLL is driven by an input pad that is located in the bottom or top IO bank (banks 2 and 0 respectively).
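The placement rules listed above can be condensed into a small lookup. This is only a sketch of the rules as stated in this post; the function and argument names are invented, and it ignores any device-specific exceptions.

```python
def ice40_pll_candidates(source, bank=None, two_frequencies=False):
    """Return the PLL primitives allowed by the rules above.

    source: "fabric" if the clock originates on the FPGA, "pad" otherwise.
    bank:   IO bank of the input pad (only meaningful for source="pad").
    """
    if source not in ("fabric", "pad"):
        raise ValueError("source must be 'fabric' or 'pad'")
    candidates = []
    if two_frequencies:
        if source == "fabric":
            candidates.append("SB_PLL40_2F_CORE")
        elif bank in (0, 2):
            candidates.append("SB_PLL40_2F_PAD")
    else:
        # _CORE: fabric clock, or a pad outside the bottom bank (bank 2).
        if source == "fabric" or bank != 2:
            candidates.append("SB_PLL40_CORE")
        # _PAD variants: pad in the top (0) or bottom (2) bank.
        if source == "pad" and bank in (0, 2):
            candidates += ["SB_PLL40_PAD", "SB_PLL40_2_PAD"]
    return candidates


print(ice40_pll_candidates("pad", bank=2))
print(ice40_pll_candidates("fabric", two_frequencies=True))
```

An empty result means that none of the primitives fits the request under these rules, which is the case a generic PLL class would have to report to the user.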
ECP5
The ECP5 family has a bunch of clocking elements, but only one type of PLL.
EHXPLLL
It has four outputs, but if the user wants to hook up the PLL feedback, then one of the outputs cannot be used in the fabric. It supports dynamic clock selection and control, dynamic phase adjustment, etc.
Xilinx
Xilinx has two clock synthesis IPs as far as I know: PLL and Mixed-Mode Clock Manager (MMCM). The different FPGA families have different versions of these IPs.
Spartan-6
The Spartan-6 family has Clock Management Tiles (CMTs), each of which contains one PLL and two Digital Clock Managers (DCMs). The latter can be used to implement Delay Locked Loops, digital frequency synthesizers, digital phase shifters, or digital spread spectrum.
There are two PLL primitives, each with six clock outputs and a dedicated feedback output.
PLL_BASE
PLL_BASE provides access to the most commonly used features, such as clock deskewing, frequency synthesis, coarse phase shifting, and duty cycle programming.
PLL_ADV
PLL_ADV provides access to all PLL_BASE features, plus additional ones such as dynamic reconfiguration of the PLL.
7 Series
The 7 series also has CMTs, each of which contains an MMCM and a PLL. The PLL contains a subset of the functions of the MMCM. The PLL has six clock outputs, whereas the MMCM has seven clock outputs. Both have a dedicated feedback output.
MMCME2_BASE and PLLE2_BASE
Both `BASE` IPs provide access to the most commonly used features, such as clock deskewing, frequency synthesis, coarse phase shifting, and duty cycle programming.
MMCME2_ADV and PLLE2_ADV
The `MMCME2_ADV` IP provides access to all `BASE` features, plus additional ones such as extra ports for clock switching, access to the Dynamic Reconfiguration Port (DRP), and dynamic fine-phase shifting. The `PLLE2_ADV` IP provides the same features, except for dynamic fine-phase shifting.
UltraScale / UltraScale+
The CMTs in the UltraScale family contain an MMCM and two PLLs. Just like with the 7 series FPGAs, there are `_BASE` and `_ADV` IPs, so they won't be repeated here.
MMCM
The MMCMs have seven clock outputs and a dedicated feedback output. The MMCM IPs for the UltraScale are `MMCME3_BASE` and `MMCME3_ADV`, whereas for the UltraScale+ they are `MMCME4_BASE` and `MMCME4_ADV`.
PLL
The PLLs have two clock outputs each. Similarly to the MMCMs, the PLL IPs are `PLLE3_BASE` and `PLLE3_ADV` for the UltraScale family, and `PLLE4_BASE` and `PLLE4_ADV` for the UltraScale+ family.
Intel
From what I can gather, all Intel FPGA PLLs are instantiated using the `ALTPLL` IP, and all PLLs have five clock outputs.
Common Remarks
PLLs multiply and divide clocks to achieve the desired output clock frequencies. But they are limited to the allowed minimum and maximum multiplier, divider, and intermediate clock frequencies. These values can also depend on the speed grade of the FPGA, which means that we should be able to supply the speed grade.
The calculation of how to achieve the desired clock frequencies given the input clock frequencies differs between PLL types.
Most PLLs have additional functionalities that we could initially not support, and slowly introduce one by one. Also, there are other IPs that are often used in conjunction with PLLs, so they could be added too.
For Xilinx clocking resources, we could just always use the `_ADV` IPs and just supply default values for additional features that are not used.
The information above can of course contain errors :)
Approach
Maybe it would be a good idea to just start with the iCE40 family (since it's the simplest, and I use it in my current design). We could implement the PLLs similarly to how LiteX does it.