mwkmwkmwk closed this issue 2 years ago
Better handling of transparency would also be something I'd like to see, needed to support most TDP patterns - c.f. #1390 and nmigen/nmigen#216
Not in scope:

- merging $dff* into the cell

It should handle multiple write ports without giving up and falling back to flops.

It should handle multiple write ports without giving up and falling back to flops.

I mean, uh, that's kind of obvious :p

A bunch of related issues that should be fixed:
I've done a bit of digging on the TDP inference, and managed to get a prototype mapping going through the Python API, but got sidetracked while working on test scenarios. So, not sure if this helps, but referring to the 'ghdl hemisphere':
https://github.com/ghdl/ghdl/issues/1069
The MyHDL test cases are outdated, I'll have to revisit/CI them again.
From an RTLIL (post-inference) PoV:
Currently, memories seem to have two possible representations: either as a combination of an RTLIL::Memory and zero or more of each $memrd, $memwr, and $meminit, or as a single $mem cell. The former could be used to represent most of the requirements outlined here right now, while $mem falls completely flat for asymmetric memories (all read address, read data, write address, and write data ports have to be the same length, respectively, simply because of how the cell's inputs and outputs are structured). So, first of all, what should happen to $mem?
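To make the structural limitation concrete, here is a small illustrative Python model of the packed $mem layout (not Yosys code; `rd_data_slice` is an invented helper): the cell exposes all read data as one flat concatenated RD_DATA vector of RD_PORTS equal WIDTH-bit slices, so every port is forced to the same width.

```python
# Illustrative model (not actual Yosys code) of the packed $mem cell's
# read-data layout: RD_DATA is one flat vector made of RD_PORTS slices,
# each exactly WIDTH bits wide.

def rd_data_slice(port: int, width: int) -> range:
    """Bit positions occupied by read port `port` in the packed RD_DATA."""
    return range(port * width, (port + 1) * width)

# With WIDTH=8 and RD_PORTS=3, RD_DATA is 24 bits, and port 1 always
# occupies bits 8..15 - there is no way to make one port 8 bits wide
# and another 16 bits wide.
WIDTH, RD_PORTS = 8, 3
bits = rd_data_slice(1, WIDTH)
print(WIDTH * RD_PORTS, bits.start, bits.stop - 1)  # 24 8 15
```

The unpacked $memrd/$memwr cells avoid this because each port is a separate cell with its own data signal.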
- … both $mem and the split cells in all consumers)
- … $mem)

I'm looking forward to input, including options I may have overlooked - especially from the devs who ultimately have the final say in this :)
We could also support arbitrarily many independent ports by basically concatenating them in $mem interface ports (see what the $macc cell is doing), though I don't consider it a particularly clean approach.
@clairexen Is it really not an option to replace $mem with fine grained $mem* cells? One problem this would solve is formal verification of indirect memory accesses of the form mem[mem[x]] which currently has no good solution.
@clairexen any update on this?
Is it really not an option to replace $mem with fine grained $mem* cells?
@whitequark Sorry, I don't understand what "fine grained $mem* cells" means in this context. Can you elaborate?
Can you elaborate?
I meant, can we get rid of $mem completely and instead use only $memrd, $memwr and $meminit? The only problem created by this change (that I'm aware of) that couldn't be fixed within Yosys is simulating structural Verilog netlists, but I think we can just put the logic that currently lives in the $mem simulation model in write_verilog itself.
After some great discussion on IRC yesterday, I thought I'd add my memory cell redesign draft here as well: https://gist.github.com/Xiretza/fec08b9810a66efd368705a7f4a51aa7
Executive summary (of revision 2):

- $mem …
- $memrd …
- added a $memrdwr cell for a combined read+write port with shared clock and address; read and write port have the same dimensions
- dropped the \TRANSPARENT attribute from $memrd and instead added a \TRANSPARENCY_GROUP attribute to all cells, allowing transparency between arbitrary sets of read+write ports

Please let me know if you can think of any requirements that can't be modeled easily with these proposed cells.
Removing $mem appears a bit radical to me, I believe you could squeeze all asymmetric issues into it (by using extra parameters) while staying backwards compatible. Apart from that, $mem should probably remain for a while just for the (tech agnostic) simulation model that some might depend on.
Next, I am not sure whether asymmetrical properties should end up in a (virtual) primitive, most BRAM primitives I am aware of don't have such a built-in property and infer with glue logic to implement asymmetric behaviour.
For what I've experimented with, TDP problems turned up mostly during technology mapping. The above-mentioned mapping prototype was revisited; the test cases can be found in this repo:

https://github.com/hackfin/hdlplayground

(quick start: fire up binder, open up tests/memory.ipynb). There are modified Python scripts from the yosys techlibs used (/lib/..) to enable the mapping for TDP capabilities (A1DATA -> [A1READ, A1WRITE]). The custom mapping procedure is in ram_mapper.py. To verify using post-techmap simulation, you need to install the ECP5 DP16KD.v vendor model manually as described in index.ipynb.
There are also a few other 'read-after-write/writethrough' test scenarios, specific to VHDL (but you could emit Verilog as well and run that through the Cosimulation).
I fear though that there is no (and never will be) full consensus on a totally portable RAM description in any HDL that infers 'just right' on all known synthesis tools. So if there will be yosys-specific design rules on how to describe TDP-RAMs to infer correctly, no prob, as long as they tech-map accordingly.
More comments:
- $memrdwr proposal: Can we not just derive the sharedness from collecting $memrd and $memwr?
- \TRANSPARENCY_GROUP: I'd second that.

Removing $mem appears a bit radical to me, I believe you could squeeze all asymmetric issues into it (by using extra parameters) while staying backwards compatible.
The point of removing $mem is specifically not having to shoehorn all the asymmetric (and transparency group) parameters into a cell not suited for them.
Apart from that, $mem should probably remain for a while just for the (tech agnostic) simulation model that some might depend on.
If write_verilog can understand $memrd and $memwr and such cells directly then the simulation model becomes unnecessary.
Some thoughts after doing a lot of reading:
We need to make a few upgrades to the memory cells, but not that many:
The $memrd cell in the synchronous version needs support for output register initialization and synchronous/asynchronous reset. I propose:

- an INIT_VALUE parameter with the initial value (all-x means uninitialized)
- an RST_VALUE parameter with the reset value
- SRST and ARST single-bit ports for synchronous and asynchronous reset and corresponding SRST_POLARITY and ARST_POLARITY parameters (if a reset is unused, it should be tied to 0 and polarity set to 1; in the initial version, only one of the two resets can be used)
- a CE_OVER_SRST boolean parameter to select priority of SRST vs CE

The $memrd and $memwr cells need support for mixed widths. I propose allowing a port to have a width that is a power-of-two multiple of the memory's width (with the address being appropriately shifted), without adding any new parameters for this purpose. Since this is commonly described in Verilog as a loop over the low bits of the narrow memory's address (at least Xilinx infers wide ports from this pattern), I propose adding logic to the memory_share pass that would recognize ports sharing high bits of the address and having constants in the low bits, merging them to a larger port. This should cover the inference side.
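The merge rule can be sketched as follows. This is a hypothetical model of the check, not the actual memory_share implementation (`can_merge` and the address encoding are invented for illustration): addresses are LSB-first bit tuples where constant bits are 0/1 and symbolic bits are strings, and two narrow ports may become one double-width port when their low bits are distinct constants and their high bits are identical.

```python
# Hypothetical sketch (NOT the real memory_share pass) of the wide-port
# merge rule: ports whose addresses share the same high bits and carry
# distinct constant low bits can be fused into one wider port.

def can_merge(addr_a, addr_b, low_bits=1):
    """Addresses are LSB-first tuples; constants are 0/1, symbolic
    address bits are strings naming the driving signal bit."""
    lo_a, hi_a = addr_a[:low_bits], addr_a[low_bits:]
    lo_b, hi_b = addr_b[:low_bits], addr_b[low_bits:]
    # low bits must all be constants, and differ between the ports
    consts = all(b in (0, 1) for b in lo_a + lo_b)
    return consts and lo_a != lo_b and hi_a == hi_b

# mem[{a, 1'b0}] and mem[{a, 1'b1}]: mergeable into one 2x-wide port
print(can_merge((0, "a1", "a2"), (1, "a1", "a2")))  # True
# different high address bits: not mergeable
print(can_merge((0, "a1", "a2"), (1, "b1", "a2")))  # False
```

A real pass would additionally have to check that the ports share clocking and enable conditions before fusing them.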
I propose not adding a $memrdwr cell — having thought of it, it seems to be more trouble than it's worth, and the memory port sharing should rather be decided in the memory_bram replacement that has more information about shareability (for example, on Xilinx, read and write ports sharing an address cannot be merged into a read-write port if they have completely independent enables — only "read exclusive with write" and "write implies read" can actually be realized).
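To make that shareability constraint concrete, here is a toy Python encoding (`shareable` and the predicate representation are invented for illustration, not Yosys API): enables are predicates over the design's inputs, and a read/write pair at a shared address can fold into one read-write port only if "read exclusive with write" or "write implies read" holds for every input combination.

```python
from itertools import product

# Toy encoding of the port-sharing rule: a read and a write port at the
# same address can become one read-write port only if their enables
# satisfy "read exclusive with write" or "write implies read".

def shareable(re, we, n_inputs):
    """re, we: functions mapping an input bit-tuple to bool."""
    combos = list(product([False, True], repeat=n_inputs))
    exclusive = all(not (re(i) and we(i)) for i in combos)
    write_implies_read = all(re(i) for i in combos if we(i))
    return exclusive or write_implies_read

# enables driven by two fully independent signals: not shareable
print(shareable(lambda i: i[0], lambda i: i[1], 2))  # False
# write enable gated by the read enable ("write implies read"): shareable
print(shareable(lambda i: i[0], lambda i: i[0] and i[1], 2))  # True
```

The point is that this decision needs knowledge of the enable logic, which is why it belongs in the BRAM mapping pass rather than in a generic cell.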
I propose removing the current TRANSPARENT parameter and instead being able to mark read ports as transparent with an arbitrary subset of write ports. To this end, I propose adding a unique sequential identifier parameter ID for write ports, and adding a new TRANSPARENT parameter to read ports that is a bitmask of write port IDs that this port should be transparent with. I also propose adding a new pass that would recognize soft transparency logic on read ports and convert it to the appropriate TRANSPARENT bit.
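A toy model of the proposed semantics (illustrative only; `clock_edge` is an invented helper, not RTLIL simulation code): on a shared clock edge, a synchronous read port returns the newly written data for write ports it is marked transparent with, and the pre-edge data otherwise.

```python
# Toy simulation of the proposed per-read-port TRANSPARENT bitmask.

def clock_edge(mem, raddr, writes, transparent_mask):
    """writes: list of (wid, waddr, data); wid indexes the bitmask.
    Simplified: assumes at most one write hits raddr per edge."""
    old = mem[raddr]
    for wid, waddr, data in writes:
        mem[waddr] = data
    # transparent with a write port that hits raddr -> see the new data
    hit = any(waddr == raddr and (transparent_mask >> wid) & 1
              for wid, waddr, _ in writes)
    return mem[raddr] if hit else old

mem = [0] * 8
# write port 0 hits address 3; read port transparent with port 0
print(clock_edge(mem, 3, [(0, 3, 42)], transparent_mask=0b01))  # 42
mem = [0] * 8
# same write, but the read port is transparent only with port 1
print(clock_edge(mem, 3, [(0, 3, 42)], transparent_mask=0b10))  # 0
```

This is exactly the flexibility the single boolean TRANSPARENT parameter cannot express once a memory has several write ports.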
Analogously to transparency, I propose replacing the current PRIORITY parameter on write ports with a bitmask of which other write ports this write port has priority over, thus changing a total ordering into a partial ordering, better able to represent write ports from disjoint processes.
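The partial ordering can be sketched like this (hypothetical helper names, not Yosys code): each write port carries a bitmask of the port IDs it takes priority over, and a simultaneous write conflict is resolved only when some port dominates all the others.

```python
# Toy model of the proposed per-write-port PRIORITY bitmask.

def resolve(writes, priority_masks):
    """writes: dict wid -> data, all targeting the same address.
    Returns the winning data, or None if no port dominates all others
    (i.e. the ports are unordered, as with disjoint processes)."""
    for wid in writes:
        others = [o for o in writes if o != wid]
        if all((priority_masks[wid] >> o) & 1 for o in others):
            return writes[wid]
    return None

masks = {0: 0b000, 1: 0b001, 2: 0b011}  # 2 beats 1 and 0; 1 beats 0
print(resolve({0: 10, 1: 20}, masks))   # 20
print(resolve({0: 10, 2: 30}, masks))   # 30
masks2 = {0: 0, 1: 0}                   # no relative order defined
print(resolve({0: 10, 1: 20}, masks2))  # None
```

With the old total-ordering PRIORITY parameter, the last case could not be represented: some artificial order would have to be picked between ports that the source never ordered.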
I propose removing the $mem cell and converting all passes to use the unpacked cells directly. While it'd be possible to update the $mem cell with the above improvements as well, the unpacked cells are just as easy to deal with in passes, and maintaining two sets of cells is more trouble than it's worth.
I propose the following changes to the flow, other than the obvious updates to support the above functionality:

- remove opt_mem, upgrade opt_clean to do its job instead
- add a new pass (memory_transparent? or maybe a part of memory_dff?) to recognize transparency implemented in soft logic and set the TRANSPARENT bit
- upgrade memory_share for recognizing wide ports — eg. when there are read ports differing only in low address bits (that are constant), merge them into one wide read port
- memory_bram: do a complete rewrite, to be designed later. The basic idea stays the same (slurp a text file describing various blockrams, pick the best type that matches, emit temp cells that will be converted to actual vendor cells via techmap), but a lot of new capabilities need to be added, and the text format will probably be all-new.

I propose removing the $mem cell and converting all passes to use the unpacked cells directly. While it'd be possible to update the $mem cell with the above improvements as well, the unpacked cells are just as easy to deal with in passes, and maintaining two sets of cells is more trouble than it's worth.
It's still not entirely clear to me how $memrd and $memwr would infer to a simulatable model (post-synthesis), unless running an architecture-specific (post-map) simulation with the mapped vendor primitives. As of now, I don't see how two distinct modules would share the actual memory buffer variable except by changing the interface. Could you elaborate on that?
Question is, if $mem could be upgraded so far to be the 'cover-all-scenarios' abstract primitive to work for all BRAM architectures and map into a set of architecture-agnostic blocks like TRELLIS_DPR16X4 LUT RAMs. This would also make creation of libraries for new architectures easier (it's better to touch *.v than *.cc, IMHO).

There might be a few users still depending on $mem as of now, including me.
I propose removing the $mem cell and converting all passes to use the unpacked cells directly. While it'd be possible to update the $mem cell with the above improvements as well, the unpacked cells are just as easy to deal with in passes, and maintaining two sets of cells is more trouble than it's worth.

It's still not entirely clear to me how $memrd and $memwr would infer to a simulatable model (post-synthesis), unless running an architecture-specific (post-map) simulation with the mapped vendor primitives. As of now, I don't see how two distinct modules would share the actual memory buffer variable except by changing the interface. Could you elaborate on that?
This has been said already — by not having a simulatable model, and handling $mem* in write_verilog instead to convert them back to Verilog code, so that they never actually appear in the output netlist.
Question is, if $mem could be upgraded so far to be the 'cover-all-scenarios' abstract primitive to work for all BRAM architectures and map into a set of architecture-agnostic blocks like TRELLIS_DPR16X4 LUT RAMs. This would also make creation of libraries for new architectures easier (it's better to touch *.v than *.cc, IMHO).
It could, but why would we ever want to do it specifically using $mem if we can have just one memory representation ($mem* separate cells) instead of two?
There might be a few users still depending on $mem as of now, including me.
If you're depending on the raw cell in some way, you'd better describe how instead of just saying it.
This has been said already — by not having a simulatable model, and handling $mem* in write_verilog instead to convert them back to Verilog code, so that they never actually appear in the output netlist.

Yes, and how would other languages be handled? This would implicitly output a different structure than the yosys kernel is aware of, unless run through a pass, no? So you'd offload the problem of conversion to the language output plugins.
If you're depending on the raw cell in some way, you'd better describe how instead of just saying it.
I think I've given quite a few examples on that subject (above links), incl. hints on how to handle TDP for the ECP5 case with minor changes to the *.txt.
Not sure if I'm still up to date, but the GHDL plugin also uses '$mem'.
So this would introduce a compatibility break at some point, how are you going to handle this?
Interfacing towards synthesis is not so much of an issue, if you want to offload $mem[rd|wr] intermediate mapping to language output. But this doesn't look "nice", really.
This has been said already — by not having a simulatable model, and handling $mem* in write_verilog instead to convert them back to Verilog code, so that they never actually appear in the output netlist.

Yes, and how would other languages be handled? This would implicitly output a different structure than the yosys kernel is aware of, unless run through a pass, no? So you'd offload the problem of conversion to the language output plugins.
Correct, and so what? The language output handles this already anyway. The current Verilog backend does not emit the $mem cell raw into the output anyway, it's converted to an actual Verilog memory.
If you're depending on the raw cell in some way, you'd better describe how instead of just saying it.
I think I've given quite a few examples on that subject (above links), incl. hints on how to handle TDP for the ECP5 case with minor changes to the *.txt.
What, your one-off TDP scripts? Given that the point of this issue is to handle this properly within yosys core, I don't think keeping them working is worth it.
Not sure if I'm still up to date, but the GHDL plugin also uses '$mem'.
... this is pretty much a GHDL bug that needs to be fixed, given that the current memory optimization passes operate only on the unpacked cells anyway (memory_share specifically) and thus GHDL output doesn't trigger those.
So this would introduce a compatibility break at some point, how are you going to handle this?
I'm not — RTLIL, including the cell primitives, is not and has never been considered a stable interface. Perhaps it should be, but at the moment we really don't have the means to do that (and this is, btw, independent of the $mem removal discussion — adding new parameters to the $memrd cell is already causing compatibility problems, let alone the TRANSPARENT changes). I'll probably make a GHDL plugin patch to fix the fallout, but that's about it.
Interfacing towards synthesis is not so much of an issue, if you want to offload $mem[rd|wr] intermediate mapping to language output. But this doesn't look "nice", really.
I'm not offloading anything — the $mem cell already has special handling in the Verilog frontend. What I want to do is to get rid of two needlessly redundant representations of the same thing. Perhaps I could be convinced to go the other way around and kill $memwr/$memrd/$meminit instead, but one of the two definitely needs to go.
What, your one-off TDP scripts? Given that the point of this issue is to handle this properly within yosys core, I don't think keeping them working is worth it.
See also https://github.com/ghdl/ghdl/issues/1069. And I tend to keep them working (as a reference) until I see these test cases inferring like in other $-Tools.
I'm not — RTLIL, including the cell primitives, is not and has never been considered a stable interface.
The yosys LaTeX docs kinda insinuate so. I have no problem with that, if there's an entry point documenting changes. But that's another topic indeed. Compatibility breaks are kinda painful when introduced suddenly without a reference test that demonstrates the 'new' way.
What I want to do is to get rid of two needlessly redundant representations of the same thing. Perhaps I could be convinced to go the other way around and kill $memwr/$memrd/$meminit instead, but one of the two definitely needs to go.
I fully agree on reducing maintenance work. I just found it easier to map and verify a single pseudo primitive instead of different ones that need to get collected and sorted out accordingly. So you're saying that collecting this information from opt passes, techmapping and language backends (I'm aware of $mem handling) takes less maintenance than conversion into an abstracted intermediate (complex) $memx splitting into a *.v library? Happy to check out a prototype branch for demonstration.
Here goes a proposal for the new text format description, along with some examples: https://gist.github.com/mwkmwkmwk/2e51a10ba55bda239ac1b667ca062693
Open issues:

- … a depthcascade capability that would pass over this job to techmap, add a parameter saying how many times to duplicate depthwise; maybe also widthcascade for RAMs with usable control-only cascades like Versal UltraRAM. The current proposal contains wrcs, which is usable for the simplistic CS logic on ECP5 and derived blockrams, but describing something like Xilinx cascade ports seems impossible in a generic manner.

So, since people have been asking about this, here's the current status:
First, the memory model changes and recognition passes for newly-supported features have all landed. This includes:

- … (memory_dff) that should be able to recognize reasonable ways of writing transparent ports in Verilog

HOWEVER, our current BRAM mapping pass (memory_bram) is unable to make use of most of the above improvements — it will either emulate the newly-added functionality instead of realizing it in hardware, or completely reject memories using the newly added features (however, it will not reject anything it would have accepted before). This means very little of the above is actually usable now.
While some new functionality (init/reset values, specifically) could be reasonably retrofitted into memory_bram, in general its architecture is just not powerful enough for some features we really want (single-port memory, true dual-port memory, wide ports), so the plan is to instead spend the resources on writing a more flexible replacement (using the design detailed above), which will complete this saga and close a whole lot of long-standing issues in one big swoop. The timeframe for this is 1-2 months from now.
This is excellent work @mwkmwkmwk and I want to say a huge thank you for taking on this epic task!
A very, very untested solution is now available as #3189.
Requirements: