YosysHQ / yosys

Yosys Open SYnthesis Suite
https://yosyshq.net/yosys/
ISC License

Memory inference redesign #1959

Closed: mwkmwkmwk closed this issue 2 years ago

mwkmwkmwk commented 4 years ago

Requirements:

daveshah1 commented 4 years ago

Better handling of transparency would also be something I'd like to see, needed to support most TDP patterns - c.f. #1390 and nmigen/nmigen#216

mwkmwkmwk commented 4 years ago

Not in scope:

Ravenslofty commented 4 years ago

It should handle multiple write ports without giving up and falling back to flops.

mwkmwkmwk commented 4 years ago

It should handle multiple write ports without giving up and falling back to flops.

I mean, uh, that's kind of obvious :p

mwkmwkmwk commented 4 years ago

A bunch of related issues that should be fixed:

hackfin commented 4 years ago

I've done a bit of digging on the TDP inference and managed to get a prototype mapping going through the Python API, but got sidetracked while working on test scenarios. So, not sure if this helps, but for reference from the 'ghdl hemisphere':

https://github.com/ghdl/ghdl/issues/1069

The MyHDL test cases are outdated; I'll have to revisit them and put them through CI again.

Xiretza commented 4 years ago

From an RTLIL (post-inference) PoV:

Currently, memories seem to have two possible representations: Either as a combination of an RTLIL::Memory and zero or more of each $memrd, $memwr, and $meminit, or as a single $mem cell. The former could be used to represent most of the requirements outlined here right now, while $mem falls completely flat for asymmetric memories (all read address, read data, write address, and write data ports have to be the same length, respectively, simply because of how the cell's inputs and outputs are structured). So, first of all, what should happen to $mem?
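
For illustration, here is a minimal Verilog sketch (a made-up module, purely to make the asymmetric case concrete) of a memory with an 8-bit write port and a 32-bit read port. The unpacked cells can describe it, but a single $mem cell cannot, because all of its read data ports (and, separately, all of its write data ports) must have the same width:

```verilog
// Hypothetical example: asymmetric memory with an 8-bit write port and a
// 32-bit read port over the same storage. The packed $mem cell cannot
// represent this, since all of its read data ports share one width.
module asym_ram (
    input  wire        clk,
    input  wire        we,
    input  wire [11:0] waddr,   // 4096 x 8-bit words
    input  wire [7:0]  wdata,
    input  wire [9:0]  raddr,   // 1024 x 32-bit words
    output reg  [31:0] rdata
);
    reg [7:0] mem [0:4095];

    always @(posedge clk) begin
        if (we)
            mem[waddr] <= wdata;
        // The wide read returns four consecutive narrow words.
        rdata <= {mem[{raddr, 2'd3}], mem[{raddr, 2'd2}],
                  mem[{raddr, 2'd1}], mem[{raddr, 2'd0}]};
    end
endmodule
```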

I'm looking forward to input, including options I may have overlooked - especially from the devs who ultimately have the final say in this :)

mwkmwkmwk commented 4 years ago

We could also support arbitrarily many independent ports by basically concatenating them in the $mem interface ports (see what the $macc cell does), though I don't consider it a particularly clean approach.

whitequark commented 4 years ago

@clairexen Is it really not an option to replace $mem with fine grained $mem* cells? One problem this would solve is formal verification of indirect memory accesses of the form mem[mem[x]] which currently has no good solution.
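
For reference, the indirect access pattern in question looks roughly like this (illustrative module, not taken from any particular design):

```verilog
// Illustrative only: an indirect memory access of the form mem[mem[x]],
// e.g. a read through a pointer table stored in the same memory.
module indirect_read (
    input  wire       clk,
    input  wire [7:0] x,
    output reg  [7:0] q
);
    reg [7:0] mem [0:255];

    always @(posedge clk)
        q <= mem[mem[x]];
endmodule
```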

Xiretza commented 4 years ago

@clairexen any update on this?

clairexen commented 4 years ago

Is it really not an option to replace $mem with fine grained $mem* cells?

@whitequark Sorry, I don't understand what "fine grained $mem* cells" means in this context. Can you elaborate?

whitequark commented 4 years ago

Can you elaborate?

I meant, can we get rid of $mem completely and instead use only $memrd, $memwr and $meminit? The only problem created by this change (that I'm aware of) that couldn't be fixed within Yosys is simulating structural Verilog netlists, but I think we can just put the logic that currently lives in the $mem simulation model in write_verilog itself.
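
To sketch what that could look like (illustrative only; the real backend's naming and details would differ): for a memory with one write port and one synchronous read port, write_verilog could print behavioural code along these lines instead of instantiating a $mem model:

```verilog
// Illustrative sketch, not the actual write_verilog output: each $memwr
// cell becomes a clocked write process and each synchronous $memrd cell
// a clocked read process, all over a shared memory array.
module mem_model #(
    parameter WIDTH = 8,
    parameter ABITS = 10
) (
    input  wire             wr_clk, wr_en,
    input  wire [ABITS-1:0] wr_addr,
    input  wire [WIDTH-1:0] wr_data,
    input  wire             rd_clk, rd_en,
    input  wire [ABITS-1:0] rd_addr,
    output reg  [WIDTH-1:0] rd_data
);
    reg [WIDTH-1:0] mem [0:(1<<ABITS)-1];

    always @(posedge wr_clk)
        if (wr_en)
            mem[wr_addr] <= wr_data;

    always @(posedge rd_clk)
        if (rd_en)
            rd_data <= mem[rd_addr];
endmodule
```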

Xiretza commented 4 years ago

After some great discussion on IRC yesterday, I thought I'd add my memory cell redesign draft here as well: https://gist.github.com/Xiretza/fec08b9810a66efd368705a7f4a51aa7

Executive summary (of revision 2):

Please let me know if you can think of any requirements that can't be modeled easily with these proposed cells.

hackfin commented 4 years ago

Removing $mem appears a bit radical to me; I believe you could squeeze all asymmetric issues into it (by using extra parameters) while staying backwards compatible. Apart from that, $mem should probably remain for a while just for the (tech-agnostic) simulation model that some might depend on.

Next, I am not sure whether asymmetric properties should end up in a (virtual) primitive; most BRAM primitives I am aware of don't have such a built-in property, and asymmetric behaviour is instead implemented by inferring glue logic around them.

From what I've experimented with, TDP problems turned up mostly during technology mapping. The above-mentioned mapping prototype was revisited; the test cases can be found in this repo:

https://github.com/hackfin/hdlplayground

(Quick start: fire up Binder and open tests/memory.ipynb.) It uses modified Python scripts from the yosys techlibs (/lib/..) to enable mapping of TDP capabilities (A1DATA -> [A1READ, A1WRITE]). The custom mapping procedure is in ram_mapper.py. To verify using post-techmap simulation, you need to install the ECP5 DP16KD.v vendor model manually, as described in index.ipynb. There are also a few other 'read-after-write/write-through' test scenarios, specific to VHDL (but you could emit Verilog as well and run that through the cosimulation). I fear, though, that there is no (and never will be) full consensus on a totally portable RAM description in any HDL that infers 'just right' on all known synthesis tools. So if there are going to be yosys-specific design rules on how to describe TDP RAMs so that they infer correctly, no problem, as long as they tech-map accordingly.

More comments:

whitequark commented 4 years ago

Removing $mem appears a bit radical to me; I believe you could squeeze all asymmetric issues into it (by using extra parameters) while staying backwards compatible.

The point of removing $mem is specifically not having to shoehorn all the asymmetric (and transparency group) parameters into a cell not suited for them.

Apart from that, $mem should probably remain for a while just for the (tech-agnostic) simulation model that some might depend on.

If write_verilog can understand $memrd and $memwr and such cells directly then the simulation model becomes unnecessary.

mwkmwkmwk commented 4 years ago

Some thoughts after doing a lot of reading:

  1. We need to make a few upgrades to the memory cells, but not that many:

    1. The $memrd cell in the synchronous version needs support for output register initialization and synchronous/asynchronous reset. I propose:

      • Adding a new INIT_VALUE parameter with the initial value (all-x means uninitialized)
      • Adding a new RST_VALUE parameter with the reset value
      • Adding new SRST and ARST single-bit ports for synchronous and asynchronous reset, with corresponding SRST_POLARITY and ARST_POLARITY parameters (if a reset is unused, it should be tied to 0 and its polarity set to 1; in the initial version, only one of the two resets can be used)
      • Adding a new CE_OVER_SRST boolean parameter to select the priority of SRST vs CE
    2. The $memrd and $memwr cells need support for mixed widths. I propose allowing a port to have a width that is a power-of-two multiple of the memory's width (with the address being appropriately shifted), without adding any new parameters for this purpose. Since this is commonly described in Verilog as a loop over the low bits of the narrow memory's address (at least Xilinx infers wide ports from this pattern; see the sketch at the end of this comment), I propose adding logic to the memory_share pass that would recognize ports sharing the high bits of the address and having constants in the low bits, merging them into a larger port. This should cover the inference side.

    3. I propose not adding a $memrdwr cell — having thought of it, it seems to be more trouble than it's worth, and the memory port sharing should rather be decided in the memory_bram replacement that has more information about shareability (for example, on Xilinx, read and write ports sharing an address cannot be merged into a read-write port if they have completely independent enables — only "read exclusive with write" and "write implies read" can actually be realized).

    4. I propose removing the current TRANSPARENT parameter and instead being able to mark read ports as transparent with an arbitrary subset of write ports. To this end, I propose adding a unique sequential identifier parameter ID for write ports, and adding a new TRANSPARENT parameter to read ports that is a bitmask of write port IDs that this port should be transparent with. I also propose adding a new pass that would recognize soft transparency logic on read ports and convert it to the appropriate TRANSPARENT bit.

    5. Analogously to transparency, I propose replacing the current PRIORITY parameter on write ports with a bitmask of which other write ports this write port has priority over, thus changing a total ordering into a partial ordering, better able to represent write ports from disjoint processes.

    6. I propose removing the $mem cell and converting all passes to use the unpacked cells directly. While it'd be possible to update the $mem cell with the above improvements as well, the unpacked cells are just as easy to deal with in passes, and maintaining two sets of cells is more trouble than it's worth.

  2. I propose the following changes to the flow, other than the obvious updates to support the above functionality:

    1. Remove opt_mem, upgrade opt_clean to do its job instead.
    2. Add a pass (memory_transparent? or maybe a part of memory_dff?) to recognize transparency implemented in soft logic and set the TRANSPARENT bit.
    3. Add support in memory_share for recognizing wide ports — e.g. when there are read ports differing only in low address bits (that are constant), merge them into one wide read port.
    4. memory_bram: do a complete rewrite, to be designed later. The basic idea stays the same (slurp a text file describing various blockrams, pick the best type that matches, emit temp cells that will be converted to actual vendor cells via techmap), but a lot of new capabilities need to be added, and the text format will probably be all-new.
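
As a concrete illustration of the mixed-width pattern referenced in the proposal above (a rough sketch with made-up names; the exact coding styles that frontends recognize vary), a 32-bit write port on an 8-bit-wide memory is typically written so that the narrow writes share the high address bits and differ only in constant low bits:

```verilog
// Rough sketch of the mixed-width inference pattern. After memory
// inference this becomes four byte-wide write ports whose addresses
// share their high bits and have constant low bits (the same thing is
// often written as a for loop); the proposed memory_share logic would
// merge them into a single wide write port.
module wide_write (
    input  wire        clk,
    input  wire        we,
    input  wire [9:0]  waddr,   // address of the 32-bit word
    input  wire [31:0] wdata,
    input  wire [11:0] raddr,   // narrow 8-bit read port
    output reg  [7:0]  rdata
);
    reg [7:0] mem [0:4095];

    always @(posedge clk) begin
        if (we) begin
            mem[{waddr, 2'd0}] <= wdata[7:0];
            mem[{waddr, 2'd1}] <= wdata[15:8];
            mem[{waddr, 2'd2}] <= wdata[23:16];
            mem[{waddr, 2'd3}] <= wdata[31:24];
        end
        rdata <= mem[raddr];
    end
endmodule
```
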
hackfin commented 4 years ago

I propose removing the $mem cell and converting all passes to use the unpacked cells directly. While it'd be possible to update the $mem cell with the above improvements as well, the unpacked cells are just as easy to deal with in passes, and maintaining two sets of cells is more trouble than it's worth.

It's still not entirely clear to me how $memrd and $memwr would infer to a simulatable model (post-synthesis), unless running an architecture-specific (post-map) simulation with the mapped vendor primitives. As of now, I don't see how two distinct modules would share the actual memory buffer variable except by changing the interface. Could you elaborate on that?

The question is whether $mem could be upgraded far enough to be the 'cover-all-scenarios' abstract primitive that works for all BRAM architectures and maps into a set of architecture-agnostic blocks like TRELLIS_DPR16X4 LUT RAMs. This would also make creation of libraries for new architectures easier (it's better to touch *.v than *.cc, IMHO).

There might be a few users still depending on $mem as of now, including me.

mwkmwkmwk commented 4 years ago

I propose removing the $mem cell and converting all passes to use the unpacked cells directly. While it'd be possible to update the $mem cell with the above improvements as well, the unpacked cells are just as easy to deal with in passes, and maintaining two sets of cells is more trouble than it's worth.

It's still not entirely clear to me how $memrd and $memwr would infer to a simulatable model (post-synthesis), unless running an architecture-specific (post-map) simulation with the mapped vendor primitives. As of now, I don't see how two distinct modules would share the actual memory buffer variable except by changing the interface. Could you elaborate on that?

This has been said already — by not having a simulatable model, and handling $mem* in write_verilog instead to convert them back to Verilog code, so that they never actually appear in the output netlist.

The question is whether $mem could be upgraded far enough to be the 'cover-all-scenarios' abstract primitive that works for all BRAM architectures and maps into a set of architecture-agnostic blocks like TRELLIS_DPR16X4 LUT RAMs. This would also make creation of libraries for new architectures easier (it's better to touch *.v than *.cc, IMHO).

It could, but why would we ever want to do it specifically using $mem if we can have just one memory representation ($mem* separate cells) instead of two?

There might be a few users still depending on $mem as of now, including me.

If you're depending on the raw cell in some way, you'd better describe how instead of just saying it.

hackfin commented 4 years ago

This has been said already — by not having a simulatable model, and handling $mem* in write_verilog instead to convert them back to Verilog code, so that they never actually appear in the output netlist.

Yes, and how would other languages be handled? This would implicitly output a different structure than the yosys kernel is aware of, unless run through a pass, no? So you'd offload the problem of conversion to the language output plugins.

If you're depending on the raw cell in some way, you'd better describe how instead of just saying it.

I think I've given quite a few examples on that subject (above links), incl. hints on how to handle TDP for the ECP5 case with minor changes to the *.txt. Not sure if I'm still up to date, but the GHDL plugin also uses '$mem'. So this would introduce a compatibility break at some point; how are you going to handle this? Interfacing towards synthesis is not so much of an issue, if you want to offload $mem[rd|wr] intermediate mapping to language output. But this doesn't look "nice", really.

mwkmwkmwk commented 4 years ago

This has been said already — by not having a simulatable model, and handling $mem* in write_verilog instead to convert them back to Verilog code, so that they never actually appear in the output netlist.

Yes, and how would other languages be handled? This would implicitly output a different structure than the yosys kernel is aware of, unless run through a pass, no? So you'd offload the problem of conversion to the language output plugins.

Correct, and so what? The language output handles this already. The current Verilog backend does not emit the $mem cell raw into the output anyway; it's converted to an actual Verilog memory.

If you're depending on the raw cell in some way, you'd better describe how instead of just saying it.

I think I've given quite a few examples on that subject (above links), incl. hints on how to handle TDP for the ECP5 case with minor changes to the *.txt.

What, your one-off TDP scripts? Given that the point of this issue is to handle this properly within yosys core, I don't think keeping them working is worth it.

Not sure if I'm still up to date, but the GHDL plugin also uses '$mem'.

... this is pretty much a GHDL bug that needs to be fixed, given that the current memory optimization passes operate only on the unpacked cells anyway (memory_share specifically) and thus GHDL output doesn't trigger those.

So this would introduce a compatibility break at some point; how are you going to handle this?

I'm not — RTLIL, including the cell primitives, is not and has never been considered a stable interface. Perhaps it should be, but at the moment we really don't have the means to do that (and this is, btw, independent of the $mem removal discussion — adding new parameters to $memrd cell is already causing compatibility problems, let alone the TRANSPARENT changes). I'll probably make a GHDL plugin patch to fix the fallout, but that's about it.

Interfacing towards synthesis is not so much of an issue, if you want to offload $mem[rd|wr] intermediate mapping to language output. But this doesn't look "nice", really.

I'm not offloading anything — the $mem cell already has special handling in the Verilog frontend. What I want to do is to get rid of two needlessly redundant representations of the same thing. Perhaps I could be convinced to go the other way around and kill $memwr/$memrd/$meminit instead, but one of the two definitely needs to go.

hackfin commented 4 years ago

What, your one-off TDP scripts? Given that the point of this issue is to handle this properly within yosys core, I don't think keeping them working is worth it.

See also https://github.com/ghdl/ghdl/issues/1069. And I intend to keep them working (as a reference) until I see these test cases infer the way they do in the other (commercial) tools.

I'm not — RTLIL, including the cell primitives, is not and has never been considered a stable interface.

The yosys LaTeX docs kinda insinuate so. I have no problem with that if there's an entry point documenting changes, but that's another topic indeed. Compatibility breaks are kinda painful when introduced suddenly, without a reference test that demonstrates the 'new' way.

What I want to do is to get rid of two needlessly redundant representations of the same thing. Perhaps I could be convinced to go the other way around and kill $memwr/$memrd/$meminit instead, but one of the two definitely needs to go.

I fully agree on reducing maintenance work. I just found it easier to map and verify a single pseudo-primitive instead of different ones that need to be collected and sorted out accordingly. So you're saying that collecting this information across opt passes, techmapping and language backends (I'm aware of the $mem handling) takes less maintenance than converting an abstracted intermediate (complex) $memx cell by splitting it via a *.v library? I'd be happy to check out a prototype branch as a demonstration.

mwkmwkmwk commented 4 years ago

Here goes a proposal for the new text format description, along with some examples: https://gist.github.com/mwkmwkmwk/2e51a10ba55bda239ac1b667ca062693

Open issues:

mwkmwkmwk commented 3 years ago

So, since people have been asking about this, here's the current status:

First, the memory model changes and recognition passes for newly-supported features have all landed. This includes:

HOWEVER, our current BRAM mapping pass (memory_bram) is unable to make use of most of the above improvements — it will either emulate the newly-added functionality instead of realizing it in hardware, or completely reject memories using the newly added features (however, it will not reject anything it would have accepted before). This means very little of the above is actually usable now.

While some new functionality (init/reset values, specifically) could be reasonably retrofitted into memory_bram, in general its architecture is just not powerful enough for some features we really want (single-port memory, true dual-port memory, wide ports), so the plan is to instead spend the resources on writing a more flexible replacement (using the design detailed above), which will complete this saga and close a whole lot of long-standing issues in one big swoop. The timeframe for this is 1-2 months from now.

mithro commented 3 years ago

This is excellent work @mwkmwkmwk and I want to say a huge thank you for taking on this epic task!

mwkmwkmwk commented 2 years ago

A very, very untested solution is now available as #3189.