YosysHQ / yosys

Yosys Open SYnthesis Suite
https://yosyshq.net/yosys/
ISC License
3.3k stars 860 forks source link

Redo internal cell memory layout #4461

Open widlarizer opened 1 week ago

widlarizer commented 1 week ago

Brief

What are the reasons/motivation for this change? small is fast! Explain how this is achieved. make cell small. If applicable, please suggest to reviewers how they can test the change. right now, nothing works

Full

Core idea

Yosys is conceptually flexible in what a cell can be. This makes it easy to do odd custom things that match use cases. The way this is achieved is inefficient for commonly represented types. Each cell has a dict from interned parameter name strings to parameter value constants. It also has a dict from port name strings to signals represented by SigSpecs. However, most cells that Yosys uses in common flows have a fixed set of ports and parameters. So instead of dynamically growing space for a dict and hashing to access the value stored in it, we can have preallocated space and a compile-time known mapping if we just use a struct specific to a given cell type:

// $not etc
struct RTLIL::Unary {
    SigSpec a;
    SigSpec y;
    Const a_width;
    Const y_width;
    Const is_signed;
};

We can then use a tagged union to construct a new, slimmer, faster to access cell type, with a fallback pointer to the old cell structure.

struct RTLIL::Cell
{
    RTLIL::IdString type;
    RTLIL::IdString name;
    RTLIL::Module* module;
    union {
        RTLIL::Unary not_;
        RTLIL::Unary pos;
        RTLIL::Unary neg;
        RTLIL::OldCell* flexible;
    };
};

Wait, wasn't this supposed to be a Yosys rewrite in Rust?

This isn't next generation Yosys (project nyan). Rust or Zig would be better to do what I'm doing, but Yosys is a C++ codebase, so C++ it is. This is a surgical, gradual transition to more efficient internals with minimal impact on stuff outside kernel/rtlil.h and kernel/rtlil.cc. The Yosys codebase (~200k LOC) and user plugins have a large number of code like cell->getParam(ID::A_WIDTH), we have to reproduce these interfaces rather than change them all to cell->not_.a_width. There's a good number of such functions. The annoying part is mocking for example the parameters dictionary. Instead, there's struct FakeParams which provides iterators to hide this. Same goes for connections. Implementations of getParam etc are simple switches over the cell type so they should inline well.

However, there's a lot of "internal" cells we may want to be fast, and their struct layouts and methods are basically duplicating the same information. Manually writing these methods is also very error prone, accessing tagged unions incorrectly is common UB. This is why, to add support for the various cell types, the next stage will get a bit weird:

Writing a compiler

Modern C++ offers multiple ways to have code that thinks about types. However, what we need here is to construct functions and construct their return types based on struct layouts. This level of reflection if possible seems too difficult to achieve with templates and constexpr and such. We don't have Rust's proc_macro, we don't have Zig's comptime, so one possible solution is a light Python utility which from information-dense Python or plain text structures emits generates C++ which is committed into the repository and checked with CI. Yosys already depends on Python in a similar way, see passes/pmgen/pmgen.py.

Beyond the cell

It's possible that the changes outlined above alone won't make a huge difference. RTLIL::Const is fairly inefficient as well. Size is virtuous in a positive feedback loop: the fewer bytes per cell, the more it helps to save a single byte per cell.

Size goals

The current flexible RTLIL::Cell is 192 bytes. The proposed one is a bit bigger while it supports only unary elements, but doesn't require external allocations. With the current flexible cell, this simple verilog eats 6kB memory per inverter-and-FF-bit:

module invert_n_times #(parameter N = 10000) (
    input wire in,
    output wire out
);
    wire [N:0] intermediate;

    assign intermediate[0] = in;

    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : invert_loop
            assign intermediate[i+1] = ~intermediate[i];
        end
    endgenerate

    assign out = intermediate[N];
endmodule

This suggests that focused effort can bring it down by a significant factor.

widlarizer commented 1 week ago

This was meant to be a draft PR and I hit the wrong button writing the description. Heck

Ravenslofty commented 1 week ago

There's an option to make the PR a draft; you don't need to recreate the PR or such.

whitequark commented 1 week ago

This looks like quite a few fairly complex changes that don't really address the fundamental issues with the Yosys IR and don't come with benchmarks.

povik commented 1 week ago

fundamental issues with the Yosys IR

What would be the fundamental issues provided we switch to the buffer-normalized mode of #3967 (which this PR should be compatible with)?

don't come with benchmarks

I pushed for @widlarizer to open a draft PR no matter what state this is in. Benchmarks can come later.

povik commented 1 week ago

so one possible solution is a light Python utility which from information-dense Python or plain text structures emits generates C++ which is committed into the repository and checked with CI. Yosys already depends on Python in a similar way, see passes/pmgen/pmgen.py.

FWIW pmgen-generated C++ code is not checked into the repository. It's treated as a build artifact.

whitequark commented 1 week ago

What would be the fundamental issues provided we switch to the buffer-normalized mode of #3967 (which this PR should be compatible with)?

That's an improvement, although other issues remain (like the unnecessary coarse/fine cell split, or the combined treatment of logic and inout wires in the same netlist).

widlarizer commented 1 week ago

Benchmarks are coming. The work presented here right now is exclusively an incomplete proof of concept. Incomplete as in it immediately segfaults on anything in the current state. I want to find a case where some cells being fast tracked really shows improvements. If we find better fundamental approach, all this is free to get canned, in such a case it still will have been a helpful exercise in my eyes

widlarizer commented 1 week ago

the unnecessary coarse/fine cell split

Can you elaborate on the adverse effects of this? From a performance point of view it doesn't seem too concerning, but I may be missing something.

combined treatment of logic and inout wires in the same netlist

I wanted to redesign the netlist first, but that was before I learned about bufnorm, after which it was suggested to me by povik that redoing how connections are handled would require excessive code changes and that bufnorm + smaller internal cells takes priority as a result.

whitequark commented 1 week ago

From a performance point of view it doesn't seem too concerning, but I may be missing something.

I don't think this affects performance but in general I don't care a whole lot about Yosys performance; as you may know I'm quite happy using the YoWASP version despite a ~2x perf hit, so clearly that much performance loss isn't an issue to me. But the IR is poorly defined and difficult to work with (I think the single most obnoxious issue from my perspective is the batshit behavior of \init, which is directly downstream from the coarse/fine split and some other bad decisions) and I'd much rather see that addressed than any performance improvements.

povik commented 1 week ago

I think the single most obnoxious issue from my perspective is the batshit behavior of \init, which is directly downstream from the coarse/fine split and some other bad decisions

I don't follow. How is \init related to the coarse/fine split, if that's what you are saying?

whitequark commented 1 week ago

I don't follow. How is \init related to the coarse/fine split, if that's what you are saying?

\init is only an attribute and not a parameter because of fine cells (which would otherwise have to triple in quantity), which in turn only lack parameters because of BLIF export. Nobody cares how many fine cells Yosys has, there's hundreds already, just do the 0, 1, X variant for them and stop imposing the cost of \init handling on every other pass.

widlarizer commented 1 week ago

YoWASP

I would be remiss if I didn't bring up that the road to performance I'm talking about is through major runtime memory size reduction which should matter for use cases of EDA in browsers.

IR is poorly defined and difficult to work with

As I understand, there are many open questions on design decisions in Yosys, as viewed from the perspective of correctness and developer experience and maintainability, but I'm not aware of any interaction this PR would have on those topics, other than this being an opportunity to have correct-by-construction, rather than correct-by-cell-checker cells.

povik commented 1 week ago

Nobody cares how many fine cells Yosys has, there's hundreds already, just do the 0, 1, X variant for them and stop imposing the cost of \init handling on every other pass.

Huh, so e.g. the state writeback in sim would involve changing the cell types on fine flip-flops, and adjusting a parameter on coarse flip-flops. In exchange we make init disappear. I think I like the change.

I'd much rather see that addressed than any performance improvements.

We should mention a bit about the background for the PR. It's motivated by company interests which allow us to invest significant effort into performance improvements. If we can fix some outstanding ills of the IR along the way, that's all the better, but it's not the case that we can pick any IR work instead. I guess we shouldn't be doubling down on a wrong design decision by writing specialized memory layouts around it, but that doesn't seem to be happening, at least not with the issues brought up so far.

whitequark commented 1 week ago

I would be remiss if I didn't bring up that the road to performance I'm talking about is through major runtime memory size reduction which should matter for use cases of EDA in browsers.

Interestingly, YoWASP already greatly benefits from such size reduction compared to native builds because it uses an ABI with 32-bit pointers. This is for example why nextpnr is ~on par with a native build, despite other inefficiencies.

I do welcome performance improvements (who doesn't?) but I feel that there are many other things to address in Yosys that I'm a lot more excited about, since the IR structure is a major pain point and performance rarely is, at least from where I stand.

other than this being an opportunity to have correct-by-construction, rather than correct-by-cell-checker cells.

I'm uncertain on whether this design would further entrench the existing IR flaws or not.

I think ultimately I'll choose to withhold any judgement I would have before I see benchmarks. That seems only right if this was intended to be a draft (which I didn't realize at first); I don't want to be negative on something that is clearly not the final work.

Huh, so e.g. the state writeback in sim would involve changing the cell types on fine flip-flops, and adjusting a parameter on coarse flip-flops.

Yes. The problem with \init is that it's antithetical to correctness-by-construction because it's just ignored on wires with no flops, and conflicts resolve kind of randomly? We've had a lot of difficulty with emitting it in Amaranth correctly as well, but that's not the main motivation.

whitequark commented 1 week ago

It's motivated by company interests which allow us to invest significant effort into performance improvements.

So on second thought and having read this, I think I'll just leave this PR to be driven by the company interests. Looks like it pinged me because of the CXXRTL code owner bit, I'll pre-emptively sign off on any changes there since they're quite minimal.

widlarizer commented 1 week ago

This PR was initiated by my personal interest which I found to be well aligned with company interests of making specifically opt_expr faster

widlarizer commented 1 week ago

Major road block identified: Yosys until now has been built with the assumption that at will one can modify cell types and erase and add params and ports. At coverage tag "opt.opt_expr.eqneq.isnot" that's exactly what's done. With my changes, it is possible to attempt it but it's union access UB. As a solution, the type field could become private, which will create large code change where read-only type access is replaced with a const getter (from cell->type to cell->type()). This will bring large code changes. Or we can just fix the places where types are modified (there seem to be 79 locations where type is set outside of addCell) and temporarily keep the UB hazard for new code