MikePopoloski / slang

SystemVerilog compiler and language services
MIT License
558 stars 122 forks source link

slang-netlist new feature: Detect combinatorial loops #984

Closed udif closed 1 month ago

udif commented 1 month ago

The following PR adds combinatorial loop detection to slang-netlist. The original algorithm can be found here:

https://epubs.siam.org/doi/10.1137/0204007 Finding All the Elementary Circuits of a Directed Graph Johnson, Donald B SIAM Journal on Computing Vol. 4, Issue. 1 Mar 1975

I have taken a Java implementation from https://github.com/josch/cycles_johnson_meyer , forked it to modernize the Java, then ported the code to C++ and finally modified my generic C++ code to work with @jameshanlon 's code.

While the basic code works, there are lots of things that can be improved:

Here is a sample input and output:

% cat test.v
module test (input clk);
wire a;
wire b;
wire c;
wire d;

t2 t2(
  .clk(clk),
  .x(a),
  .y(b),
  .z(c)
);
assign a = b & c;
endmodule

module t2(input clk, input x, output y, output reg z);
assign y = x;
always @(posedge clk)
  z <= x;
endmodule
% build/bin/slang-netlist --comb_loops test.v
Top level design units:
    test

Build succeeded: 0 errors, 0 warnings
Nodes: 29
Detected 1 combinatorial loop:
Variable declaration: test.a => a (Assigned to) => Variable alias test.t2.x => x => y (Assigned to) => Variable alias test.t2.y => b => Variable declaration: test.b => b => a (Assigned to)
codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 94.21%. Comparing base (ddc950d) to head (3bed0dc). Report is 77 commits behind head on master.

Additional details and impacted files [![Impacted file tree graph](https://app.codecov.io/gh/MikePopoloski/slang/pull/984/graphs/tree.svg?width=650&height=150&src=pr&token=sS5JjK9091&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Michael+Popoloski)](https://app.codecov.io/gh/MikePopoloski/slang/pull/984?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Michael+Popoloski) ```diff @@ Coverage Diff @@ ## master #984 +/- ## ========================================== + Coverage 94.01% 94.21% +0.20% ========================================== Files 189 191 +2 Lines 48479 46903 -1576 ========================================== - Hits 45576 44189 -1387 + Misses 2903 2714 -189 ``` [see 147 files with indirect coverage changes](https://app.codecov.io/gh/MikePopoloski/slang/pull/984/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Michael+Popoloski) ------ [Continue to review full report in Codecov by Sentry](https://app.codecov.io/gh/MikePopoloski/slang/pull/984?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Michael+Popoloski). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Michael+Popoloski) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://app.codecov.io/gh/MikePopoloski/slang/pull/984?dropdown=coverage&src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Michael+Popoloski). Last update [4f37bea...3bed0dc](https://app.codecov.io/gh/MikePopoloski/slang/pull/984?dropdown=coverage&src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Michael+Popoloski). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Michael+Popoloski).
udif commented 1 month ago

At the moment I'm also debugging another issue. I use ultraembedded's JPEG core for tests. I can run it against the whole core, or subsets of it. One subset with <700 nodes passes almost immediately. When I moved to a different subtree with ~2000 nodes, the program dies after a few minutes with a segmentation fault. I don't know if this is a complexity issue, or a deadlock triggered by the specific netlist, because it jumps from a sub-second run, to dying after a few minutes.

jameshanlon commented 1 month ago

I noticed while looking at #985 that it takes a long time to process the netlist, so this may be related.

$ time ./build/bin/slang-netlist ../core_jpeg/src_v/jpeg_dht*
Top level design units:
    jpeg_dht

Build succeeded: 0 errors, 0 warnings

real    0m6.035s
user    0m6.025s
sys     0m0.020s

If you run with --debug you can see it spends a lot of time looking up variables that appear in the LUTs, eg:

NetlistVisitor.h:70: Edge decl lookup_input_i to ref lookup_input_i
NetlistVisitor.h:83: Edge ref lookup_input_i to ref y_ac_width_r
NetlistVisitor.h:70: Edge decl lookup_input_i to ref lookup_input_i
NetlistVisitor.h:83: Edge ref lookup_input_i to ref y_ac_width_r
...
netlist.cpp:227: Netlist has 1131 nodes and 54233 edges

It needs some more investigation as to why there are so many edges being created.

udif commented 1 month ago

I just added a test (a single one, at least for the moment) to cover CombLoops. From my side I think the code is ready.

likeamahoney commented 1 month ago

As proof of concept is good but i suppose that analysis is need to be more conservative. It seems to me that checking the graph on presence of cycles (without outermost verilog context) is not quite enough for a good combinational(combinatorial) logic loops analysis. Because not every assignment belongs to combinational logic. There is also sequential and latch kinds of procedural logic.

For example in this simple code sample there is no combinational logic at all. It follows that there are also no combinational cycles (but PR solution reports that there is at least one combinational loop):

module top (input clk);
        reg a;
        reg b;

        test t1(.x(a), .y(b), .clk(clk));
        test t2(.x(b), .y(a), .clk(clk));

endmodule

module test(input reg x, output reg y, input clk);
        always_latch begin
                if (clk)
                        y <= x;
        end
endmodule
Build succeeded: 0 errors, 0 warnings
Nodes: 30
Actual active Nodes: 30
Detected 1 combinatorial loop:
Path length: 8
1.sv:5:13: note: variable a assigned to
        test t1(.x(a), .y(b), .clk(clk));
                   ^

1.sv:13:9: note: variable x read from
                        y <= x;
                             ^

1.sv:13:4: note: variable y assigned to
                        y <= x;
                        ^

1.sv:5:20: note: variable b read from
        test t1(.x(a), .y(b), .clk(clk));
                          ^

1.sv:6:13: note: variable b assigned to
        test t2(.x(b), .y(a), .clk(clk));
                   ^

1.sv:13:9: note: variable x read from
                        y <= x;
                             ^

1.sv:13:4: note: variable y assigned to
                        y <= x;
                        ^

1.sv:6:20: note: variable a read from
        test t2(.x(b), .y(a), .clk(clk));

I think it needs to be determined first which assignments are related to combinational logic but which aren't to make analysis more context sensitivity.

udif commented 1 month ago
  1. From my experience, 99% of the designs do not use latches, except for clock gating cells, therefore if you encounter a latch it is more often than not a design mistake rather than intention (although with always_comb and always @* we see less and less the plain old always @(a or b or c..) style that leads to unintentional latches).
  2. Even when you do use latches, I don't think you can easily analyze the latch control (by a simple algorithm) to determine whether it is a combinatorial loop or not.

From a practical point of view, since I cannot safely determine when a latch is safe, I would rather have a false positive I can waive (we'll have to add a mechanism for that) rather than skip a warning.

MikePopoloski commented 1 month ago

Let me know when you think this is ready to land.

likeamahoney commented 1 month ago
  1. From my experience, 99% of the designs do not use latches, except for clock gating cells, therefore if you encounter a latch it is more often than not a design mistake rather than intention (although with always_comb and always @* we see less and less the plain old always @(a or b or c..) style that leads to unintentional latches).
  2. Even when you do use latches, I don't think you can easily analyze the latch control (by a simple algorithm) to determine whether it is a combinatorial loop or not.

From a practical point of view, since I cannot safely determine when a latch is safe, I would rather have a false positive I can waive (we'll have to add a mechanism for that) rather than skip a warning.

Modern designs are't use latches it's true but what about D-triggers/flip-flops (which widely used in modern designs to separate combinational logic) and other sequential logics? Provided solution also warns on it.

For example there is simple flip-flop without comb loops:

module top (input rst, input clk);
        wire D;
        wire Q;
        wire Qn;
        dff d1(D, clk, rst, Q, Qn);
        dff d2(Q, clk, rst, D, Qn);
endmodule

module dff (
  input  logic D, clk, rst,
  output logic Q, Qn
);
  always_ff @(posedge clk, posedge rst) begin
    if (rst) begin
      Q  <= 0;
      Qn <= 1;
    end else begin
      Q  <= D;
      Qn <= ~D;
    end
  end
endmodule
udif commented 1 month ago

Modern designs are't use latches it's true but what about D-triggers/flip-flops (which widely used in modern designs to separate combinational logic) and other sequential logics? Provided solution also warns on it.

This is simple - your example simply triggered a bug. This was not supposed to happen... The idea was to disconnect any node below always or always_ff statements with posedge or negedge in their sensitivity list. Apparently, as your example shows, there are bugs. I will check this.

However, I have found more serious false positive issues:

module t;
wire x, y, z;
assign {z, y} = {~y, ~x};
endmodule

This will also trigger a comb loop warning. While this specific one might be solved (at the cost of complicating the netlist logic) It looks like the general problem will not be solved without a full synthesis engine. For example:

module t;
wire x, y;
assign z = ~y & y;
assign y = z;
endmodule

It remains to be seen if the scope of this feature would be useful given the current limitations (known possible false positives).

jameshanlon commented 1 month ago

I would expect the netlist graph for assign {z, y} = {~y, ~x}; to be acyclic, so there may be a bug if that is not true.

I see your point though and agree that without full synthesis it's not possible to determine a true combinatorial loop.

The class of loops that slang-netlist can report are comparable to those reported by Verilator as UNOPTFLAT`, which are still useful to know about.

udif commented 1 month ago

I would expect the netlist graph for assign {z, y} = {~y, ~x}; to be acyclic, so there may be a bug if that is not true.

A combinatorial loop is reported because each of the x and y nodes have edges to z and y, as {z, y} is lumped into one assigment depending on y and x and is then split. slang-netlist is not smart enough to detect the bitwise assignment and separate the z = ~y and `y = ~x' assignments.

Maybe we should rename the option name to --possible-comb-loop so that people set the correct expectations, and accept a small number of false positives.

jameshanlon commented 1 month ago

slang-netlist is not smart enough to detect the bitwise assignment and separate the z = ~y and `y = ~x' assignments.

Apologies - I thought I had implemented that. My aim for slang-netlist is to provide a datastructure that resolves all source-level connectivity to the bit level. Handling packed vectors is important for that, so I'll try and look into it.

With regards to combinatorial loops, how about just calling the option --report-cycles and some description that these can potentially be combinatorial loops?

udif commented 1 month ago

With regards to combinatorial loops, how about just calling the option --report-cycles and some description that these can potentially be combinatorial loops?

But I'm not merely reporting cycles - I'm actively ignoring netlist edges that terminate on a posedge/negedge nodes.

BTW, another feature I would like to implement at a later stage is an improvement to the path reporting mode, that will also report the number of clock edges between the source and destination paths. It would require defining the clock signals, and keeping a table of clock aliases, either those directly connected, or those passing through a clock gating module (you would define the clock gating module name, and the input/output ports). Today I'm running into situations where I have a memory access bus running through several sample stages across my SoC, and having something that counts the total latency would be useful for me.

jameshanlon commented 1 month ago

But I'm not merely reporting cycles - I'm actively ignoring netlist edges that terminate on a posedge/negedge nodes.

Marking sequential edges is a generally useful feature of the tool, since it allows the netlist to be constrained to find combinatorial paths. This is how I implemented a previous similar tool.

BTW, another feature I would like to implement at a later stage is an improvement to the path reporting mode, that will also report the number of clock edges between the source and destination paths.

Nice, that would be useful.

(Btw. I've got no objections to this landing! We can always iron details out later.)

udif commented 1 month ago

This is simple - your example simply triggered a bug.

I've fixed the bug, but I still got 2 issues:

  1. For some tests, the code seems to be stuck in a loop. I need to dig deeper into the algorithm to understand what is the issue. I may dump the test case into a DOT file and run against the original Java code to see if it's a conversion issue or a bug in the original code.
  2. I also have a memory corruption bug that causes my unit tests to fail when I add more than one.

It will take me a few more days to resolve these.

udif commented 1 month ago

It seems the bugfix above (2c57b7d) solved all stability and corruption issues, but when running against the full core_jpeg example I've used before, I ran into #993 .

udif commented 1 month ago

At the moment, My code works great under clang-sanitize, where I've debugged it, but for some reasons fails under gcc-11, where there seems to be a memory corruption causing a segment violation as well as locking up the watch windows on vscode. I'm looking into this.

udif commented 1 month ago

While I've fixed the crashes, it seems there are still other issues (lockups on some examples).

udif commented 1 month ago

While I've fixed the crashes, it seems there are still other issues (lockups on some examples).

False alarm, I was just being impatient and stopped the test after a few seconds. It ended up taking a few more seconds than what I estimated.

Let me know when you think this is ready to land.

@MikePopoloski @jameshanlon I think it is ready from my side.