atorkmabrains closed this issue 1 year ago
Also, I believe you don't prune edges, for example for the width check here: https://github.com/KLayout/klayout/blob/7a9e9989d3fb8259d9ee3b5d6c8fe803092e7f2b/src/db/db/dbAsIfFlatEdges.cc#L695
Would it be possible to prune the edges to only those that belong to the same polygon? See: https://github.com/KLayout/klayout/blob/7a9e9989d3fb8259d9ee3b5d6c8fe803092e7f2b/src/db/db/dbAsIfFlatEdges.cc#L705
@atorkmabrains Thanks for the analysis. In my experience, profiling is a mixed blessing. IMHO it's much better to break down a test case. Once you have isolated the bottleneck operation, you can try to understand what it is doing.
I don't see how long the different steps take. But from my experience with the other cases, I think the problem is in the MDN.3b and MDN.4b rules. These contain an edge width operation, and I think this is where the runtime penalty you observed comes from. Here is a clip from my log:
...
2022-11-23 22:24:19 +0100: Memory Usage (5215992K) : Executing rule MDN.2b
2022-11-23 22:24:19 +0100: Memory Usage (5215992K) : Executing rule MDN.3a
2022-11-23 22:24:20 +0100: Memory Usage (5215992K) : Executing rule MDN.3b
2022-11-23 22:26:12 +0100: Memory Usage (6318536K) : Executing rule MDN.4a
2022-11-23 22:26:12 +0100: Memory Usage (6318536K) : Executing rule MDN.4b
2022-11-23 22:26:12 +0100: Memory Usage (6318536K) : Executing rule MDN.5ai
The core implementation of MDN.3b is this:
mdn3b_l1 = poly2.edges.and(ncomp).or(mvsd_mdn).and(ldmos_xtor).and(dualgate).not(ngate.not(mvsd).edges.interacting(poly2.edges.and(ncomp).or(mvsd_mdn)).width(20.001.um).edges)
I reduced the DRC deck to this piece and tried to unfold the long expressions:
verbose
source("user_proj_example.gds")
target("test_out.gds")
comp = polygons(22 , 0 )
nplus = polygons(32 , 0 )
pplus = polygons(31 , 0 )
poly2 = polygons(30 , 0 )
mvsd = polygons(210, 0 )
ldmos_xtor = polygons(226, 0 )
dualgate = polygons(55 , 0 )
ncomp = comp & nplus
tgate = poly2 & comp
ngate = nplus & tgate
mvsd_mdn = mvsd.
edges.
and(ncomp).
and(poly2)
mdn3b_l1 = poly2.
edges.
and(ncomp).
or(mvsd_mdn).
and(ldmos_xtor).
and(dualgate).
not(ngate.
not(mvsd).
edges.
interacting(poly2.
edges.
and(ncomp).
or(mvsd_mdn)
).
width(20.001.um).
edges
)
Running that with "klayout -b -r reduced.drc" shows me the individual operations. The long runner is "width(20.001.um)" with 116 s (of a total of 123 s), as your profiling indicates too.
This is an edge collection width operation. The remark about pruning is correct. But the edges don't know which polygons they come from, hence no pruning with respect to the original polygons can be done. I tried to explain this problem earlier: an edge set is a collection of disconnected edges, and "width" does not refer to an "inner distance" but only to a certain relative edge orientation.
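To make the orientation point concrete, here is a toy Ruby model of an edge-set width check on vertical edges. This is not KLayout code; the VEdge struct and width_pairs helper are made up for illustration. The only information each edge carries is its position and which side the material is on, so edges from different polygons get paired too:

```ruby
# Toy model (NOT KLayout code): a width check on a plain edge set.
# Each edge knows its position and which side the material is on,
# but not which polygon it came from.
VEdge = Struct.new(:x, :material_side, :poly)

# Pair any two edges whose material sides face each other within the
# limit -- relative edge orientation is all an edge-set "width" can
# go by, not membership in a common polygon.
def width_pairs(edges, limit)
  pairs = []
  edges.combination(2) do |a, b|
    lo, hi = [a, b].sort_by(&:x)
    next unless lo.material_side == :right && hi.material_side == :left
    pairs << [lo, hi] if hi.x - lo.x < limit
  end
  pairs
end

# Two 1-unit-wide bars belonging to different polygons, 5 units apart:
edges = [
  VEdge.new(0, :right, :A), VEdge.new(1, :left, :A),
  VEdge.new(6, :right, :B), VEdge.new(7, :left, :B),
]
```

With a limit of 10, this yields three pairs, one of which spans polygons A and B: a spurious cross-polygon "width" that polygon information would have pruned. A flat edge set has no way to do that.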
This becomes visible when the inputs (red) and outputs (green) of "width" are plotted:
The green mess is the actual set of edge pairs produced by "width"! Not kidding. This is just a tiny piece; the whole output is a plethora of >8M polygons. This does not make sense at all.
A "width" operation is much more efficient when used on polygons. In that case, only individual polygons are considered and pruning in your sense is implied. But I think this is not applicable here, as the computations for the inputs are already done in the edge domain.
An elaborate solution would be an "edge width inside polygon" feature. That is not available (yet), but I think it can be emulated:
The base polygon for this check is "ngate.not(mvsd)". This means all relevant edges lie on the boundary of these polygons. So instead of width-checking the edges, we can first width-check the base region (which is usually a rectangle, so this check is trivial) and then mask the sensitive edges with those candidates.
Like this:
ngate_wo_mvsd = ngate.not(mvsd)
wide_ngate = ngate_wo_mvsd.drc(width(projection) <= 20.um).edges
mdn3b_l1 = poly2.
edges.
and(ncomp).
or(mvsd_mdn).
and(ldmos_xtor).
and(dualgate).
not(ngate_wo_mvsd.
edges.
interacting(poly2.
edges.
and(ncomp).
or(mvsd_mdn)
).
and(wide_ngate)
)
I don't have a test case that triggers this check, so correctness still needs to be confirmed. But in my case it executes in a few seconds.
I guess there are even smarter ways to solve this. For example, the "drc" function creates a local context, which basically also provides the pruning you ask for. However, without a test case - specifically one with mvsd devices - this will be difficult to optimize.
And finally a last word regarding tiled mode: I think you need to supply a border that takes these long-range interactions into account. It should basically be done automatically, but I'm not sure if long ranges are captured correctly. As the maximum distance encountered in the deck is 50µm, trying a border of 50µm should be a first guess. However, that is quite a large border, and a noticeable overhead will be present.
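As a sketch, a tiled run with an explicit border could look like this in the DRC script. Tile size and thread count are placeholder values I picked; only the 50µm border comes from the deck's maximum check distance:

```ruby
# Tiled mode with an explicit border covering the longest-range
# interaction in the deck (50 um); tile size and thread count are
# placeholders to be tuned for the layout.
tiles(500.um)        # tile size
tile_borders(50.um)  # border >= maximum check distance in the deck
threads(8)           # tiles are processed in parallel
```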
In any case, flat mode is prohibitive for large logic blocks or memory arrays. And tiled or deep mode are the only modes supporting multiple cores.
Best regards,
Matthias
Here is another remark because I just noticed that:
In the code above this expression is given (unfolded version of the original code):
mdn3b_l1 = poly2.
edges.
and(ncomp).
or(mvsd_mdn).
and(ldmos_xtor).
and(dualgate).
not(ngate_wo_mvsd.
edges.
interacting(poly2.
edges.
and(ncomp).
or(mvsd_mdn)
).
and(wide_ngate)
)
It contains the following subexpression twice:
poly2.
edges.
and(ncomp).
or(mvsd_mdn)
This expression is actually computed twice, so it can be factored out:
ngate_wsides = poly2.edges.and(ncomp).or(mvsd_mdn)
mdn3b_l1 = ngate_wsides.
and(ldmos_xtor).
and(dualgate).
not(ngate_wo_mvsd.
edges.
interacting(ngate_wsides).
and(wide_ngate)
)
I think there are many such optimization opportunities. But again, difficult without a test case (or better: a test suite).
Matthias
@klayoutmatthias Thanks for the code review. We have a test suite for this rule deck. I'll send you the link.
I'll take a look at the coding of this rule and see if we can change the coding style to avoid the use of edge width. BTW, the commercial tool keeps track of which polygon each edge comes from. I thought you might be doing something similar.
As for tiling, we have used double the maximum measurement in the rule deck as border, and yet it generated weird results. Let me test it again.
Deep didn't improve the runtime either.
Tiling is not necessarily a runtime improvement, but it keeps memory low as it only works on one tile at a time, and with multiple cores, tiles are run in parallel. However, there is some overhead (because of the border handling, for example), so there is a trade-off. But I have hardly encountered a case where tiling creates artefacts, and core scaling is usually very good.
If you work on larger designs - specifically with memory blocks - you cannot use flat mode. Memory footprint will simply explode.
Matthias
@klayoutmatthias Thanks for your input. I believe I need to redo many parts of the script, as we use edges followed by width/separation checks very often. It was a bad assumption on my part that you prune based on the original polygon.
@atorkmabrains I don't think you need to redo all of them - in fact, only a few.
If the width you check is a small one, the edge width check is very efficient. Even if it does not consider the polygon an edge comes from, it will do an efficient local clustering (a scan-line algorithm in the flat case). So if all relevant edges are within the search range and the local clusters do not involve many foreign or irrelevant edges, there is no performance or memory penalty.
The issue with a large-range width check is that the local clusters become very large, and in the case of dense layout you collect a huge neighbourhood for each input edge. This creates manifold interactions, as we have seen in the green error markers before. The number of interactions grows roughly with the square of the check distance, which explains the extreme runtime penalty.
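The square-of-distance growth can be seen with a toy Ruby count (not KLayout internals; the interactions helper is invented for illustration): put edge fragments on an n-by-n grid and count the pairs within Chebyshev distance d of each other.

```ruby
# Toy estimate (NOT KLayout code): number of pairwise interactions
# when each of n*n grid sites interacts with every site within
# Chebyshev distance d. The count grows roughly with d^2.
def interactions(n, d)
  pts = (0...n).to_a.product((0...n).to_a)
  count = 0
  pts.combination(2) do |(x1, y1), (x2, y2)|
    count += 1 if (x1 - x2).abs <= d && (y1 - y2).abs <= d
  end
  count
end
```

On a 20x20 grid, going from d = 1 to d = 5 multiplies the pair count by roughly 12 (it would be 25 without boundary effects), which mirrors why a 20µm edge width check explodes where a 1-2µm one stays cheap.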
My gut feeling is that for this technology node every distance below 1 or 2µm can safely be checked with edge operations (I assume you will not test against pathological cases :) ).
Best regards,
Matthias
@klayoutmatthias That's really helpful information. I will consider this in our implementation.
Thanks @klayoutmatthias I'll close this issue as it's no longer needed.
Hi @klayoutmatthias ,
I'm still working on investigating the performance issues we are facing in DRC. I ran another profiling session to see where the main issue is:
The layout was not that big. I don't recommend running the tiled version, as it gives false errors.
I don't recommend using the Google version, as it's not up to date. Please use the Mabrains version instead.
To duplicate this run, you need the following:
Here is the run command:
From my initial analysis, I believe the main bottleneck most likely lies in the "do_process" function here: https://github.com/KLayout/klayout/blob/327345f5ca86e3d5f4490d199e586c7bf5d4599d/src/db/db/dbBoxScanner.h#L304
I'm not sure if that's true, but it seems to be an iterative process.