RTimothyEdwards / open_pdks

PDK installer for open-source EDA tools and toolchains. Distributed with setups for the SkyWater 130nm and Global Foundries 180nm open processes.
http://opencircuitdesign.com/open_pdks
Apache License 2.0

Extraction creates spurious pin on some sky130_fd_sc_hd__a21bo_1 cells #210

Open antonblanchard opened 2 years ago

antonblanchard commented 2 years ago

I have a design that is failing LVS because sky130_fd_sc_hd__a21bo_1 cells have gained a pin (w_69_21#):

.subckt sky130_fd_sc_hd__a21bo_1 A1 A2 B1_N VGND VPWR X VNB VPB w_69_21#

I'm a bit lost as to how to debug this. I know the issue is in the top and bottom left of the cell. I also know the labels are being created in Magic in extHardGenerateLabel(). Enabling the debugging in there shows every bad label:

Hard way: generated label = "_34823_/w_69_21#"

I'm not sure why we are creating them however. There's a VNB or VPB pin nearby.
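One way to spot this kind of mismatch before running full LVS is to compare the `.subckt` pin lists of a reference netlist and the extracted one. The following is a minimal sketch, not part of magic or netgen; the function names are hypothetical, and it assumes each `.subckt` header fits on one line (no `+` continuation lines).

```python
def subckt_pins(netlist_text):
    """Map each subcircuit name to its ordered pin list."""
    pins = {}
    for line in netlist_text.splitlines():
        fields = line.split()
        if fields and fields[0].lower() == ".subckt":
            pins[fields[1]] = fields[2:]
    return pins


def extra_pins(reference_text, extracted_text):
    """Return pins present in the extracted subcircuits but not in the reference."""
    ref = subckt_pins(reference_text)
    ext = subckt_pins(extracted_text)
    result = {}
    for name, ext_list in ext.items():
        if name in ref:
            added = [p for p in ext_list if p not in ref[name]]
            if added:
                result[name] = added
    return result


# The two headers quoted in this thread: the standalone cell and the bad one.
reference = ".subckt sky130_fd_sc_hd__a21bo_1 A1 A2 B1_N VGND VPWR X VNB VPB\n.ends\n"
extracted = ".subckt sky130_fd_sc_hd__a21bo_1 A1 A2 B1_N VGND VPWR X VNB VPB w_69_21#\n.ends\n"

result = extra_pins(reference, extracted)
print(result)
```

Run on the two headers above, this reports `w_69_21#` as the only pin that the reference definition lacks.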

antonblanchard commented 2 years ago

Testcase: extract-fail.tar.gz

d-m-bailey commented 2 years ago

@antonblanchard magic will 'create' ports if there's a short to an internal node. Can you post the extracted spice file?

Or at least the extracted netlist for sky130_fd_sc_hd__a21bo_1.

antonblanchard commented 2 years ago

@d-m-bailey here are the .ext and .spice files for both the sky130_fd_sc_hd__a21bo_1 cell and the macro as a whole: extract-fail-2.tar.gz

The cell itself:

* NGSPICE file created from sky130_fd_sc_hd__a21bo_1.ext - technology: sky130A

.subckt sky130_fd_sc_hd__a21bo_1 A1 A2 B1_N VGND VPWR X VNB VPB
X0 a_298_297# a_27_413# a_215_297# VPB sky130_fd_pr__pfet_01v8_hvt ad=0p pd=0u as=0p ps=0u w=1e+06u l=150000u
X1 a_215_297# a_27_413# VGND VNB sky130_fd_pr__nfet_01v8 ad=0p pd=0u as=0p ps=0u w=650000u l=150000u
X2 a_298_297# A2 VPWR VPB sky130_fd_pr__pfet_01v8_hvt ad=0p pd=0u as=0p ps=0u w=1e+06u l=150000u
X3 X a_215_297# VGND VNB sky130_fd_pr__nfet_01v8 ad=0p pd=0u as=0p ps=0u w=650000u l=150000u
X4 VPWR B1_N a_27_413# VPB sky130_fd_pr__pfet_01v8_hvt ad=0p pd=0u as=0p ps=0u w=420000u l=150000u
X5 X a_215_297# VPWR VPB sky130_fd_pr__pfet_01v8_hvt ad=0p pd=0u as=0p ps=0u w=1e+06u l=150000u
X6 a_382_47# A1 a_215_297# VNB sky130_fd_pr__nfet_01v8 ad=0p pd=0u as=0p ps=0u w=650000u l=150000u
X7 VGND B1_N a_27_413# VNB sky130_fd_pr__nfet_01v8 ad=0p pd=0u as=0p ps=0u w=420000u l=150000u
X8 VPWR A1 a_298_297# VPB sky130_fd_pr__pfet_01v8_hvt ad=0p pd=0u as=0p ps=0u w=1e+06u l=150000u
X9 VGND A2 a_382_47# VNB sky130_fd_pr__nfet_01v8 ad=0p pd=0u as=0p ps=0u w=650000u l=150000u
.ends

d-m-bailey commented 2 years ago

So magic extracts an extra unconnected pin. There have recently been some changes to magic in this regard. https://github.com/RTimothyEdwards/magic/issues/122

What version of magic are you using?

antonblanchard commented 2 years ago

@d-m-bailey I'm running a checkout of magic from today, so those changes are applied.

RTimothyEdwards commented 2 years ago

@antonblanchard : There is something going on in the top level layout that is causing this. In the top level SPICE, there are some of the a21bo_1 cells that connect the "extra" pin to VGND and some that treat it as an isolated node. There must be something about the top level layout, such as position relative to tap cells, or what row it's on, that makes the difference. I can't debug it without the top level .mag file, since I need to see it attempting to extract the cell both ways.

This is different from most of the recent discussion with @d-m-bailey around unconnected pins, because all (?) of those issues were in ext2spice; that is, the .ext file was correct but the SPICE netlist was not. However, in this case, the issue is showing up in the .ext file. From the top level layout, it should be easy to debug, because I just need to break at the point where it turned the node w_69_21# into a port (separate from VNB).
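The split described above (some instances tying the extra pin to VGND, others leaving it as an isolated node) can be tallied directly from the top level SPICE. This is a rough sketch under stated assumptions: instance lines start with `X`, end with the cell name, and fit on one line; the sample netlist lines and instance names are invented for illustration and are not taken from the attached files.

```python
from collections import Counter


def classify_extra_pin(netlist_text, cell, pin_index):
    """Count how instances of `cell` connect the pin at `pin_index` (0-based)."""
    counts = Counter()
    for line in netlist_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].startswith("X") or fields[-1] != cell:
            continue
        nets = fields[1:-1]  # everything between instance name and cell name
        if pin_index >= len(nets):
            counts["pin absent"] += 1
        elif nets[pin_index] == "VGND":
            counts["tied to VGND"] += 1
        else:
            counts["isolated node"] += 1
    return counts


# Hypothetical sample lines; real instance and net names will differ.
sample = """\
X_100_ n1 n2 n3 VGND VPWR n4 VGND VPWR VGND sky130_fd_sc_hd__a21bo_1
X_101_ n5 n6 n7 VGND VPWR n8 VGND VPWR _101_/w_69_21# sky130_fd_sc_hd__a21bo_1
X_102_ n9 VGND VGND VPWR VPWR n10 sky130_fd_sc_hd__inv_1
"""

counts = classify_extra_pin(sample, "sky130_fd_sc_hd__a21bo_1", 8)
print(dict(counts))
```

The pin index 8 corresponds to the ninth position in the `.subckt` header, where `w_69_21#` appears.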

antonblanchard commented 2 years ago

@RTimothyEdwards The original tarball above has the top level GDS (compressed with xz because it was too large for github to attach), as well as the LEF/DEF. Here's the mag: Microwatt_FP_DFFRFile.mag.gz

RTimothyEdwards commented 2 years ago

@antonblanchard : Sorry, I should have seen the first tarball posting.

So I looked at the .ext file and found that there are only 37 instances that connect the spurious w_69_21# node to ground (out of a total of 390 instances of cell a21bo_1). All other instances of the a21bo_1 cell have this node as a no-connect (in the sense that they did not see it as a separate node and so did not make a connection to it, which is the correct behavior). I brought up the .mag layout and selected these 37 instances. Curiously, the 37 instances are roughly aligned along the edges of a box at (227.44um, 179.95um) to (748.36um, 801.7um). There is nothing else similar or remarkable about these 37 instances.

It is altogether rather bizarre. This is the map of the positions of instances that have incorrectly called out node w_69_21# as a separate connection: [image: instance_map]

antonblanchard commented 2 years ago

@RTimothyEdwards very strange!

antonblanchard commented 2 years ago

I have no idea why it would make a difference, but are they near the areaid.lowTapDensity boundary?

RTimothyEdwards commented 2 years ago

It can't make a difference. The low tap density boundary is only defined on the full chip top level, anyway, because its dimensions have to be set relative to the padframe.

RTimothyEdwards commented 2 years ago

It may be instructive to also plot the positions of all the instances that were not in error, and compare. It could be that the position of the instances is just an artifact of the design and the placement algorithm.
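A comparison like this can be scripted by pulling instance placements out of the .mag file. The sketch below assumes the usual .mag layout of a `use <cell> <instance>` line followed by a `transform a b c d e f` line, with `c` and `f` as the x and y translation in magic internal units; treat that reading of the format, and the sample data, as assumptions to check against the real file.

```python
from collections import Counter


def read_placements(mag_text, cell):
    """Return {instance: {"x", "y", "orient"}} for every use of `cell`."""
    placements = {}
    current = None
    for line in mag_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "use":
            # Remember the instance name only if it is the cell of interest.
            current = fields[2] if len(fields) > 2 and fields[1] == cell else None
        elif fields[0] == "transform" and current is not None:
            a, b, c, d, e, f = (int(v) for v in fields[1:7])
            placements[current] = {"x": c, "y": f, "orient": (a, b, d, e)}
            current = None
    return placements


# Invented sample; instance names and coordinates are illustrative only.
mag_text = """\
use sky130_fd_sc_hd__a21bo_1  inst_a
timestamp 0
transform 1 0 1000 0 1 2000
box 0 0 10 10
use sky130_fd_sc_hd__a21bo_1  inst_b
timestamp 0
transform 1 0 5000 0 -1 2720
box 0 0 10 10
use sky130_fd_sc_hd__inv_1  inst_c
transform 1 0 0 0 1 0
"""

placements = read_placements(mag_text, "sky130_fd_sc_hd__a21bo_1")
flagged = {"inst_b"}  # instances reported with the spurious node (hypothetical)

all_orients = Counter(p["orient"] for p in placements.values())
flagged_orients = Counter(placements[n]["orient"] for n in flagged)
print(all_orients, flagged_orients)
```

Plotting the `x`/`y` values for the flagged set against the full set gives the two maps being compared, and the orientation counters show whether the flagged instances favor one orientation.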

RTimothyEdwards commented 2 years ago

Yes, in fact, that's exactly it. The optimally placed design ends up with a ring of these instances. The ones that are in error look, as far as I can tell, to be randomly selected from the total. [image: all_instance_map]

RTimothyEdwards commented 2 years ago

Which tells me precisely nothing about the nature of the error. I can tell this one is going to be very painful to track down.

antonblanchard commented 2 years ago

I'm stumbling around in extHierConnectFunc2() and it looks like we are making a connection to the pwell pin (the overlap area matches the pin exactly), but for some reason we don't find the label and instead create one.

RTimothyEdwards commented 2 years ago

@antonblanchard : Takes a bit of work to get to that point. How are you setting your breakpoints for the debug?

antonblanchard commented 2 years ago

I've been adding random printf's into the code. I just realised by the time we hit extHierConnectFunc2() we've already generated the bad label, but I still think the issue is to do with how we label the pin. The pwell pin has a label but we aren't using it.

antonblanchard commented 2 years ago

Sorry, I was right the first time. The label appears in this backtrace:

#0  extHardSetLabel (scx=scx@entry=0x7ffc26224990, reg=reg@entry=0x4fb3740, arg=arg@entry=0x7ffc26224a70) at ExtHard.c:367
#1  0x00007f843dc23f37 in extHardProc (scx=0x7ffc26224990, arg=0x7ffc26224a70) at ExtHard.c:230
#2  0x00007f843dc29520 in extSubtreeHardUseFunc (use=<optimized out>, trans=<optimized out>, x=<optimized out>, 
    y=<optimized out>, arg=<optimized out>) at ExtSubtree.c:1293
#3  0x00007f843dbe3e1e in DBArraySr (use=0x4d88a30, searchArea=<optimized out>, func=0x7f843dc29430 <extSubtreeHardUseFunc>, 
    cdarg=140720948267632) at DBcellsrch.c:1350
#4  0x00007f843dc29d0d in extSubtreeHardNode (tp=tp@entry=0x7f842bc80c70, pNum=pNum@entry=10, et=et@entry=0x5e95940, 
    ha=ha@entry=0x7ffc2629b810) at ExtSubtree.c:1187
#5  0x00007f843dc29f0d in extSubtreeTileToNode (tp=0x7f842bc80c70, pNum=10, et=0x5e95940, ha=0x7ffc2629b810, 
    doHard=<optimized out>) at ExtSubtree.c:1065
#6  0x00007f843dc25049 in extHierConnectFunc2 (cum=cum@entry=0x7f842bc7eaf8, ha=ha@entry=0x7ffc2629b810) at ExtHier.c:518
#7  0x00007f843dbfd934 in DBSrPaintArea (hintTile=hintTile@entry=0x0, plane=<optimized out>, rect=rect@entry=0x7ffc26226070, 
    mask=0x7f843c840020, func=func@entry=0x7f843dc24f10 <extHierConnectFunc2>, arg=arg@entry=140720948754448)
    at DBtiles.c:434
#8  0x00007f843dc24cf1 in extHierConnectFunc1 (oneTile=oneTile@entry=0x7f842bc80c70, ha=ha@entry=0x7ffc2629b810)
    at ExtHier.c:367
#9  0x00007f843dbfd934 in DBSrPaintArea (hintTile=hintTile@entry=0x0, plane=<optimized out>, rect=rect@entry=0x7ffc2629b900, 
    mask=0x7f843e61f1a0 <DBAllButSpaceBits>, func=func@entry=0x7f843dc24a80 <extHierConnectFunc1>, 
    arg=arg@entry=140720948754448) at DBtiles.c:434
#10 0x00007f843dc24049 in extHierConnections (ha=ha@entry=0x7ffc2629b810, cumFlat=cumFlat@entry=0x7ffc2629b828, 
    oneFlat=oneFlat@entry=0x5e95940) at ExtHier.c:281
#11 0x00007f843dc298b1 in extSubtreeFunc (scx=<optimized out>, ha=0x7ffc2629b810) at ExtSubtree.c:822
#12 0x00007f843dbe27e6 in dbCellSrFunc (use=0x4d88a30, cxp=<optimized out>) at DBcellsrch.c:1175
#13 0x00007f843dbe2b20 in DBSrCellPlaneArea (plane=<optimized out>, rect=rect@entry=0x7ffc2629b700, 
    func=func@entry=0x7f843dbe23d0 <dbCellSrFunc>, arg=arg@entry=140720948754032) at DBcellsrch.c:93
#14 0x00007f843dbe318d in DBCellSrArea (scx=scx@entry=0x7ffc2629b6f0, func=func@entry=0x7f843dc296d0 <extSubtreeFunc>, 
    cdarg=cdarg@entry=140720948754448) at DBcellsrch.c:1127
#15 0x00007f843dc2a2d0 in extSubtreeInteraction (ha=ha@entry=0x7ffc2629b810) at ExtSubtree.c:473
#16 0x00007f843dc2a7bd in extSubtree (parentUse=<optimized out>, reg=reg@entry=0x50b1530, f=f@entry=0x1dbc100)
    at ExtSubtree.c:256
#17 0x00007f843dc21f57 in extCellFile (def=def@entry=0x1dc65b0, f=f@entry=0x1dbc100, doLength=doLength@entry=1 '\001')
    at ExtCell.c:408
#18 0x00007f843dc22015 in ExtCell (def=def@entry=0x1dc65b0, outName=outName@entry=0x0, doLength=doLength@entry=1 '\001')
    at ExtCell.c:119
#19 0x00007f843dc2720b in extExtractStack (stack=0x4faf750, doExtract=doExtract@entry=1 '\001', rootDef=0x1dc65b0)
    at ExtMain.c:679
#20 0x00007f843dc27478 in ExtIncremental (rootUse=rootUse@entry=0x4de4220) at ExtMain.c:569
#21 0x00007f843dbcc96c in CmdExtract (w=<optimized out>, cmd=<optimized out>) at CmdE.c:1019
#22 0x00007f843dc5fb42 in WindExecute (w=0x18e5e10, rc=<optimized out>, cmd=0x4de33b0) at windMain.c:415
#23 0x00007f843dc056d4 in DBWcommands (w=<optimized out>, cmd=<optimized out>) at DBWprocs.c:632
#24 0x00007f843dc5da35 in WindSendCommand (w=<optimized out>, w@entry=0x0, cmd=cmd@entry=0x4de33b0, 
    quiet=quiet@entry=1 '\001') at windSend.c:305
#25 0x00007f843dc58330 in TxTclDispatch (clientData=clientData@entry=0x0, argc=argc@entry=1, argv=argv@entry=0x18299b0, 
    quiet=quiet@entry=1 '\001') at txCommands.c:1180

d-m-bailey commented 2 years ago

Just a shot in the dark here, but is there anything unique about the orientation of the offending instances?

RTimothyEdwards commented 2 years ago

@d-m-bailey : Lest you think there's anything obvious about it, I just ran extraction myself on the same layout. I got the same number of cells reporting a disconnected net (37), but they weren't all the same ones! Six cells had changed one way, and six cells had changed the other way.
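A run-to-run comparison of this kind comes down to set differences over the flagged instance names. A minimal sketch, with invented instance names standing in for the real ones:

```python
def compare_runs(run_a, run_b):
    """Split flagged instances into those common to both runs and those unique to each."""
    a, b = set(run_a), set(run_b)
    return {"both": a & b, "only_first": a - b, "only_second": b - a}


# Hypothetical instance names; each list is the flagged set from one extraction run.
first_run = ["_100_", "_101_", "_102_", "_103_"]
second_run = ["_100_", "_101_", "_204_", "_205_"]

diff = compare_runs(first_run, second_run)
print(len(diff["both"]), len(diff["only_first"]), len(diff["only_second"]))
```

Equal totals with non-empty `only_first` and `only_second` sets is the pattern reported above, and it points at something nondeterministic in the extraction rather than a property of the individual instances.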

antonblanchard commented 2 years ago

@d-m-bailey Good point wrt orientation; it made me realise all the issues are related to the pwell pin. I do notice the magic tech file does some different things just for pwell layers, and I wonder if that is sometimes confusing the label mapping code.

antonblanchard commented 2 years ago

In the GDS for the cell itself, I don't see a pwell.drawing layer (to go with the pwell.pin layer). Could that cause issues?

RTimothyEdwards commented 2 years ago

@antonblanchard : I've at least pinpointed it to the following: There is a two-pass call to extHardProc(). The first time is supposed to search for labels, and the second time is supposed to generate a label the hard way (which generates the w_69_21# string). The first pass calls ExtLabelRegions() (ExtHard.c line 226) which fails to find a label that it should be finding. I can only check this before it fails, so since it fails on about 1 in 10 of these cells, I guess I keep trying until I see it fail.
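The two-pass behavior described here can be illustrated with a small model. This is a sketch of the idea only, not magic's code: the function names are hypothetical, tiles are represented by plain strings, and the `prefix_x_y#` name format simply mirrors the generated names seen in this thread.

```python
def find_label(region, labels):
    """Pass 1: return the text of the first label sitting on a tile of the region."""
    for text, tile in labels:
        if tile in region:
            return text
    return None


def generate_name(prefix, x, y):
    """Pass 2: build a fallback name from a prefix and a coordinate."""
    return f"{prefix}_{x}_{y}#"


def name_node(region, labels, prefix, origin):
    """Prefer an existing label; fall back to a generated name only if none is found."""
    label = find_label(region, labels)
    if label is not None:
        return label
    return generate_name(prefix, *origin)


labels = [("VNB", "pwell_under_label")]

# Label tile is part of the region: pass 1 succeeds.
good = name_node({"pwell_under_label", "pwell_under_nfets"}, labels, "w", (69, 21))

# Label tile is missing from the region: pass 1 fails and a name is generated.
bad = name_node({"pwell_under_nfets"}, labels, "w", (69, 21))

print(good, bad)
```

In this model the generated name only appears when the first pass comes back empty, which is why a missed label lookup shows up as a new `w_69_21#` node.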

RTimothyEdwards commented 2 years ago

@antonblanchard : Per your last statement: The pwell layer is generated originally by magic's GDS input routine, and again by the extraction routine, so it has nothing to do with the original GDS. What I'm seeing, as I said above, is that the routine looks like it's working perfectly normally, and the label it's supposed to find is there. I just can't see what part of that code is failing to work right.

RTimothyEdwards commented 2 years ago

Ah. The pwell layer is split between the label and the nFET area next to it. The ExtFindRegions() routine only finds the part of it that is under the nFETs, and failing to cross the gap, does not attach the pwell under the label to the same region. So ExtLabelRegions() looks at the tile under the label but sees that it is not attached to a region, and ignores it. There is more to this (which explains why it does work most of the time), but at least I have the general gist of what's going on.

RTimothyEdwards commented 2 years ago

@antonblanchard , @d-m-bailey : After thinking over this for a while, I see that there is (1) a hack solution, and (2) a correct solution. The underlying problem is that the method I have for handling isolated substrates needs to be reworked because it also needs to handle the case of the regular substrate. Since it doesn't, cells that have unconnected areas of drawn pwell create issues because the usual connectivity-finding algorithm won't search across the gap between them to figure out that they are (implicitly) connected, and will then treat them as unconnected.

As it happens, for the sky130_fd_sc_hd library, it appears that only the a21bo_1 cell has all the nFETs pushed far enough to the right that there is nothing directly above the VNB substrate label, so the VNB substrate label ends up in its own little area of pwell (there are probably a few other such cells if I look hard enough, but they're clearly rare).

So solution (1) is to just modify the sky130_fd_sc_hd__a21bo_1 cell to add additional pwell to bridge the gap between the VNB label and the nFETs. That will make the top level circuit extract correctly. Solution (2) takes considerably more work and isn't going to be done in a day, although I have a basic idea how it would need to be implemented. Solution (1) is easy and I can just add a custom exception to open_pdks to deal with that one cell.
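The effect of the gap, and of bridging it, can be shown with a toy connectivity search over rectangles. This is a simplified model and not how magic stores or searches tiles; the rectangle coordinates are made up, and rectangles count as connected when they overlap or abut (including at a corner).

```python
def touches(r1, r2):
    """True if rectangles (x0, y0, x1, y1) overlap or abut."""
    return r1[0] <= r2[2] and r2[0] <= r1[2] and r1[1] <= r2[3] and r2[1] <= r1[3]


def regions(rects):
    """Group rectangle indices into connected regions by flood fill."""
    unvisited = set(range(len(rects)))
    groups = []
    while unvisited:
        stack = [unvisited.pop()]
        group = set(stack)
        while stack:
            i = stack.pop()
            for j in list(unvisited):
                if touches(rects[i], rects[j]):
                    unvisited.remove(j)
                    group.add(j)
                    stack.append(j)
        groups.append(group)
    return groups


# Hypothetical geometry: pwell under the VNB label and pwell under the nFETs.
label_pwell = (0, 0, 10, 10)
nfet_pwell = (30, 0, 100, 40)
bridge = (10, 0, 30, 10)  # added pwell shape spanning the gap

before = regions([label_pwell, nfet_pwell])
after = regions([label_pwell, nfet_pwell, bridge])
print(len(before), len(after))
```

Without the bridge the search finds two separate regions, so the label's pwell is never merged with the nFET pwell; with the bridge there is a single region, which is the effect solution (1) aims for.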

RTimothyEdwards commented 2 years ago

@antonblanchard : I just updated open_pdks to version 1.0.274. This has the custom exception I mentioned above, and the a21bo_1 cell gets modified with an added pwell shape to bridge the gap between the separate drawn pwells. That should correct this particular extraction problem while I work on a more thorough fix in magic.

RTimothyEdwards commented 2 years ago

Please leave this issue open until the underlying problem has been addressed in the magic extraction code.

antonblanchard commented 2 years ago

@RTimothyEdwards that fixes it! My macro made it all the way through LVS with that fix. Thank you!

RTimothyEdwards commented 2 years ago

@antonblanchard : Thank you for posting the example, which was very helpful, and for helping with debugging, which was also quite useful.

mkkassem commented 2 years ago

@RTimothyEdwards @antonblanchard @d-m-bailey I have seen this issue when there is a short across the hierarchy from a routing layer into a net on the same layer inside the cell. It only happens in some of the cells, not all instances of a given std cell. Can you zoom in and visually inspect one of the instances with an extra port?

RTimothyEdwards commented 2 years ago

@mkkassem : That is not the issue under discussion.