RTimothyEdwards / magic

Magic VLSI Layout Tool

FATAL ERROR MPW7: Can not read gds #185

Closed d-m-bailey closed 1 year ago

d-m-bailey commented 1 year ago

PG ID 1320: When reading the gds file, magic terminates with no error message.

user@ciic-cvc:~/mpw-7/projects/1320-sky130mpw5-sramtest/extra_be_checks/1320-test$ ./run_ext 

Magic 8.3 revision 322 - Compiled on Thu Sep  8 22:10:53 PDT 2022.
Starting magic under Tcl interpreter
Using the terminal as the console.
Using NULL graphics device.
Processing system .magicrc file
Sourcing design magicrc.sky130B for technology sky130B ...
2 Magic internal units = 1 Lambda
Input style sky130(vendor): scaleFactor=2, multiplier=2
The following types are not handled by extraction and will be treated as non-electrical types:
    ubm 
Scaled tech values by 2 / 1 to match internal grid scaling
Loading "gds.spice.tcl" from command line.
Warning: Calma reading is not undoable!  I hope that's OK.
Library written using GDS-II Release 6.0
Library name: LIB
Reading "SP6TCell".
Reading "DP8TCell".
...
Reading "user_analog_project_wrapper".

To duplicate, download attached tarball and expand.

cd 1320-test
./run_ext

1320-test.tar.gz

RTimothyEdwards commented 1 year ago

@d-m-bailey : That is a really bizarre error. It's like it hit some kind of internal nesting limit. I can view everything in the layout from the bottom up (say, starting at the base cell DP8TCell) until I hit the cell ConnectedSRAM. Expanding that cell causes magic to segfault, with the backtrace showing DBExpandAll() calling dbExpandFunc() recursively down the (ridiculously deep) hierarchy of the design. At the very lowest level (DP8TCell), the call to DBSrCellPlaneArea() ends up with corrupted pointers. It looks like stack smashing, except that, for all that the design has an insanely deep hierarchy, the depth is nowhere near the usual stack smashing limit; the subroutine call stack is only 89 levels deep. But the call stack got corrupted somehow.

RTimothyEdwards commented 1 year ago

@d-m-bailey : After a bit of analysis, I found that the routine DBSrCellPlaneArea() declares a local variable of type BPEnum. This variable is a structure that contains an array of size 10000, so every time the routine is called, it eats up half a megabyte of the stack. Unfortunately, DBSrCellPlaneArea() is typically called recursively, not just for expand (which causes the crash above), but for many functions that search the database for all cell uses in a given area.

The only real solution to this that I can think of would be to rewrite the code to avoid recursion. That could be a pretty sweeping change, given how many routines make recursive calls to DBSrCellPlaneArea().

RTimothyEdwards commented 1 year ago

Note that my Linux system is typical in setting a stack limit (ulimit -s) of 8192kB (i.e., 8MB). If any cell search takes 1/2MB per level of depth, then the layout can't be deeper than about 16 levels of hierarchy. This design exceeds that depth.
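The arithmetic behind that limit can be sketched in a few lines of C. This is only an illustration, not code from magic: the helper names `max_recursion_depth()` and `current_stack_limit()` are made up for this example, and the estimate ignores whatever else each stack frame holds.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper (not part of magic): how many recursion levels
 * fit in a stack of stack_bytes if each level uses frame_bytes? */
size_t max_recursion_depth(size_t stack_bytes, size_t frame_bytes)
{
    if (frame_bytes == 0)
        return 0;               /* avoid dividing by zero */
    return stack_bytes / frame_bytes;
}

/* Hypothetical helper: the soft stack limit of the current process in
 * bytes (what "ulimit -s" reports in kB), or 0 if the limit is
 * unlimited or cannot be read. */
size_t current_stack_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return 0;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return (size_t)rl.rlim_cur;
}
```

With the figures quoted above, `max_recursion_depth(8192UL * 1024, 512UL * 1024)` gives 16, while the backtrace for this design was 89 levels deep.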

RTimothyEdwards commented 1 year ago

I was hoping that this was just something stupid that I did when porting the BPlane routines from MicroMagic. But the MicroMagic code does exactly the same thing, and will hit the same recursion limit.

RTimothyEdwards commented 1 year ago

@d-m-bailey : Okay, duh, should have occurred to me earlier that the obvious solution is to just malloc() the BPEnum structure and free it within the subroutine where it's used, so that it doesn't end up in stack memory. Implemented in 8.3.327 and no longer crashes on this design.
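For readers who want to see the shape of that change, here is a minimal sketch of moving a large per-call structure from the stack to the heap. It is not the actual patch: `BigEnum` is a stand-in whose layout and size are invented for this example (the real BPEnum definition is not shown in this thread), and `search_heap()` is a hypothetical recursive routine.

```c
#include <stdlib.h>

#define ENUM_STACK_SIZE 10000   /* array length mentioned in the thread */

/* Stand-in for BPEnum; the field layout here is an assumption. */
typedef struct {
    void *stack[ENUM_STACK_SIZE];
    int   top;
} BigEnum;

/* Hypothetical recursive search. Each level allocates its BigEnum on
 * the heap instead of declaring it as a local variable, so a stack
 * frame holds only a pointer and a couple of ints.
 * Returns the number of levels visited, or -1 if malloc() failed. */
int search_heap(int depth)
{
    BigEnum *e;
    int below;

    if (depth <= 0)
        return 0;

    e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->top = 0;                     /* initialize before use */

    below = search_heap(depth - 1); /* recurse while e is still live */

    free(e);                        /* release before returning */
    return (below < 0) ? -1 : below + 1;
}
```

Calling `search_heap(89)` recurses to the depth seen in the backtrace above while keeping each stack frame small; the cost is one malloc()/free() pair per call.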

RTimothyEdwards commented 1 year ago

This should be fixed now, so I'm going to close this issue.