RTimothyEdwards / magic

Magic VLSI Layout Tool
Other
468 stars 98 forks source link

buffer-overflow in magic when creating GDS containing GF180MCU SRAM macro .mag #253

Closed chaufe closed 1 year ago

chaufe commented 1 year ago

Please find attached a test case ipython notebook that can be run in Google Colab. magic___buffer_overflow_when_merging_in_GF180MCU_SRAM_via_mag_file.ipynb.gz

In this test case, a simplistic design is implemented containing a pad ring and an SRAM macro, all based on GF180MCU.

To merge the GDS of the testcase, I have attached two version, one creating a buffer overflow inside magic, and one which doesn't.

To create the buffer overflow, no GDS file is read-in but addpath is being used: lef read $::env(CONDA_PREFIX)/share/pdk/gf180mcuC/libs.ref/gf180mcu_fd_sc_mcu7t5v0/techlef/gf180mcu_fd_sc_mcu7t5v0__nom.tlef addpath $::env(CONDA_PREFIX)/share/pdk/gf180mcuC/libs.ref/gf180mcu_fd_sc_mcu7t5v0/mag/ addpath $::env(SRAM_LIB_PATH)/mag/ def read $::env(IN_DEF) gds write $::env(IN_DEF).buffer_overflow.gds

Upon GDS write, magic exists with a broken output GDS file: [...] Generating output for cell gf180mcu_fd_sc_mcu7t5v0__dffq_2 Generating output for cell testcase *** buffer overflow detected ***: terminated

To work-around the buffer overflow, I have changed the commands to use the SRAM GDS file rather than the MAG file: #addpath $::env(SRAM_LIB_PATH)/mag/ gds read $::env(SRAM_LIB_PATH)/gds/gf180mcu_fd_ip_sram__sram512x8m8wm1.gds

This version does not trigger a buffer overflow.

/cc @proppy /cc @dhaentz1

RTimothyEdwards commented 1 year ago

@chaufe : How do I run this example?

chaufe commented 1 year ago

@RTimothyEdwards You could either upload the unzipped test case into https://colab.research.google.com, or directly follow this link to load the testcase from my github repo into Google Colab. https://colab.research.google.com/github/chaufe/testcases/blob/main/colab/magic___buffer_overflow_when_merging_in_GF180MCU_SRAM_via_mag_file.ipynb Use Ctrl-F9 or the menu bar to run the all cells of the test case. The third-last cell named Merge GDS (trigger buffer overflow) will contain the buffer overflow. At the left of the colab window, there are options to download files.

RTimothyEdwards commented 1 year ago

@chaufe : Try using the command locking off at the top of your script. The GDS of the GF I/O cells has so many nested subcells that it exceeds the typical Linux ulimit on open file descriptors. Once that happens, random bad things occur.

I added this line in the colab and it seems to have worked.

RTimothyEdwards commented 1 year ago

FYI, the purpose of file locking is for the management of layout on a filesystem where a layout could potentially be edited by multiple people at the same time. Since that doesn't apply to colabs, any colab script running magic may disable file locking. The problem doesn't show up often, though, and I've only seen it occur when using the GF I/O cells (the ulimit is typically set to 1024, so you have to have a lot of cells in the layout---that's unique cells, not instances). I'm not sure why changing the source of the SRAM layout makes the problem go away, but I assume that reading the SRAM GDS is just holding the one GDS file open, while reading the .mag files of the SRAM is holding a lot of both .mag and GDS files open, which on top of the I/O cells being held open, is exceeding the ulimit.

chaufe commented 1 year ago

@RTimothyEdwards: Thanks for your reply. I don't think it is the file locking limit. I've added locking off to the beginning of file def2gds.buffer_overflow.mag. When running magic using this script, I still see *** buffer overflow detected ***: terminated during gds write.

I've also inlcuded ulimit -a, which reports file locks (-x) unlimited.

The failing part of the test case does not read-in any GDS directly, just .mag files via the addpath command. I've updated the test case to more recent builds of magic/gf180mcuC/openroad in-place: https://colab.research.google.com/github/chaufe/testcases/blob/main/colab/magic___buffer_overflow_when_merging_in_GF180MCU_SRAM_via_mag_file.ipynb

RTimothyEdwards commented 1 year ago

@chaufe : How can I download the DEF file generated by the colab?

Poking around on the web, it seems that the "buffer overflow detected" error does relate to open file descriptors. I know that "ulimit -a" on all the systems I've ever used gives a useless answer. I also get "unlimited" but I know there's a hard limit of 1024 set in the kernel and I've never found that any of the standard cited methods to remove that limit actually work.

Possibly what's going on has nothing to do with file locking, but that the GDS writing routine is opening files in a nested routine that is ending up with too many open files at the same time. If so, then I should be able to reproduce that on my local machine reading the DEF file, since I know my home desktop has the ulimit issue. That would give me more flexibility for debugging it and figuring out if I can refactor the code so that it doesn't keep so many file descriptors open. Assuming that's the problem.

chaufe commented 1 year ago

At the left of colab, there is a little "folder" symbol. Clicking on it will turn it orange (see picture) and reveal the file system of the colab env, enabling file download using the right mouse button. colab_files_button The DEF is located here: runs/RUN*/results/placement/testcase.def Here is copy of testcase.def.gz from my run in colab.

RTimothyEdwards commented 1 year ago

@chaufe : Solved. There was one place in the code that the flag set by "locking disable" was not checked, so files were not closed, causing the open file descriptor overflow. I fixed the problem in magic version 8.3.415 (on opencircuitdesign.com; github will get updated in 12 hours).

Thank you for the test case!

BTW, I recommend that you run the command cif *hier write disable before running gds write. In the case of a top-level assembly from standard cells and standard I/O, there should be no unexpected hierarchical interactions between cells (because all cells have been purposefully designed to abut cleanly). Disabling magic's automatic check for hierarchical interactions will greatly speed up the writing of the GDS file.

chaufe commented 1 year ago

Thanks for the fix. The buffer overflow is also gone at may side, so I close the issue.

Regarding the cif *hier write disable, I prefer not having this flag set just to be on the safe side. (Assume a faulty stdcell design, or a stdcell library with cell spacing constraints that have not been satisfied by the placer.)

RTimothyEdwards commented 1 year ago

That's certainly the safe assumption. For large digital designs, though, it can add a hefty overhead to the GDS output processing.