KLayout / klayout

KLayout Main Sources
http://www.klayout.org
GNU General Public License v3.0

High memory footprint during DRC #1189

Closed gatecat closed 1 year ago

gatecat commented 1 year ago

Test design (run run.sh): drc_memory_gf180.zip

Memory usage gradually increases to 60+GB over the course of the checks, despite regular use of forget.

Example log points where memory use increases significantly (I can provide a full log too if needed):

2022-11-14 13:10:07 +0000: Memory Usage (29017764K) : Executing rule CO.1
2022-11-14 13:10:51 +0000: Memory Usage (30691296K) : Executing rule CO.2a
2022-11-14 13:12:22 +0000: Memory Usage (32822268K) : Executing rule CO.2b
2022-11-14 13:12:23 +0000: Memory Usage (32822268K) : Executing rule CO.3
2022-11-14 13:13:22 +0000: Memory Usage (35574772K) : Executing rule CO.4
2022-11-14 13:14:48 +0000: Memory Usage (38784968K) : Executing rule CO.5a
2022-11-14 13:15:05 +0000: Memory Usage (38784968K) : Executing rule CO.5b
2022-11-14 13:15:06 +0000: Memory Usage (38784968K) : Executing rule CO.6
2022-11-14 13:21:55 +0000: Memory Usage (40266620K) : Executing rule CO.6a
2022-11-14 13:30:49 +0000: Memory Usage (47674980K) : Executing rule CO.6b
2022-11-14 13:31:19 +0000: Memory Usage (47937128K) : Executing rule CO.7
2022-11-14 13:31:55 +0000: Memory Usage (49046828K) : Executing rule CO.8
...
2022-11-14 13:45:28 +0000: Memory Usage (57805780K) : Executing rule O.CO.7
2022-11-14 13:46:00 +0000: Memory Usage (60417476K) : Executing rule O.PL.ORT
2022-11-14 13:46:45 +0000: Memory Usage (60417476K) : Executing rule EF.01
....
2022-11-14 13:47:10 +0000: Memory Usage (60417476K) : Executing rule EF.17
2022-11-14 13:47:10 +0000: Memory Usage (60417476K) : Executing rule EF.18
2022-11-14 14:04:08 +0000: Memory Usage (61968048K) : Executing rule EF.19
2022-11-14 14:07:22 +0000: Memory Usage (62544228K) : Executing rule EF.20
2022-11-14 14:07:30 +0000: Memory Usage (62544228K) : Executing rule EF.21
2022-11-14 14:07:31 +0000: Memory Usage (62544228K) : Executing rule EF.22a

Happy to accept that this is just the way things are or it's a deck issue, but certain checks causing a large, permanent increase in memory footprint does make me wonder if there could be a memory leak worth investigating, or other issues where large caches etc could be freed to save memory?
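The jumps in the log excerpt above can be tabulated with a short script. This is only a sketch: it assumes the line format shown in the excerpt, and it assumes the memory figure is sampled when a rule starts, so growth between two consecutive lines is attributed to the earlier rule. The helper names `parse_log` and `memory_growth` are made up for this illustration.

```ruby
# Sample lines copied from the log excerpt in this report.
LOG = <<~TEXT
  2022-11-14 13:10:07 +0000: Memory Usage (29017764K) : Executing rule CO.1
  2022-11-14 13:10:51 +0000: Memory Usage (30691296K) : Executing rule CO.2a
  2022-11-14 13:12:22 +0000: Memory Usage (32822268K) : Executing rule CO.2b
  2022-11-14 13:12:23 +0000: Memory Usage (32822268K) : Executing rule CO.3
  2022-11-14 13:13:22 +0000: Memory Usage (35574772K) : Executing rule CO.4
TEXT

# Assumed line format: "... Memory Usage (<n>K) : Executing rule <name>"
LINE_RE = /Memory Usage \((\d+)K\) : Executing rule (\S+)/

# Turn log text into a list of { rule:, kib: } hashes, skipping other lines.
def parse_log(text)
  text.each_line.map do |line|
    m = LINE_RE.match(line)
    m && { rule: m[2], kib: m[1].to_i }
  end.compact
end

# Attribute the growth between two consecutive samples to the earlier rule
# (assumption: the figure is sampled when the rule starts executing).
def memory_growth(entries)
  entries.each_cons(2).map do |prev, cur|
    delta = cur[:kib] - prev[:kib]
    delta > 0 ? { rule: prev[:rule], delta_kib: delta } : nil
  end.compact
end

memory_growth(parse_log(LOG)).each do |g|
  puts format("%-8s +%d K", g[:rule], g[:delta_kib])
end
```

Running the same functions over a full log would give a per-rule list of the checks after which the footprint stepped up, which may help narrow down where to look.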

klayoutmatthias commented 1 year ago

Hi @gatecat,

No, you don't need to accept that :)

I'll check that. I see that basically you're not using tiled or deep mode, right?

Is there a reason for not doing so? Do you see some effects that prevent you from using any other mode?

Problem is that with flat mode, huge chunks of contiguous memory may be allocated (right now, the allocation is kind of stupid). This sometimes causes memory fragmentation with the effect you see. Tiled mode for example may avoid this.

But I'll test myself.

Matthias

klayoutmatthias commented 1 year ago

Ok, first feedback: tiled mode does not make a huge difference ... debugging further.

klayoutmatthias commented 1 year ago

Found the problem, but the solution needs a little investigation.

The problem is not a leak; rather, freeing of memory for the database objects is postponed. This is a side effect of a fix introduced to avoid Ruby/C++ interactions during GUI design with Ruby Qt classes. It prevents immediate garbage collection of intermediate products such as the layers generated during operation chains like these:

lvpwell.inside(res_mk).not_inside(dnwell).overlapping(dualgate)

"forget" works, but it is impractical to use it for every intermediate result.

I will try to provide a patch.
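To illustrate why calling "forget" on every intermediate result is impractical, here is a small plain-Ruby model. It is not KLayout code: `FakeLayer` is a hypothetical stand-in that only counts objects still holding data, and it models "freeing is postponed" as "nothing is released unless forget is called". The layer names are taken from the chain above.

```ruby
# Hypothetical stand-in for a DRC layer. NOT KLayout's API: it only counts
# how many layer objects still hold their (simulated) data.
class FakeLayer
  @live = 0
  class << self
    attr_accessor :live
  end

  attr_reader :name

  def initialize(name)
    @name = name
    @held = true
    self.class.live += 1
  end

  # Each operation returns a new layer object, as chained DRC methods do.
  %i[inside not_inside overlapping].each do |op|
    define_method(op) { |other| FakeLayer.new("#{name}.#{op}(#{other.name})") }
  end

  # Explicit release, playing the role of "forget" in the deck.
  def forget
    return unless @held
    @held = false
    self.class.live -= 1
  end
end

lvpwell, res_mk, dnwell, dualgate =
  %w[lvpwell res_mk dnwell dualgate].map { |n| FakeLayer.new(n) }

# Chained form: the two intermediates are never named, so the script has
# no handle on which to call forget.
before = FakeLayer.live
chained = lvpwell.inside(res_mk).not_inside(dnwell).overlapping(dualgate)
chained_extra = FakeLayer.live - before

# Explicit form: every intermediate needs a name and its own forget call.
before = FakeLayer.live
t1 = lvpwell.inside(res_mk)
t2 = t1.not_inside(dnwell)
t1.forget
explicit = t2.overlapping(dualgate)
t2.forget
explicit_extra = FakeLayer.live - before

puts "chained:  #{chained_extra} objects still held"
puts "explicit: #{explicit_extra} objects still held"
```

In this model the explicit form keeps fewer objects alive, but only by breaking a one-line rule into five statements, which does not scale to a full deck.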

Best regards,

Matthias

klayoutmatthias commented 1 year ago

I have published a PR that brings memory down to ~4..5GB with some peaking to ~7GB: https://github.com/KLayout/klayout/pull/1193

QuantamHD commented 1 year ago

Thanks for the fix, @klayoutmatthias. Is there anything we could be doing in the GF180MCU scripts to improve performance?

https://github.com/google/globalfoundries-pdk-libs-gf180mcu_fd_pr/tree/74e4ec59b55bcf5be2f153abff8519d15ebe21fa/rules/klayout/drc

klayoutmatthias commented 1 year ago

Update: something still killed my DRC run in the MDN.3b step, but I have not debugged it yet. Still, up to this step, memory was pretty much constant at around 5GB.

klayoutmatthias commented 1 year ago

@QuantamHD Yes, the script can definitely be improved. As always, the first step is to identify the bottlenecks.

First, the master branch contains a number of performance improvements.

For example, yesterday I ran @gatecat's example in deep mode and it showed quite promising performance up to CO.6 (9 mins at 1.5GB memory with 4 cores). Looking at the CO.6 implementation, I think this can be simplified considerably, and maybe some of the functions used (e.g. "not_in") are not deep-enabled yet. A more detailed analysis will show what the problem is.

The same happened to me in tiled mode (500µm tiles, 4 cores) - with the patch the script runs smoothly, but the memory peaked in MDN.3b and my process got killed. I think it is the long-range "width" operation (20µm distance) which has to collect an unusually high number of neighbouring edges and generates a trillion error markers before feeding their edge components into "not". "width" is much more efficient on polygons in general, as width is an operation that can be computed locally on merged polygons. So I think the strategy should be to first try rewriting the rule to use polygons - at least the "width" part.

In general I need to say that the DRC deck is very readable and nicely done, so good job!

Matthias

gatecat commented 1 year ago

Thank you very much for looking into this!

atorkmabrains commented 1 year ago

@klayoutmatthias Please send me any comments on this DRC rule deck. Mabrains was responsible for developing this rule deck.

I don't recommend using the tiled version, as it gives false errors, BTW.

Our linkedin link: https://www.linkedin.com/company/mabrains/?viewAsMember=true

Website: https://mabrains.com

klayoutmatthias commented 1 year ago

I think we pretty much dealt with this problem elsewhere, so I will close this ticket for now.