Open jeffng-or opened 1 week ago
@jeffng-or Apparently it's not mpl2 itself that is blowing up. During clustering, we call par (TritonPart) to partition big flat clusters i.e., big clusters made of only leaf macros/std cells. Based on your log, the segfault is happening inside par.
Sure, makes sense. The key point is that breakLargeFlatCluster recurses down 11100 frames (I think I cut the stack trace file off one level too soon, so my bad on that). The fact that we recurse effectively infinitely will eventually cause a failure somewhere, and it happens to be in par.
@AcKoucher the end of the stack is in par but most of the stack is in mpl2. I think the problem is the recursion in breakLargeFlatCluster. How many parts are we trying to break this cluster down into? I suspect something is off in the cluster size.
@maliberty I see. I'll investigate.
If that much splitting is necessary then you can write it non-recursively.
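A minimal sketch of what a non-recursive version could look like, using an explicit work stack so the call depth stays constant no matter how many splits are needed. This is not the mpl2 code: `FlatCluster`, `Bipartition`, and `breakIteratively` are hypothetical names, and a cluster is reduced to its leaf count for illustration. The sketch also stops splitting a cluster when one side comes back empty, since otherwise the loop would spin forever on the kind of partition shown in the log below.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a flat cluster: only the leaf count matters here.
struct FlatCluster {
  std::size_t num_leaves;
};

// Hypothetical bipartitioner: given a leaf count, returns the sizes of the
// two resulting parts. A real partitioner would work on the netlist.
using Bipartition
    = std::function<std::pair<std::size_t, std::size_t>(std::size_t)>;

// Break `root` into pieces of at most `max_leaves` leaves using an explicit
// stack instead of recursion.
std::vector<FlatCluster> breakIteratively(FlatCluster root,
                                          std::size_t max_leaves,
                                          const Bipartition& split)
{
  std::vector<FlatCluster> done;
  std::vector<FlatCluster> work{root};

  while (!work.empty()) {
    const FlatCluster cluster = work.back();
    work.pop_back();

    if (cluster.num_leaves <= max_leaves) {
      done.push_back(cluster);  // small enough, keep as is
      continue;
    }

    const auto halves = split(cluster.num_leaves);
    if (halves.first == 0 || halves.second == 0) {
      // The partitioner made no progress: keep the cluster unsplit rather
      // than retrying the same input forever.
      done.push_back(cluster);
      continue;
    }

    work.push_back(FlatCluster{halves.first});
    work.push_back(FlatCluster{halves.second});
  }
  return done;
}
```

With a partitioner that halves its input, a 100-leaf cluster and a limit of 30 ends up as four 25-leaf pieces. With a partitioner that always returns one empty side, the cluster is returned unsplit instead of overflowing the stack.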
Apparently TritonPart is doing a terrible job when trying to partition (ftq)_glue_logic:
[DEBUG MPL-multilevel_autoclustering] Breaking flat cluster (ftq)_glue_logic with TritonPart
[DEBUG MPL-multilevel_autoclustering] Setting Cluster Metrics for (ftq)_glue_logic_0: Num Macros: 1 Num Std Cells: 31578
[DEBUG MPL-multilevel_autoclustering] Setting Cluster Metrics for (ftq)_glue_logic_1: Num Macros: 0 Num Std Cells: 0
[DEBUG MPL-multilevel_autoclustering] Breaking flat cluster (ftq)_glue_logic_0 with TritonPart
[DEBUG MPL-multilevel_autoclustering] Setting Cluster Metrics for (ftq)_glue_logic_0_0: Num Macros: 1 Num Std Cells: 31578
[DEBUG MPL-multilevel_autoclustering] Setting Cluster Metrics for (ftq)_glue_logic_0_1: Num Macros: 0 Num Std Cells: 0
[DEBUG MPL-multilevel_autoclustering] Breaking flat cluster (ftq)_glue_logic_0_0 with TritonPart
[DEBUG MPL-multilevel_autoclustering] Setting Cluster Metrics for (ftq)_glue_logic_0_0_0: Num Macros: 1 Num Std Cells: 31578
[DEBUG MPL-multilevel_autoclustering] Setting Cluster Metrics for (ftq)_glue_logic_0_0_1: Num Macros: 0 Num Std Cells: 0
[DEBUG MPL-multilevel_autoclustering] Breaking flat cluster (ftq)_glue_logic_0_0_0 with TritonPart
[DEBUG MPL-multilevel_autoclustering] Setting Cluster Metrics for (ftq)_glue_logic_0_0_0_0: Num Macros: 1 Num Std Cells: 31578
[DEBUG MPL-multilevel_autoclustering] Setting Cluster Metrics for (ftq)_glue_logic_0_0_0_1: Num Macros: 0 Num Std Cells: 0
[DEBUG MPL-multilevel_autoclustering] Breaking flat cluster (ftq)_glue_logic_0_0_0_0 with TritonPart
[DEBUG MPL-multilevel_autoclustering] Setting Cluster Metrics for (ftq)_glue_logic_0_0_0_0_0: Num Macros: 1 Num Std Cells: 31198
[DEBUG MPL-multilevel_autoclustering] Setting Cluster Metrics for (ftq)_glue_logic_0_0_0_0_1: Num Macros: 0 Num Std Cells: 380
[DEBUG MPL-multilevel_autoclustering] Breaking flat cluster (ftq)_glue_logic_0_0_0_0_0 with TritonPart
[DEBUG MPL-multilevel_autoclustering] Setting Cluster Metrics for (ftq)_glue_logic_0_0_0_0_0_0: Num Macros: 1 Num Std Cells: 31198
[DEBUG MPL-multilevel_autoclustering] Setting Cluster Metrics for (ftq)_glue_logic_0_0_0_0_0_1: Num Macros: 0 Num Std Cells: 0
[DEBUG MPL-multilevel_autoclustering] Breaking flat cluster (ftq)_glue_logic_0_0_0_0_0_0 with TritonPart
[DEBUG MPL-multilevel_autoclustering] Setting Cluster Metrics for (ftq)_glue_logic_0_0_0_0_0_0_0: Num Macros: 1 Num Std Cells: 31198
[DEBUG MPL-multilevel_autoclustering] Setting Cluster Metrics for (ftq)_glue_logic_0_0_0_0_0_0_1: Num Macros: 0 Num Std Cells: 0
[DEBUG MPL-multilevel_autoclustering] Breaking flat cluster (ftq)_glue_logic_0_0_0_0_0_0_0 with TritonPart
[DEBUG MPL-multilevel_autoclustering] Setting Cluster Metrics for (ftq)_glue_logic_0_0_0_0_0_0_0_0: Num Macros: 0 Num Std Cells: 0
[DEBUG MPL-multilevel_autoclustering] Setting Cluster Metrics for (ftq)_glue_logic_0_0_0_0_0_0_0_1: Num Macros: 1 Num Std Cells: 31198
[DEBUG MPL-multilevel_autoclustering] Breaking flat cluster (ftq)_glue_logic_0_0_0_0_0_0_0_1 with TritonPart
[DEBUG MPL-multilevel_autoclustering] Setting Cluster Metrics for (ftq)_glue_logic_0_0_0_0_0_0_0_1_0: Num Macros: 1 Num Std Cells: 31198
[DEBUG MPL-multilevel_autoclustering] Setting Cluster Metrics for (ftq)_glue_logic_0_0_0_0_0_0_0_1_1: Num Macros: 0 Num Std Cells: 0
[sinks into oblivion ...]
@maliberty I'm not sure how to proceed here. Should mpl2 reject the result and take care of splitting the cluster if the partitions generated by TritonPart are not good?
I think TP should be fixed.
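For reference, the "reject the result" option asked about above could look roughly like the sketch below. It is independent of any fix in TritonPart and is not taken from mpl2: `isDegenerate`, `splitInHalf`, and `acceptOrFallback` are hypothetical helpers, and the partition is assumed to be a vector holding a part id (0 or 1) per leaf. Splitting by index order is only a placeholder fallback and ignores connectivity.

```cpp
#include <cstddef>
#include <vector>

// True when a 2-way partition left one side empty, as in the log above
// (one child with every cell, the other with 0 macros and 0 std cells).
bool isDegenerate(const std::vector<int>& part_ids)
{
  std::size_t ones = 0;
  for (int id : part_ids) {
    if (id == 1) {
      ++ones;
    }
  }
  return ones == 0 || ones == part_ids.size();
}

// Placeholder fallback: first half of the leaves to part 0, the rest to
// part 1. It guarantees progress but does not look at the netlist.
std::vector<int> splitInHalf(std::size_t num_leaves)
{
  std::vector<int> part_ids(num_leaves, 0);
  for (std::size_t i = num_leaves / 2; i < num_leaves; ++i) {
    part_ids[i] = 1;
  }
  return part_ids;
}

// Keep the partitioner's answer unless it is degenerate.
std::vector<int> acceptOrFallback(std::vector<int> part_ids)
{
  if (part_ids.size() >= 2 && isDegenerate(part_ids)) {
    return splitInHalf(part_ids.size());
  }
  return part_ids;
}
```

A check like this only guarantees that each split makes progress; the quality of the resulting clusters still depends on the partitioner.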
Describe the bug
The macro placer runs for 14h before seg faulting on BoomFrontend, which is a sub-module of BoomTile. Note that the segfault isn't seen in the full BoomTile run, which runs for about 1h.
I've re-run the job in GDB and mpl2 is infinitely recursing itself into oblivion. Here's a snippet of the stack trace:
The full-ish stack trace can be found at: https://drive.google.com/file/d/10MMydy8f761RPeXXE5FKgIVFDAtlWwCn/view?usp=sharing
The tarball can be found at: https://drive.google.com/file/d/1PH8jZAREhRn4NIVryR7pes3sKNGSIBqs/view?usp=sharing
Expected Behavior
A successful mpl2 run, without a seg fault, that finishes in less than 1h.
Environment
To Reproduce