calyxir / calyx

Intermediate Language (IL) for Hardware Accelerator Generators
https://calyxir.org
MIT License

Verilog backend overflows on large programs #1273

Closed susan-garry closed 1 year ago

susan-garry commented 1 year ago

As the title says, fud e --to interpreter-out -s verilog.data DRB1-3123.data DRB1-3123.futil produces the expected output, while fud e --to dat --through verilog -s verilog.data DRB1-3123.data DRB1-3123.futil and fud e --to dat --through icarus-verilog -s verilog.data DRB1-3123.data DRB1-3123.futil both fail with nearly identical error messages:

[fud] ERROR: `/scratch/susan/calyx/target/debug/futil -l /scratch/susan/calyx -b verilog' failed:
---
> [fud] ERROR: `/scratch/susan/calyx/target/debug/futil -l /scratch/susan/calyx -b verilog --disable-init --disable-verify' failed:

The details of verilator's error message are in error.txt. I also uploaded the input files, in case anyone wants to reproduce the error.

error.txt DRB1-3123.data.txt DRB1-3123.futil.txt

rachitnigam commented 1 year ago

Thanks for the bug report! I think it might be the guard generator that's overflowing, so I can take a look. However, we should also chat about the generated code. Everything is sequential, so I'm not sure why so many groups and invoke statements are generated. The likely cause of the overflow is that we generate extremely large guards, which will also result in extremely bad resource usage.

rachitnigam commented 1 year ago

Also, @calebmkim, got a new stress test for sharing pass for you (using DRB1-3123.futil.txt):

% ./target/release/futil -b verilog --log info pange.futil > /dev/null 
[INFO  calyx::pass_manager] well-formed: 111ms
[INFO  calyx::pass_manager] papercut: 49ms
[INFO  calyx::pass_manager] canonicalize: 55ms
[INFO  calyx::pass_manager] compile-sync: 5ms
[INFO  calyx::pass_manager] group2seq: 28ms
[INFO  calyx::pass_manager] group2invoke: 22ms
[INFO  calyx::pass_manager] inline: 44ms
[INFO  calyx::pass_manager] comb-prop: 43ms
[INFO  calyx::pass_manager] compile-ref: 37ms
[INFO  calyx::pass_manager] infer-share: 3ms
[INFO  calyx::pass_manager] cell-share: 153387ms // <- 2.5 minutes!
[INFO  calyx::pass_manager] remove-comb-groups: 0ms
[INFO  calyx::pass_manager] infer-static-timing: 330ms
[INFO  calyx::pass_manager] compile-invoke: 94ms
[INFO  calyx::pass_manager] merge-static-par: 68ms
[INFO  calyx::pass_manager] static-par-conv: 1630ms
[INFO  calyx::pass_manager] dead-group-removal: 40ms
[INFO  calyx::pass_manager] collapse-control: 16ms
[INFO  calyx::pass_manager] tdcc: 6399ms
[INFO  calyx::pass_manager] dead-group-removal: 11ms
[INFO  calyx::pass_manager] comb-prop: 9997ms              
[INFO  calyx::pass_manager] dead-cell-removal: 9044ms
[INFO  calyx::pass_manager] go-insertion: 33ms
[INFO  calyx::pass_manager] wire-inliner: 1789ms
[INFO  calyx::pass_manager] clk-insertion: 66ms
[INFO  calyx::pass_manager] reset-insertion: 54ms
[INFO  calyx::pass_manager] merge-assigns: 602ms

rachitnigam commented 1 year ago

Okay, according to @susan-garry, y'all previously had programs with the same number of PEs and memories, and the only change is that you now generate invoke statements, which is curious. Some things to note:

Now onto sources of overflow:

rachitnigam commented 1 year ago

To get the overflow quicker, we can run the compiler with optimizations disabled using -p no-opt:

% ./target/release/futil -b verilog --log info -p no-opt pange.futil > /dev/null
[INFO  calyx::pass_manager] well-formed: 98ms
[INFO  calyx::pass_manager] papercut: 48ms
[INFO  calyx::pass_manager] canonicalize: 65ms
[INFO  calyx::pass_manager] compile-sync: 5ms
[INFO  calyx::pass_manager] compile-ref: 41ms
[INFO  calyx::pass_manager] remove-comb-groups: 0ms
[INFO  calyx::pass_manager] compile-invoke: 86ms
[INFO  calyx::pass_manager] tdcc: 6388ms
[INFO  calyx::pass_manager] go-insertion: 61ms
[INFO  calyx::pass_manager] wire-inliner: 2021ms
[INFO  calyx::pass_manager] clk-insertion: 93ms
[INFO  calyx::pass_manager] reset-insertion: 92ms
[INFO  calyx::pass_manager] merge-assigns: 979ms

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
zsh: abort      ./target/release/futil -b verilog --log info -p no-opt pange.futil > /dev/nul

Running the compiler to just print out the Calyx program after compilation doesn't overflow:

% ./target/release/futil --log info -p no-opt pange.futil > out.futil
[INFO  calyx::pass_manager] well-formed: 111ms
[INFO  calyx::pass_manager] papercut: 47ms
[INFO  calyx::pass_manager] canonicalize: 118ms
[INFO  calyx::pass_manager] compile-sync: 13ms
[INFO  calyx::pass_manager] compile-ref: 42ms
[INFO  calyx::pass_manager] remove-comb-groups: 1ms
[INFO  calyx::pass_manager] compile-invoke: 89ms
[INFO  calyx::pass_manager] tdcc: 6474ms
[INFO  calyx::pass_manager] go-insertion: 58ms
[INFO  calyx::pass_manager] wire-inliner: 2067ms
[INFO  calyx::pass_manager] clk-insertion: 83ms
[INFO  calyx::pass_manager] reset-insertion: 87ms
[INFO  calyx::pass_manager] merge-assigns: 957ms

rachitnigam commented 1 year ago

Running with lldb, I get this:

% lldb -- ./target/release/futil -b verilog --log info -p no-opt pange.futil   
...
Process 69743 stopped
* thread #1, name = 'main', queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x7ff7bf700ff8)
    frame #0: 0x0000000100295824 futil`core::ptr::drop_in_place$LT$pretty..Doc$LT$pretty..RcDoc$GT$$GT$::hc5ff1d9bc7dd2c56 + 132
futil`core::ptr::drop_in_place$LT$pretty..Doc$LT$pretty..RcDoc$GT$$GT$::hc5ff1d9bc7dd2c56:
->  0x100295824 <+132>: callq  0x1002957a0               ; <+0>
    0x100295829 <+137>: movq   0x8(%rbx), %rdi
    0x10029582d <+141>: callq  0x1002d6778               ; symbol stub for: free
    0x100295832 <+146>: movq   0x10(%rbx), %rdi
Target 0: (futil) stopped.
(lldb) 

RcDoc, from the pretty crate, is what vast, our Verilog backend, uses to print out Verilog strings. The EXC_BAD_ACCESS is concerning because it seems to imply a bad memory access, which should not happen in safe Rust. RcDoc probably uses unsafe code, which might be misbehaving? Still, the fact that we get as far as pretty printing means that it's probably something wrong in vast or RcDoc.

Also, summoning @EclecticGriffin's Rust powers in case they have better ideas of what could be going wrong

rachitnigam commented 1 year ago

Building with AddressSanitizer:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==72654==ERROR: AddressSanitizer: stack-overflow on address 0x7ff7b67509b8 (pc 0x000109f797df bp 0x7ff7b67511f0 sp 0x7ff7b67509c0 T0)
    #0 0x109f797df in __asan_memcpy+0x18f (librustc-nightly_rt.asan.dylib:x86_64+0x467df) (BuildId: 364f88c707f13921b85413add38b47d4240000001000000000070a0000010c00)
    #1 0x1097cf2b8 in arrayvec::array_string::ArrayString$LT$A$GT$::new::he839dde011b09996+0xc8 (futil:x86_64+0x10081f2b8) (BuildId: 8c58d4e6d80a312cb8d786fa62c5165932000000200000000100000000000d00)
    #2 0x1097ad951 in pretty::RcDoc$LT$A$GT$::as_string::h7398701d6df83bc0+0x121 (futil:x86_64+0x1007fd951) (BuildId: 8c58d4e6d80a312cb8d786fa62c5165932000000200000000100000000000d00)
    #3 0x1097b50b2 in vast::subset::pretty_print::_$LT$impl$u20$vast..util..pretty_print..PrettyPrint$u20$for$u20$vast..subset..ast..Expr$GT$::to_doc::h91b5fcf4d36ccf10+0x11d2 (futil:x86_64+0x1008050b2) (BuildId: 8c58d4e6d80a312cb8d786fa62c5165932000000200000000100000000000d00)
    #4 0x1097b5ac1 in vast::subset::pretty_print::_$LT$impl$u20$vast..util..pretty_print..PrettyPrint$u20$for$u20$vast..subset..ast..Expr$GT$::to_doc::h91b5fcf4d36ccf10+0x1be1 (futil:x86_64+0x100805ac1) (BuildId: 8c58d4e6d80a312cb8d786fa62c5165932000000200000000100000000000d00)
    #5 0x1097b624c in vast::subset::pretty_print::_$LT$impl$u20$vast..util..pretty_print..PrettyPrint$u20$for$u20$vast..subset..ast..Expr$GT$::to_doc::h91b5fcf4d36ccf10+0x236c (futil:x86_64+0x10080624c) (BuildId: 8c58d4e6d80a312cb8d786fa62c5165932000000200000000100000000000d00)
    #6 0x1097b624c in vast::subset::pretty_print::_$LT$impl$u20$vast..util..pretty_print..PrettyPrint$u20$for$u20$vast..subset..ast..Expr$GT$::to_doc::h91b5fcf4d36ccf10+0x236c (futil:x86_64+0x10080624c) (BuildId: 8c58d4e6d80a312cb8d786fa62c5165932000000200000000100000000000d00)
    #7 0x1097b624c in vast::subset::pretty_print::_$LT$impl$u20$vast..util..pretty_print..PrettyPrint$u20$for$u20$vast..subset..ast..Expr$GT$::to_doc::h91b5fcf4d36ccf10+0x236c (futil:x86_64+0x10080624c) (BuildId: 8c58d4e6d80a312cb8d786fa62c5165932000000200000000100000000000d00)
    #8 0x1097b624c in vast::subset::pretty_print::_$LT$impl$u20$vast..util..pretty_print..PrettyPrint$u20$for$u20$vast..subset..ast..Expr$GT$::to_doc::h91b5fcf4d36ccf10+0x236c (futil:x86_64+0x10080624c) (BuildId: 8c58d4e6d80a312cb8d786fa62c5165932000000200000000100000000000d00)
... 
  #254 0x1097b624c in vast::subset::pretty_print::_$LT$impl$u20$vast..util..pretty_print..PrettyPrint$u20$for$u20$vast..subset..ast..Expr$GT$::to_doc::h91b5fcf4d36ccf10+0x236c (futil:x86_64+0x10080624c) (BuildId: 8c58d4e6d80a312cb8d786fa62c5165932000000200000000100000000000d00)

Looks like a bug in the pretty printing implementation in vast?
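
For anyone following along, here is a minimal sketch of why a recursive to_doc-style traversal can run out of stack: the recursion depth tracks the nesting depth of the expression, not the number of terms. The Expr type and the fold_left, fold_balanced, and depth helpers below are hypothetical stand-ins, not VAST's actual AST or API, and I'm only assuming (not claiming) that the generated guards are nested like the left-leaning chain.

```rust
// Hypothetical expression type for illustration; NOT VAST's real AST.
#[allow(dead_code)] // the port name is never read in this sketch
enum Expr {
    Port(String),
    Or(Box<Expr>, Box<Expr>),
}

// Left-nested chain: ((a | b) | c) | d ... nesting grows linearly.
fn fold_left(terms: Vec<Expr>) -> Expr {
    let mut iter = terms.into_iter();
    let first = iter.next().expect("at least one term");
    iter.fold(first, |acc, t| Expr::Or(Box::new(acc), Box::new(t)))
}

// Balanced tree: (a | b) | (c | d) ... nesting grows logarithmically.
fn fold_balanced(mut terms: Vec<Expr>) -> Expr {
    assert!(!terms.is_empty(), "at least one term");
    while terms.len() > 1 {
        let mut next = Vec::with_capacity((terms.len() + 1) / 2);
        let mut iter = terms.into_iter();
        while let Some(a) = iter.next() {
            match iter.next() {
                Some(b) => next.push(Expr::Or(Box::new(a), Box::new(b))),
                None => next.push(a),
            }
        }
        terms = next;
    }
    terms.pop().unwrap()
}

// Nesting depth, computed with an explicit stack so that measuring
// the tree does not itself recurse.
fn depth(root: &Expr) -> usize {
    let mut max_depth = 0;
    let mut stack = vec![(root, 1usize)];
    while let Some((e, d)) = stack.pop() {
        max_depth = max_depth.max(d);
        if let Expr::Or(l, r) = e {
            stack.push((&**l, d + 1));
            stack.push((&**r, d + 1));
        }
    }
    max_depth
}

fn ports(n: usize) -> Vec<Expr> {
    (0..n).map(|i| Expr::Port(format!("p{}", i))).collect()
}

fn main() {
    println!("left-nested depth: {}", depth(&fold_left(ports(1024))));
    println!("balanced depth:    {}", depth(&fold_balanced(ports(1024))));
}
```

A recursive printer needs one stack frame per level of nesting, so the same 1024 terms cost very different amounts of stack depending on how they are combined.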

rachitnigam commented 1 year ago

Okay, I've reduced the test case and it seems to come from...the cells? You don't even need the control program. Just a lot of cells marked with @external.

rachitnigam commented 1 year ago

Here is the reduced file that still overflows. I don't have more cycles today, but the problem is probably fixable in verilog.rs.

min.futil.txt

sampsyo commented 1 year ago

Nice work isolating the "big cells list" test case, @rachitnigam! Given that it's a segfault, I share the intuition that it could just be a stack overflow in VAST… I'll see if I have a moment to break out lldb as well.

calebmkim commented 1 year ago

Just to add on to this: I am getting an identical error when trying to get resource estimates after fully inlining some of the larger Calyx neural networks. Based on this thread, I'm guessing that when I fully inline everything, it adds a bunch of new cells to main, causing an overflow when we try to go from compiled Calyx -> Verilog.
Should I try to fix this bug?

rachitnigam commented 1 year ago

The interesting thing is that removing the external attribute from cells makes the compilation work. I’d have to see if that’s really something or just a consequence of making the generated code smaller

rachitnigam commented 1 year ago

Okay, I've debugged this some more and the problem is pretty wild: VAST generates really large modules, which the pretty printing module turns into a string. Everything is fine until the pretty printing function goes to return the value. At that point the function runs drop on the struct that VAST allocated to represent the document, and it blows the stack during the drop process.
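
To illustrate the failure mode described above: Rust's compiler-generated drop glue walks an owned tree recursively, so dropping a sufficiently deep structure can overflow the stack even though building and reading it were fine. Below is a minimal sketch of the usual countermeasure, a hand-written Drop that flattens the tree with an explicit worklist. Node, deep_chain, and chain_len are hypothetical names for illustration; this is not the pretty crate's RcDoc, not VAST's code, and not necessarily how the issue was fixed.

```rust
// Hypothetical document tree; a stand-in for a nested pretty-printer
// document, NOT the actual pretty::RcDoc or VAST types.
struct Node {
    label: String,
    children: Vec<Node>,
}

impl Drop for Node {
    fn drop(&mut self) {
        // Move all descendants onto a heap-allocated worklist so that
        // each node is dropped with an empty `children` vector. The
        // nested drop call for each node then returns immediately.
        let mut pending: Vec<Node> = std::mem::take(&mut self.children);
        while let Some(mut node) = pending.pop() {
            pending.append(&mut node.children);
            // `node` is dropped here, childless, at constant stack depth.
        }
    }
}

// Build a chain `depth` levels deep without recursion.
fn deep_chain(depth: usize) -> Node {
    let mut node = Node { label: String::from("leaf"), children: Vec::new() };
    for i in 0..depth {
        node = Node { label: format!("n{}", i), children: vec![node] };
    }
    node
}

// Count nodes iteratively, again to avoid recursion.
fn chain_len(root: &Node) -> usize {
    let mut count = 0;
    let mut stack = vec![root];
    while let Some(n) = stack.pop() {
        count += 1;
        stack.extend(n.children.iter());
    }
    count
}

fn main() {
    let root = deep_chain(200_000);
    println!("root label: {}, nodes: {}", root.label, chain_len(&root));
    drop(root); // runs the iterative Drop above
    println!("dropped without recursing");
}
```

The trade-off is a temporary heap allocation for the worklist in exchange for stack usage that no longer depends on how deep the document is.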

sampsyo commented 1 year ago

That is absolutely bonkers 👏

susan-garry commented 1 year ago

I tried running fud e --to dat --through verilog -s verilog.data DRB1.data DRB1.futil (after updating calyx), but I still get a similar overflow error. The solution seems to be to rewrite the program to be smaller, but I want to check that I'm not missing something since @rachitnigam mentioned being able to compile this program to verilog in #1280.

rachitnigam commented 1 year ago

@susan-garry are you building and running the compiler in release mode? You need to build the compiler in release mode:

cargo build --release

And then change fud to use the release binary:

fud config stages.futil.exec "<calyx repo>/target/release/futil"

It should not get an overflow error anymore. If it does, please open a new issue.
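
A side note for readers who hit similar overflows elsewhere: optimized builds generally use smaller stack frames than debug builds, which is presumably why the release binary gets through. Another generic workaround is to run deeply recursive work on a thread with an explicitly larger stack. The sketch below shows that pattern with the standard library only; recurse and run_with_stack are hypothetical helpers, and nothing in this thread says the Calyx compiler does this.

```rust
use std::thread;

// Hypothetical stand-in for a deeply recursive compiler step.
fn recurse(depth: u64) -> u64 {
    if depth == 0 {
        0
    } else {
        1 + recurse(depth - 1)
    }
}

// Run `work` on a fresh thread whose stack is `stack_bytes` large and
// return its result. Panics if the thread cannot be spawned or if the
// work itself panics.
fn run_with_stack<T, F>(stack_bytes: usize, work: F) -> T
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(work)
        .expect("failed to spawn worker thread")
        .join()
        .expect("worker thread panicked")
}

fn main() {
    // 256 MiB of stack for the worker thread.
    let levels = run_with_stack(256 * 1024 * 1024, || recurse(200_000));
    println!("recursed {} levels", levels);
}
```

This only raises the ceiling; a large enough input will still overflow, so making the recursive code iterative is the more durable fix.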