Eliminate large `.expect` files

rachitnigam commented 3 months ago

The general philosophy around checking in golden files requires them to be inspected when changes occur. It is not clear to me that files larger than a reasonably small bound are ever realistically checked. This leads to a false sense of security because we technically have "tests" even though we're not really making sure they are correct.

I ran the following command to find all the .expect files that are longer than 300 files:

fd .expect | xargs -I _ wc -l _ | awk '{ if ( $1 > 300 ) print $2,$1 }' | sort -nk2

A couple of recommendations:

If a particular test does not need to be large, let's bring it under the 300 lines threshold
If a particular test does need to be longer, let's figure out why
1. If it the case that we're just tracking golden outputs (like the output of a computation) and we would never want it to change, let's add a prefix like .freeze.expect to indicate that this output will never change after being committed to the repo.
2. If the output format is particularly verbose, let's build some tooling to compress and track only the relevant part of the output. For example, if the Queues testing infrastructure can use a tool to generate a shorter representation of the output, let's build that tool.
Let's write a GH bot to enforce this limit on any new .expect files

Backend

tests/backend/verilog/memory-with-external-attribute.expect 628

Frontends

tests/frontend/ntt-pipeline/ntt-4.expect 306
tests/frontend/ntt-pipeline/ntt-4-reduced-2.expect 312
tests/frontend/relay/batch_matmul.expect 321
tests/frontend/relay/max_pool2d.expect 341
tests/frontend/relay/conv2d.expect 423
tests/frontend/relay/softmax.expect 696

YXI and Xilinx

(@nathanielnrn)

tests/xilinx/language-tutorial-iterate.expect 687
tests/xilinx/gen-verilog/language-tutorial-iterate.expect 686
tests/xilinx/gen-verilog/dot-product.expect 1426
tests/xilinx/gen-verilog/vectorized-add.expect 1426

yxi/tests/axi/read-compute-write/seq-vec-add.expect 395
yxi/tests/axi/read-compute-write/seq-mem-vec-add.expect 475
yxi/tests/axi/dynamic/dyn-vec-add.expect 772
yxi/tests/axi/dynamic/dyn-mem-vec-add.expect 848
yxi/tests/axi/read-compute-write/seq-mem-vec-add-axi-wrapped.expect 9119
yxi/tests/axi/dynamic/dyn-mem-vec-add-axi-wrapped.expect 11012

Interpreter

(@EclecticGriffin)

interp/tests/benchmarks/polybench/linear-algebra-symm.expect 318
interp/tests/benchmarks/polybench/linear-algebra-gramschmidt.expect 400
interp/tests/benchmarks/polybench/linear-algebra-2mm.expect 418
interp/tests/benchmarks/polybench/linear-algebra-syr2k.expect 422
interp/tests/benchmarks/polybench/linear-algebra-3mm.expect 576
interp/tests/benchmarks/polybench/linear-algebra-doitgen.expect 752

Queues

(@anshumanmohan)

calyx-py/test/correctness/queues/binheap/fifo.expect 60008
calyx-py/test/correctness/queues/binheap/pifo.expect 60008
calyx-py/test/correctness/queues/fifo.expect 60008
calyx-py/test/correctness/queues/pifo.expect 60008
calyx-py/test/correctness/queues/pifo_tree.expect 60008
calyx-py/test/correctness/queues/sdn.expect 60008
calyx-py/test/correctness/queues/strict_and_rr_queues/rr_queues/rr_queue_2flows.expect 60008
calyx-py/test/correctness/queues/strict_and_rr_queues/rr_queues/rr_queue_3flows.expect 60008
calyx-py/test/correctness/queues/strict_and_rr_queues/rr_queues/rr_queue_4flows.expect 60008
calyx-py/test/correctness/queues/strict_and_rr_queues/rr_queues/rr_queue_5flows.expect 60008
calyx-py/test/correctness/queues/strict_and_rr_queues/rr_queues/rr_queue_6flows.expect 60008
calyx-py/test/correctness/queues/strict_and_rr_queues/rr_queues/rr_queue_7flows.expect 60008
calyx-py/test/correctness/queues/strict_and_rr_queues/strict_queues/strict_2flows.expect 60008
calyx-py/test/correctness/queues/strict_and_rr_queues/strict_queues/strict_3flows.expect 60008
calyx-py/test/correctness/queues/strict_and_rr_queues/strict_queues/strict_4flows.expect 60008
calyx-py/test/correctness/queues/strict_and_rr_queues/strict_queues/strict_5flows.expect 60008
calyx-py/test/correctness/queues/strict_and_rr_queues/strict_queues/strict_6flows.expect 60008
calyx-py/test/correctness/queues/binheap/stable_binheap.expect 80010
calyx-py/test/correctness/queues/nwc_simple.expect 100012
calyx-py/test/correctness/queues/pcq.expect 100012
calyx-py/test/correctness/queues/pieo.expect 100012

anshumanmohan commented 3 months ago

Thanks for flagging this! The queue team is aware of our embarrassingly large files, and we hope to clean this up as part of #2191! Hopefully we'll massage this away in a few weeks :)

ethanuppal commented 3 months ago

@EclecticGriffin any update on the usability of cider2 as a component interpreter programmatically?

EclecticGriffin commented 3 months ago

@EclecticGriffin any update on the usability of cider2 as a component interpreter programmatically?

I don't see what that has to do with this issue? Unless I'm missing something?

EclecticGriffin commented 3 months ago

wrt to the .expect files all of these are from the polybench tests and so I don't see those changing, meaning we can hit them with the .freeze

ethanuppal commented 3 months ago

@EclecticGriffin any update on the usability of cider2 as a component interpreter programmatically?

I don't see what that has to do with this issue? Unless I'm missing something?

The testbench I'm writing depends on cycle-level interpretation of calyx components, and I think it eliminates the need for expect files when testing, for example, a data structure (and it's more important to test behavioral aspects of the code then external memory state after completion).

rachitnigam commented 3 months ago

wrt to the .expect files all of these are from the polybench tests and so I don't see those changing, meaning we can hit them with the .freeze

Perfect! Let's do that @EclecticGriffin!

The testbench I'm writing depends on cycle-level interpretation of calyx components, and I think it eliminates the need for expect files when testing, for example, a data structure (and it's more important to test behavioral aspects of the code then external memory state after completion).

@ethanuppal this sounds cool and potentially useful for the queues team. However, I think it is super important to get buy-in from folks and making sure it actually serves the purpose that they want it to do.

rachitnigam commented 2 months ago

@nathanielnrn do you have any thoughts on how we can remove the large AXI expect tests?

ethanuppal commented 2 months ago

Update:

Queues team said rust test interface would be useful
Still having trouble using cider2 as a library

nathanielnrn commented 2 months ago

I think the best solution for yxi/AXI tests relies on implementing https://github.com/rachitnigam/runt/issues/20. After that I think it's enough to test only for .yxi output and functional correctness with cocotb. The problem is that currently fud2 doesn't quite handle the path finding correctly to go from calyx -> axi-wrapped-calyx -> ... cocotb. Without the correct path finding/runt functionality nothing is enforcing the axi-wrapped-calyx to stay in sync with the generator without the tests that produce big .expect files.

anshumanmohan commented 2 months ago

I'd like to say a little more about the queues team wrt the rust test interface. Basically I think the rust test interface sounds neat, but I don't want to devote lots of resources to porting our testing over right now. Porting over to a rust test interface is a medium-term goal for us.

At present I have asked my team to generate the enormous .expect and .data files that we presently do, test against those, and then discard them. So it all goes down as usual, it just doesn't get checked in.

ethanuppal commented 2 months ago

@anshumanmohan sounds like a good plan!

calyxir / calyx