Unexpectedly Poor Performance on Large FPGA Configuration

I'm currently seeing slightly worse performance from a 16 PE (16 EPB) build of DANA than from a 3 PE (4 EPB) build. This holds across different networks, including networks that should have plenty of work (e.g., xor-sigmoid-128o). I've only tested this on learning test cases. Performance should scale roughly linearly with the number of PEs, consistent with the old results that showed this. This makes no sense: it is either a bug in how cycles and performance are being computed or an unintended artifact of the X-FILES/DANA split.

This arguably indicates that we need to be doing performance monitoring as part of regression testing.
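As a rough sketch of what such a performance check could look like, the snippet below compares cycle counts for the same test case on two builds and reports how close the observed speedup comes to linear scaling in the number of PEs. For 16 vs. 3 PEs, linear scaling would predict about a 5.3× speedup (16 / 3), so a build that is slightly slower instead would show an efficiency well below 1.0. Everything here is hypothetical: the `BuildResult` type, the function names, and the 0.5 threshold are illustrative assumptions, not part of the DANA or X-FILES code, and the sketch assumes cycle counts are already being collected per configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildResult:
    """Hypothetical record of one measurement (not a DANA data structure)."""
    name: str     # label for the hardware configuration, e.g. "16 PE / 16 EPB"
    num_pes: int  # number of processing elements in the build
    cycles: int   # measured cycles for one test case on this build


def scaling_efficiency(base: BuildResult, big: BuildResult) -> float:
    """Return observed speedup divided by ideal (linear-in-PEs) speedup.

    1.0 means perfectly linear scaling; values below 1.0 mean the larger
    build gains less than linear scaling would predict.
    """
    observed = base.cycles / big.cycles  # > 1.0 if the larger build is faster
    ideal = big.num_pes / base.num_pes   # linear-scaling assumption
    return observed / ideal


def check_scaling(base: BuildResult, big: BuildResult,
                  min_efficiency: float = 0.5) -> bool:
    """Flag a regression when scaling efficiency falls below a threshold.

    The 0.5 default is an arbitrary placeholder, not a measured target.
    """
    eff = scaling_efficiency(base, big)
    ok = eff >= min_efficiency
    status = "OK" if ok else "REGRESSION"
    print(f"{status}: {big.name} vs {base.name}, efficiency {eff:.2f}")
    return ok
```

In a regression suite, `check_scaling` would be called once per network with the cycle counts from the small and large builds, so a scaling problem like this one would surface as a failed check rather than being noticed by hand.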

bu-icsg / dana #27