This PR removes the assumption that the leftmost lane of a low-precision soft vector compute on the wide DSP datapath would never need a high lane extension. The design now also generates a high lane extension for the leftmost lane in the fabric as needed.
Additionally, the cross-SIMD reduction in an adder tree is now pipelined to ease timing closure in high-fan-in designs.
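
For reviewers who want a feel for the latency impact of pipelining the reduction, here is a minimal cycle-level sketch in Python. It is a behavioral model and not the RTL in this PR. The names `PipelinedAdderTree` and `run` are hypothetical, and placing one register bank after every tree level is an assumption, since the description above does not say where the pipeline registers sit. Python integers do not overflow, so bit growth across levels is not modeled.

```python
from typing import List, Optional


class PipelinedAdderTree:
    """Behavioral model of a binary adder tree with a register after each level.

    Assumption: one pipeline stage per tree level (illustrative only).
    """

    def __init__(self, num_inputs: int) -> None:
        if num_inputs < 1:
            raise ValueError("num_inputs must be at least 1")
        self.num_inputs = num_inputs
        # Count levels: each level halves the operand count, rounding up.
        levels = 0
        n = num_inputs
        while n > 1:
            n = (n + 1) // 2
            levels += 1
        # One register bank per level; None marks a bubble (no valid data).
        self.stages: List[Optional[List[int]]] = [None] * levels

    @property
    def latency(self) -> int:
        """Number of clock edges between presenting inputs and seeing the sum."""
        return len(self.stages)

    @staticmethod
    def _add_pairs(values: List[int]) -> List[int]:
        # Add adjacent pairs; an unpaired last operand passes through unchanged.
        sums = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            sums.append(values[-1])
        return sums

    def step(self, lanes: Optional[List[int]]) -> Optional[int]:
        """Apply one clock edge and return the final register's value, if valid."""
        if lanes is not None and len(lanes) != self.num_inputs:
            raise ValueError("expected %d lanes" % self.num_inputs)
        if not self.stages:
            # A single input needs no adders and no registers.
            return None if lanes is None else lanes[0]
        # Every level reads the previous level's old register contents.
        sources = [lanes] + self.stages[:-1]
        self.stages = [None if s is None else self._add_pairs(s) for s in sources]
        last = self.stages[-1]
        return None if last is None else last[0]


def run(tree: PipelinedAdderTree, vectors: List[List[int]]) -> List[int]:
    """Feed one vector per cycle, then flush with bubbles; collect valid sums."""
    outputs = []
    for lanes in list(vectors) + [None] * tree.latency:
        out = tree.step(lanes)
        if out is not None:
            outputs.append(out)
    return outputs
```

In this model an 8-input tree has three levels, so a sum appears three clock edges after its inputs are presented, while a new vector can still be accepted every cycle. Each register-to-register path contains a single adder, which is the property that makes timing closure easier as fan-in grows.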