Hi,
This PR primarily adds support and documentation for three QONNX ops:
BipolarQuant
Quant
Trunc
Furthermore, this PR adds basic ONNX transformations that are required for QONNX ingestion in FINN:
ExtractBiasFromConv
Extracts the (optional) bias from a Conv node and inserts it after the Conv node as a separate Add node.
GemmToMatMul
Converts Gemm nodes into a MatMul and an Add node.
remove_node_and_rewire
Now a public function, with added support for nodes that have multiple successors.
RemoveIdentityOps
Moved over from FINN.
RemoveEmptyPadding
Removes Pad nodes that don't actually pad anything (all pad amounts are zero). Such nodes are inserted by the BrevitasONNXManager when exporting global average pooling layers.
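To illustrate why ExtractBiasFromConv preserves the network's output, here is a minimal NumPy sketch (not the actual transformation code). It compares a Conv that uses its bias input with a bias-free Conv followed by an Add. The naive `conv2d` helper is hypothetical and only supports stride 1, no padding, no dilation and no groups. It also assumes the Add's constant input is the bias reshaped to `(1, C_out, 1, 1)`.

```python
import numpy as np


def conv2d(x, w, bias=None):
    """Naive NCHW convolution: stride 1, no padding, no dilation, no groups."""
    n, _, h, wd = x.shape
    c_out, _, kh, kw = w.shape
    out = np.zeros((n, c_out, h - kh + 1, wd - kw + 1))
    for i in range(out.shape[2]):
        for j in range(out.shape[3]):
            patch = x[:, :, i:i + kh, j:j + kw]  # (N, C_in, kH, kW)
            out[:, :, i, j] = np.einsum("nchw,ochw->no", patch, w)
            if bias is not None:
                out[:, :, i, j] += bias  # one bias value per output channel
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((1, 3, 5, 5))
w = rng.standard_normal((4, 3, 3, 3))
bias = rng.standard_normal(4)

# Before: a single Conv node that uses its optional bias input
original = conv2d(x, w, bias)

# After: Conv without bias, followed by an Add node. The bias is reshaped
# from (C_out,) to (1, C_out, 1, 1) so it broadcasts over N, H and W.
bias_for_add = bias.reshape(1, -1, 1, 1)
rewritten = conv2d(x, w) + bias_for_add

print(np.allclose(original, rewritten))
```

The reshape matters. A plain `(C_out,)` vector would be aligned with the last (width) axis rather than the channel axis. In this example the output is `(1, 4, 3, 3)`, so adding a length-4 vector would not broadcast at all.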
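The GemmToMatMul rewrite can be checked numerically against the ONNX Gemm definition, `Y = alpha * A' @ B' + beta * C`. This sketch assumes that B and C are constant initializers, so `transB`, `alpha` and `beta` can be folded into them ahead of time. It also assumes that `transA` would need an explicit Transpose in front of the MatMul. I have not confirmed that the actual pass handles these attributes this way. The function names are hypothetical.

```python
import numpy as np


def gemm(a, b, c, alpha=1.0, beta=1.0, trans_a=0, trans_b=0):
    """ONNX Gemm semantics: Y = alpha * A' @ B' + beta * C."""
    a = a.T if trans_a else a
    b = b.T if trans_b else b
    return alpha * (a @ b) + beta * c


def gemm_as_matmul_add(a, b, c, alpha=1.0, beta=1.0, trans_a=0, trans_b=0):
    """The same computation as a MatMul node followed by an Add node.

    Assumption: B and C are constants, so transB, alpha and beta are folded
    into them; transA stands in for an explicit Transpose node.
    """
    b_folded = alpha * (b.T if trans_b else b)
    c_folded = beta * c
    a_in = a.T if trans_a else a
    y = np.matmul(a_in, b_folded)  # MatMul node
    return y + c_folded            # Add node (C broadcasts to (M, N))


rng = np.random.default_rng(0)
a = rng.standard_normal((2, 3))  # (M, K)
b = rng.standard_normal((4, 3))  # (N, K), used with transB=1
c = rng.standard_normal(4)       # (N,), broadcast to (M, N)

y_ref = gemm(a, b, c, alpha=0.5, beta=2.0, trans_b=1)
y_new = gemm_as_matmul_add(a, b, c, alpha=0.5, beta=2.0, trans_b=1)
print(np.allclose(y_ref, y_new))
```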
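The idea behind rewiring when a node has several successors can be sketched on a toy graph. This is an illustration only. `Node` is a hypothetical stand-in for an ONNX NodeProto, and `remove_node_and_rewire_sketch` is not the signature or implementation of the function in this PR. The sketch assumes the removed node has one input and one output, and that its output is not a graph output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for an ONNX NodeProto."""
    name: str
    inputs: List[str]
    outputs: List[str]


def remove_node_and_rewire_sketch(nodes: List[Node], node: Node) -> None:
    """Remove a single-input, single-output node from `nodes` and rewire
    every successor (there may be several) to read the node's input tensor.

    Assumes the removed node's output is not a graph output.
    """
    src, dst = node.inputs[0], node.outputs[0]
    for succ in nodes:
        if succ is not node:
            succ.inputs = [src if t == dst else t for t in succ.inputs]
    # Remove by identity rather than by dataclass equality
    nodes[:] = [n for n in nodes if n is not node]


graph = [
    Node("relu", ["x"], ["t0"]),
    Node("identity", ["t0"], ["t1"]),
    Node("add", ["t1", "c"], ["y0"]),  # first successor
    Node("mul", ["t1", "d"], ["y1"]),  # second successor
]
remove_node_and_rewire_sketch(graph, graph[1])
for n in graph:
    print(n.name, n.inputs, n.outputs)
```

After the call, both `add` and `mul` read `t0` directly, and the `identity` node is gone.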
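This sketch shows the condition RemoveEmptyPadding relies on, assuming "empty" means every entry of the ONNX `pads` list is zero. The helpers `is_empty_padding` and `onnx_pads_to_numpy` are hypothetical. They check that condition and use `np.pad` to confirm that such a Pad returns its input unchanged.

```python
import numpy as np


def is_empty_padding(pads):
    """True if an ONNX pads list pads nothing, i.e. every entry is zero."""
    return all(p == 0 for p in pads)


def onnx_pads_to_numpy(pads):
    """ONNX orders pads as [x1_begin, x2_begin, ..., x1_end, x2_end, ...];
    np.pad expects [(x1_begin, x1_end), (x2_begin, x2_end), ...]."""
    n = len(pads) // 2
    return [(pads[i], pads[i + n]) for i in range(n)]


x = np.arange(16, dtype=np.float32).reshape(1, 1, 4, 4)  # NCHW
empty = [0] * 8
spatial = [0, 0, 1, 1, 0, 0, 1, 1]  # pad H and W by one on each side

print(is_empty_padding(empty), np.array_equal(np.pad(x, onnx_pads_to_numpy(empty)), x))
print(is_empty_padding(spatial), np.pad(x, onnx_pads_to_numpy(spatial)).shape)
```

In a real graph, the pads must first be read from the node. Depending on the opset, they are stored as an attribute or supplied as an input.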
A few tests were added as well:
test_brevitas_quant_onnx_export_and_exec
This tests the execution of the QONNX ops. However, it should probably move to FINN, since it introduces a PyTorch dependency into the test suite.
test_remove_identity_ops
Moved over from FINN.
Open ToDos:
[x] Move test_brevitas_quant_onnx_export_and_exec to FINN to remove the PyTorch dependency from the QONNX op tests.
[x] Rework MultiThreshold datatype inference.
Issue with the current implementation: the node applies the datatype from its out_dtype field to the output tensor. This leads to rounding errors during execution if the scale and/or bias are not scalar.
Suggested change: keep out_dtype as is, since downstream tools may use it to read the internal datatype of the MultiThreshold node, and instead infer the output datatype on demand from out_dtype, scale and bias.
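Here is a rough sketch of what the suggested on-demand inference could look like. It assumes the output is computed as `scale * internal_value + bias`, where the internal value lies in the range `[out_min, out_max]` of out_dtype (e.g. 0 and 3 for UINT2). The scale and bias may be scalars or per-channel lists. The helper `infer_multithreshold_output_dtype` and the datatype name strings it returns are hypothetical and are not the qonnx DataType API.

```python
from typing import List, Union

Number = Union[int, float]


def _as_list(v) -> List[Number]:
    """Accept a scalar or a per-channel sequence."""
    return list(v) if isinstance(v, (list, tuple)) else [v]


def infer_multithreshold_output_dtype(out_min: Number, out_max: Number, scale, bias) -> str:
    """Hypothetical helper: derive the output datatype name from the value
    range [out_min, out_max] of out_dtype and the scalar or per-channel
    scale and bias, assuming output = scale * internal_value + bias."""
    scales, biases = _as_list(scale), _as_list(bias)
    n = max(len(scales), len(biases))
    if len(scales) == 1:
        scales = scales * n
    if len(biases) == 1:
        biases = biases * n
    assert len(scales) == len(biases) == n, "scale/bias lengths must match"

    # A non-integer scale or bias moves values off the integer grid
    if not all(float(v).is_integer() for v in scales + biases):
        return "FLOAT32"

    # The affine map is monotonic per channel, so the endpoints give the range
    ends = [s * e + b for s, b in zip(scales, biases) for e in (out_min, out_max)]
    lo, hi = int(min(ends)), int(max(ends))
    if lo >= 0:
        return f"UINT{max(1, hi.bit_length())}"
    bits = 2
    while not (-(2 ** (bits - 1)) <= lo and hi <= 2 ** (bits - 1) - 1):
        bits += 1
    return f"INT{bits}"


print(infer_multithreshold_output_dtype(0, 3, 1, 0))            # UINT2
print(infer_multithreshold_output_dtype(0, 3, 1, -2))           # INT2
print(infer_multithreshold_output_dtype(0, 3, [1, 2], [0, 1]))  # UINT3
print(infer_multithreshold_output_dtype(0, 3, 0.5, 0))          # FLOAT32
```

Because out_dtype itself is never overwritten, downstream tools can still read the internal datatype. Only the annotation of the output tensor would be derived.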