Hi @Rellek72, can you provide the code and model files?
Of course, example below.
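A minimal sketch of the failing pattern (illustrative names and shapes, not the exact model): a Keras model whose matmul consumes two activation tensors.

```python
import tensorflow as tf

# Illustrative sketch (not the original model): a matmul whose
# operands are both dynamic activation tensors, as in attention.
a = tf.keras.Input(shape=(16, 8), name="a")
b = tf.keras.Input(shape=(8, 16), name="b")
y = tf.linalg.matmul(a, b)  # both operands are activations
model = tf.keras.Model(inputs=[a, b], outputs=y)
model.summary()
```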
I've also been discussing this issue on the Xilinx forums here.
Hi @Rellek72, the root cause of the issue is that xcompiler does not support deploying tf.linalg.matmul to the DPU.
@zhenzhen-AMD Could you please update the documentation accordingly? Matrix multiplies are listed as supported here, and the user guide explicitly refers to matmul as a supported operation for TensorFlow.
Hi @Rellek72,
Thank you for sharing your model with us. After reviewing it, it seems that your model doesn't satisfy the necessary criteria for converting a matmul operation into conv2d.
For a successful conversion from matmul to conv2d, one operand of the matmul must be an activation tensor (inputs) and the other must be a weight tensor (weights). In the model you've provided, however, both operands of the matmul are activation tensors. As a result, it cannot be deployed to the DPU.
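To illustrate the distinction, a hedged sketch (shapes are arbitrary):

```python
import numpy as np
import tensorflow as tf

x = tf.keras.Input(shape=(64,))

# Convertible: one operand is a constant weight tensor, so the
# matmul can be folded into a conv2d/dense node for the DPU.
w = tf.constant(np.random.rand(64, 32).astype(np.float32))
y_ok = tf.linalg.matmul(x, w)

# Not convertible: both operands are activation tensors, so there
# is no weight tensor to fold and the op cannot map to conv2d.
p = tf.keras.Input(shape=(64, 32))
q = tf.keras.Input(shape=(32, 16))
y_bad = tf.linalg.matmul(p, q)
```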
This is aligned with the constraints outlined in the documentation.
Closing this for now. Please re-open if there are any other concerns. Thanks.
I greatly appreciate your help with this and your assistance finding workarounds on the Xilinx forums. However, I would highly recommend amending the documentation to make it explicitly clear that matmuls with two dynamic inputs aren't supported, as that pattern is central to modern model architectures like the Transformer.
Thank you for your feedback and strong suggestion. I've relayed your feedback to the relevant team.
Although `tf.linalg.matmul` can now be quantized, addressing this issue, it is not properly identified during compilation. There is a dedicated xnode class for matmuls that cites `tf.linalg.matmul` as a function it translates (XModelNodeMatMul); however, the compiler consistently fails to link it and instead sets it as an unknown node, as you can see from the error below (the final error is on line 7090, inside the XModelNodeUnknown class). Dense/Linear layers still quantize and compile as expected.

I have tried all iterations and implementations of matmul inside TF2, and all of them either fail to quantize or fail to compile as shown above. I get a similar error when attempting to Inspect the model before quantization.
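For reference, the variants I tried look roughly like this (a sketch; names and shapes are placeholders):

```python
import tensorflow as tf

q = tf.keras.Input(shape=(64, 32), name="q")
k = tf.keras.Input(shape=(64, 32), name="k")

# 1. Direct op call (Keras records this as a TFOpLambda layer).
scores_a = tf.linalg.matmul(q, k, transpose_b=True)

# 2. Python @ operator (also lowers to tf.linalg.matmul).
scores_b = q @ tf.transpose(k, perm=[0, 2, 1])

# 3. Keras Dot layer over the feature axes.
scores_c = tf.keras.layers.Dot(axes=(2, 2))([q, k])

model = tf.keras.Model([q, k], scores_a)
```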
I noticed that the Hugging Face BERT transformer example instead uses QuantTFOpLambda, which I attempted to use but had difficulty implementing.
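For context, the quantization flow I was attempting follows the standard Vitis AI TF2 API (a sketch; `model` and `calib_ds` are placeholders, not my exact script):

```python
from tensorflow_model_optimization.quantization.keras import vitis_quantize

# `model` contains the tf.linalg.matmul call; Keras records it as a
# TFOpLambda layer, which the quantizer is expected to wrap as
# QuantTFOpLambda. `calib_ds` is a placeholder calibration dataset.
quantizer = vitis_quantize.VitisQuantizer(model)
quantized_model = quantizer.quantize_model(calib_dataset=calib_ds)
quantized_model.save("quantized.h5")
```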
If anyone has any advice on how to implement a simple matrix multiply that the quantizer and compiler can detect and handle, it would be greatly appreciated.