apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

[CUBLAS][FP8] Enable fusing astype operation for matmul multiply pattern #17006

Open valmat07 opened 4 months ago

valmat07 commented 4 months ago

This PR adds fusion of the astype operation into the matmul pattern for cuBLAS. The change is needed to improve FP8 performance.
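To illustrate what fusing a cast into a matmul and multiply pattern means, here is a minimal sketch in plain Python. It is not TVM or cuBLAS code. The names `to_fp16`, `unfused`, and `fused` are hypothetical, and float16 stands in for FP8 because the standard library has no FP8 type. The unfused version materializes an intermediate matrix at each step, while the fused version scales and casts each element as soon as its dot product is accumulated.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest float16 value (stand-in for astype)."""
    return struct.unpack("e", struct.pack("e", x))[0]


def dot(a, b, i, j):
    """Dot product of row i of a with column j of b."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def unfused(a, b, scale):
    """Three separate passes, each producing a full intermediate matrix."""
    rows, cols = len(a), len(b[0])
    # Pass 1: matmul at full precision.
    out = [[dot(a, b, i, j) for j in range(cols)] for i in range(rows)]
    # Pass 2: elementwise multiply by the scale.
    out = [[v * scale for v in row] for row in out]
    # Pass 3: cast to the narrow output type.
    return [[to_fp16(v) for v in row] for row in out]


def fused(a, b, scale):
    """One pass: scale and cast are applied as an epilogue on each element."""
    rows, cols = len(a), len(b[0])
    return [
        [to_fp16(dot(a, b, i, j) * scale) for j in range(cols)]
        for i in range(rows)
    ]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[0.5, 1.0], [1.5, 2.0]]

# Both variants perform the same arithmetic in the same order,
# so the results match exactly.
print(unfused(a, b, 0.1) == fused(a, b, 0.1))
```

In this sketch the fused variant avoids writing and re-reading two full-size intermediates, which is the general motivation for folding the cast into the matmul pattern.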

vinx13 commented 4 months ago

Do we need to update the cuBLAS codegen or runtime to support the cast?