Open valmat07 opened 4 months ago
This PR fuses the astype operation into matmul for the cuBLAS backend. The change is needed to improve fp8 performance.
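To illustrate what fusing the cast means, here is a minimal pure-Python sketch. It is not the code in this PR and does not call cuBLAS. All function names are hypothetical, and float16 stands in for fp8 because the Python standard library has no fp8 type. The unfused path materializes a full-precision matmul result and then runs a separate astype pass over it. The fused path casts each output element as it is produced, so no intermediate buffer is needed.

```python
import struct


def to_fp16(x):
    # Round a Python float to the nearest IEEE half-precision value
    # (stand-in for the fp8 cast; assumption for illustration only).
    return struct.unpack("e", struct.pack("e", x))[0]


def matmul(a, b):
    # Plain matmul that keeps the result in full precision.
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def astype_fp16(m):
    # Separate cast pass over an already materialized matrix.
    return [[to_fp16(v) for v in row] for row in m]


def matmul_fused_cast(a, b):
    # Hypothetical fused version: the cast is applied to each output
    # element as it is computed, so no full-precision matrix is stored.
    return [
        [
            to_fp16(sum(a[i][k] * b[k][j] for k in range(len(b))))
            for j in range(len(b[0]))
        ]
        for i in range(len(a))
    ]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[0.1, 0.2], [0.3, 0.4]]

unfused = astype_fp16(matmul(a, b))  # two passes over the output
fused = matmul_fused_cast(a, b)      # one pass over the output
```

Both paths produce the same values. The expected benefit of fusion is that it avoids the extra pass and the intermediate buffer, not that it changes the numerics.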
Do we need to update the cuBLAS codegen or runtime to support the cast?