IntelLabs / t2sp

Productive and portable performance programming across spatial architectures (FPGAs, etc.) and vector architectures (GPUs, etc.)

Intermediate results from systolic array are not vectorized #13

Open z24tao opened 2 years ago

z24tao commented 2 years ago

In gemm (C = alpha * A @ B + beta * C), the intermediate result alpha * A @ B is currently passed to the next kernel as individual scalars instead of being vectorized.
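
To illustrate the difference, here is a minimal sketch in plain Python. It is not T2SP code and does not reflect the compiler's actual output. A `deque` stands in for the channel between the two kernels, and `VEC`, `send_scalars`, `send_vectors`, and `consume` are hypothetical names chosen for this example. Both paths produce the same C, but the scalar path needs one channel transfer per element while the vectorized path needs one per `VEC` elements.

```python
from collections import deque

VEC = 4  # hypothetical vector width, chosen only for illustration


def matmul_scaled(alpha, A, B):
    """First kernel: compute the intermediate result alpha * A @ B."""
    k, m = len(B), len(B[0])
    return [[alpha * sum(row[p] * B[p][j] for p in range(k)) for j in range(m)]
            for row in A]


def send_scalars(tmp):
    """Current behavior: one channel write per element."""
    chan = deque()
    for row in tmp:
        for x in row:
            chan.append(x)
    return chan


def send_vectors(tmp, vec=VEC):
    """Desired behavior: one channel write per group of `vec` elements."""
    chan = deque()
    for row in tmp:
        for j in range(0, len(row), vec):
            chan.append(tuple(row[j:j + vec]))
    return chan


def consume(chan, beta, C, vectorized):
    """Second kernel: read the intermediate result and add beta * C."""
    m = len(C[0])
    flat = []
    while chan:
        item = chan.popleft()
        flat.extend(item if vectorized else (item,))
    return [[flat[i * m + j] + beta * C[i][j] for j in range(m)]
            for i in range(len(C))]


alpha, beta = 2, 0.5
A = [[1, 2], [3, 4]]
B = [[1, 0, 2, 1], [0, 1, 1, 2]]
C = [[1, 1, 1, 1], [1, 1, 1, 1]]

tmp = matmul_scaled(alpha, A, B)
scalar_chan = send_scalars(tmp)
vector_chan = send_vectors(tmp)
scalar_transfers, vector_transfers = len(scalar_chan), len(vector_chan)

out_scalar = consume(scalar_chan, beta, C, vectorized=False)
out_vector = consume(vector_chan, beta, C, vectorized=True)
```

With these inputs, `scalar_transfers` is `VEC` times `vector_transfers`, while `out_scalar` and `out_vector` are identical.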