ROCm / composable_kernel

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
https://rocm.docs.amd.com/projects/composable_kernel/en/latest/

[CK_TILE] wa prec, remove sgpr offset for inline asm #1356

Closed: carlushuang closed this 2 months ago

carlushuang commented 3 months ago
  1. Work around (WA) a precision issue in ROCm 6.2 (Docker image: compute-artifactory.amd.com:5000/rocm-plus-docker/framework/compute-rocm-dkms-no-npi-hipclang:14119_ubuntu22.04_py3.10_pytorch_release-2.3_db8da3b)
  2. Remove the SGPR offset from the inline asm to address a performance (speed) issue