ROCm / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
http://pytorch.org

TunableOp improvements: record untuned GEMMs and provide an API to tune them offline #1431

Closed. jfactory07 closed this 1 month ago

jfactory07 commented 1 month ago

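This change provides an offline mode for TunableOp. Out-of-memory (OOM) errors are easy to hit because an application running an LLM for inference usually needs a large amount of GPU memory. When the GEMM sizes are also very large, tuning them online, inside the running application, can push memory use over the limit and crash the application with OOM.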

For this case, we need an offline mode to tune the GEMMs. This PR is the first step: it records untuned GEMMs to a file.
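The recording step can be sketched as follows. This is a minimal illustration of the idea, not the PR's implementation: the GemmKey fields, the CSV layout, and the UntunedGemmRecorder class are hypothetical stand-ins, since the PR description does not show TunableOp's actual record format. Each GEMM that has no tuned result is written once to a file instead of being tuned in the running application.

```python
import csv
import os
import tempfile
from dataclasses import dataclass


# Hypothetical key describing one GEMM problem; TunableOp's real record
# format is not shown in the PR description.
@dataclass(frozen=True)
class GemmKey:
    dtype: str  # e.g. "half"
    trans: str  # transpose flags, e.g. "nt"
    m: int
    n: int
    k: int


class UntunedGemmRecorder:
    """Hypothetical recorder: append each unseen GEMM to a file instead of tuning it now."""

    def __init__(self, path: str):
        self.path = path
        self.seen = set()
        # Reload existing records so repeated runs do not write duplicate lines.
        if os.path.exists(path):
            with open(path, newline="") as f:
                for row in csv.reader(f):
                    self.seen.add(GemmKey(row[0], row[1], int(row[2]), int(row[3]), int(row[4])))

    def record(self, key: GemmKey) -> bool:
        """Return True if the key was newly written to the file, False if already present."""
        if key in self.seen:
            return False
        self.seen.add(key)
        with open(self.path, "a", newline="") as f:
            csv.writer(f).writerow([key.dtype, key.trans, key.m, key.n, key.k])
        return True


# Usage: the same GEMM seen twice is recorded only once.
path = os.path.join(tempfile.mkdtemp(), "untuned_gemms.csv")
rec = UntunedGemmRecorder(path)
first = rec.record(GemmKey("half", "nt", 4096, 4096, 4096))
second = rec.record(GemmKey("half", "nt", 4096, 4096, 4096))
```

Deduplicating on the full key keeps the file small even when an inference loop issues the same GEMM thousands of times, and reloading the file on startup lets several runs accumulate into one list of shapes to tune later.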

It also adds an API, tune_gemm_in_file, which reads the file of untuned GEMMs and tunes each GEMM listed in it.
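The offline tuning pass can be sketched like this, assuming a CSV file of recorded GEMMs such as the one above. The function offline_tune_file and the candidate solutions are hypothetical; this is not the signature or behavior of tune_gemm_in_file. In TunableOp the candidates would be real GEMM kernels (for example from rocBLAS or hipBLASLt), while here they are stand-in callables so the sketch runs anywhere. For each recorded GEMM it times every candidate and keeps the fastest.

```python
import csv
import os
import tempfile
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical candidate "solution": takes (m, n, k) and runs one GEMM.
Solution = Callable[[int, int, int], None]


def time_solution(fn: Solution, m: int, n: int, k: int, iters: int = 3) -> float:
    """Return the best wall-clock time in seconds over a few runs."""
    best = float("inf")
    for _ in range(iters):
        start = time.perf_counter()
        fn(m, n, k)
        best = min(best, time.perf_counter() - start)
    return best


def offline_tune_file(untuned_path: str, tuned_path: str,
                      solutions: Dict[str, Solution]) -> List[Tuple[str, ...]]:
    """Read recorded GEMMs, pick the fastest solution for each, and write the results."""
    results = []
    with open(untuned_path, newline="") as f:
        for dtype, trans, m, n, k in csv.reader(f):
            m, n, k = int(m), int(n), int(k)
            timings = {name: time_solution(fn, m, n, k) for name, fn in solutions.items()}
            best = min(timings, key=timings.get)
            results.append((dtype, trans, str(m), str(n), str(k), best))
    with open(tuned_path, "w", newline="") as f:
        csv.writer(f).writerows(results)
    return results


# Usage: two recorded GEMMs, one fast and one deliberately slow candidate.
workdir = tempfile.mkdtemp()
untuned = os.path.join(workdir, "untuned.csv")
tuned = os.path.join(workdir, "tuned.csv")
with open(untuned, "w", newline="") as f:
    csv.writer(f).writerows([["half", "nt", 64, 64, 64], ["half", "tn", 128, 32, 256]])

candidates = {
    "noop_fast": lambda m, n, k: None,
    "sleep_slow": lambda m, n, k: time.sleep(0.001),
}
results = offline_tune_file(untuned, tuned, candidates)
```

Because this pass runs as a separate process, after the inference application has exited, the memory used for benchmarking no longer competes with the model, which is the point of an offline mode.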