This change provides an offline mode for TunableOp:
Running an LLM for inference usually requires a large amount of GPU memory, so it is easy to run out of memory (OOM). When the GEMM sizes are also very large, the application can crash with an OOM error.
For this case, we need an offline mode to tune the GEMMs. This is the first PR; it records untuned GEMMs to a file.
A new API, tune_gemm_in_file, is added to read the untuned file and tune the GEMMs listed in it.
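
A rough sketch of how the two phases might be driven from Python. Only `tune_gemm_in_file` is named in this description. The `torch.cuda.tunable` namespace, the `enable` and `tuning_enable` calls, the environment variable names, and the helper names `recording_env` and `tune_offline` are assumptions made for illustration and should be checked against the merged code.

```python
import os


def recording_env(untuned_file):
    """Environment for the online phase: record GEMMs without tuning them.

    The variable names below are assumptions, not confirmed by this PR text.
    """
    return {
        "PYTORCH_TUNABLEOP_ENABLED": "1",
        "PYTORCH_TUNABLEOP_TUNING": "0",  # skip tuning while the model is resident
        "PYTORCH_TUNABLEOP_RECORD_UNTUNED": "1",
        "PYTORCH_TUNABLEOP_UNTUNED_FILENAME": untuned_file,
    }


def tune_offline(untuned_file):
    """Offline phase: tune the recorded GEMMs in a separate process,
    where the model weights are not occupying GPU memory."""
    import torch  # imported lazily so the helper above works without a GPU

    torch.cuda.tunable.enable(True)         # assumed to exist in this namespace
    torch.cuda.tunable.tuning_enable(True)  # assumed to exist in this namespace
    torch.cuda.tunable.tune_gemm_in_file(untuned_file)


if __name__ == "__main__":
    # Step 1: export these before launching the inference application.
    for name, value in recording_env("tunableop_untuned.csv").items():
        print(f"export {name}={value}")
    # Step 2 (later, in a fresh process): tune_offline("tunableop_untuned.csv")
```

Splitting the work this way keeps tuning out of the inference process, so the tuning buffers never compete with the model for GPU memory.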