Tencent / TurboTransformers

a fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc) on CPU and GPU.

How to measure the time "spent on GEMM kernels"? What tool was used? #178

Closed hadoop2xu closed 4 years ago

hadoop2xu commented 4 years ago

[image] Could you tell me which tool was used to produce this percentage breakdown?

feifeibear commented 4 years ago

It was obtained by looking at the timeline in nvvp (NVIDIA Visual Profiler).

feifeibear commented 4 years ago

The code also contains manual timing functions, which can print some ugly logs.
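
For readers who want to try the manual-timing route, here is a minimal Python sketch of the general idea: accumulate wall-clock time per labelled section and report each section's share of the total. This is not the timing code in TurboTransformers; `ManualTimer`, `section`, and the labels are hypothetical names used only for illustration. When timing GPU work, kernels are launched asynchronously, so the device would need to be synchronized before each clock read for the numbers to mean anything.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class ManualTimer:
    """Hypothetical helper: accumulates wall-clock time per labelled section."""

    def __init__(self):
        self.totals = defaultdict(float)  # label -> accumulated seconds

    @contextmanager
    def section(self, label):
        # For GPU code, synchronize the device before both clock reads
        # (assumption: the caller handles that inside the timed block).
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[label] += time.perf_counter() - start

    def shares(self):
        """Return each label's fraction of the total measured time."""
        total = sum(self.totals.values())
        if total == 0:
            return {}
        return {label: t / total for label, t in self.totals.items()}

    def report(self):
        """Print one log line per label, largest share first."""
        ranked = sorted(self.shares().items(), key=lambda kv: -kv[1])
        for label, share in ranked:
            print(f"{label}: {self.totals[label] * 1e3:.3f} ms ({share:.1%})")


if __name__ == "__main__":
    timer = ManualTimer()
    for _ in range(3):
        with timer.section("gemm"):    # stand-in for the matrix multiplies
            time.sleep(0.02)
        with timer.section("other"):   # stand-in for everything else
            time.sleep(0.01)
    timer.report()
```

The shares only cover the sections that were explicitly wrapped, so untimed code is not counted; a profiler timeline such as the one in nvvp gives the fuller picture.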