Open Zhiy-Zhang opened 4 months ago
The overall serving profiling information is emitted according to the macro `PPL_LLM_ENABLE_PROFILING`, which is enabled by default. Per-operator profiling information has to be inspected with Nsight; running offline_inference is recommended for that. If you want per-step kernel profiling information, see https://github.com/openppl-public/ppl.nn/blob/master/tools/pplnn_llm.cc#L819 and build with `-DPPLNN_ENABLE_KERNEL_PROFILING=ON`.
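As a minimal sketch of the build step described above — the exact CMake source layout and target names are assumptions; only the `-DPPLNN_ENABLE_KERNEL_PROFILING=ON` flag comes from the reply:

```shell
# Enable per-step kernel profiling when configuring the build.
# (PPL_LLM_ENABLE_PROFILING is on by default, so no extra flag is
# needed for the overall serving profiling output.)
CMAKE_FLAGS="-DPPLNN_ENABLE_KERNEL_PROFILING=ON"

# Assumed standard out-of-source CMake invocation for ppl.llm.serving:
cmake -S . -B build $CMAKE_FLAGS
cmake --build build -j "$(nproc)"
```

After rebuilding, the kernel-profiling code path referenced in `tools/pplnn_llm.cc` is compiled in.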
What are the problems? (screenshots or detailed error messages)
Is there a profiling tool available (something profiler-related), or is the only option to use a tool like Nsight to inspect operator performance manually?
What are the types of GPU/CPU you are using?
GPU: A100-80G-SXM4
What's the operating system ppl.llm.serving runs on?
Ubuntu 20.04.4, CUDA 12.3, cuDNN 8904, TensorRT 9.2.0
What's the compiler and its version?
gcc 11.4, CMake 3.27.9, CUDA compilation tools release 12.3, V12.3.107
Which version(commit id or tag) of ppl.llm.serving is used?
commit id: 51c3b3d5c5eba25c276a84388f04a2c9e198699f