ZhiqiJiang closed this 3 months ago.
I found the answer in https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/layers/topKSamplingLayer.cu:55, where a pure top-k request is rewritten as a top-k + top-p request:
```cpp
if (k > 0 && p == 0.0f)
{
    // This case corresponds to the old topk sampling, which is equivalent to
    // the old topk_topp sampling with topp=1.0f. TopKSamplingLayer and
    // TopKTopPSamplingLayer are now merged by TopKSamplingLayer. Thus, we
    // replace the case topk>0 and topp=0.0f by topk>0 and topp=1.0f for the
    // compatibility.
    p = 1.0f;
}
```
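To make the compatibility rewrite concrete, here is a minimal host-side sketch of the same rule. The `SamplingParams` struct and `normalizeForTopKLayer` function are hypothetical names for illustration, not TensorRT-LLM's actual types; only the condition and the `p = 1.0f` assignment come from the quoted code. A top-p threshold of 1.0 keeps every one of the k candidates, which is why the rewritten request behaves like plain top-k.

```cpp
#include <cassert>

// Hypothetical parameter bundle; not a TensorRT-LLM type.
struct SamplingParams {
    int topK;    // number of highest-probability tokens to keep (0 = disabled)
    float topP;  // nucleus threshold; 0.0f is treated as "not set"
};

// Mirrors the quoted rule: a pure top-k request (k > 0, p == 0.0f) becomes
// top-k + top-p with p = 1.0f. All other requests pass through unchanged.
SamplingParams normalizeForTopKLayer(SamplingParams in) {
    assert(in.topK >= 0);
    if (in.topK > 0 && in.topP == 0.0f) {
        in.topP = 1.0f;
    }
    return in;
}
```

For example, `normalizeForTopKLayer({40, 0.0f})` yields `{40, 1.0f}`, while `{40, 0.9f}` and `{0, 0.7f}` are returned as-is.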
Meanwhile, https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/layers/topPSamplingLayer.cu:60 marks every request with k > 0 as skipped in the top-p layer:
```cpp
skipDecode[batchSlot] = k > 0;
```
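As the code shows, TopPSamplingLayer skips any request with top_k > 0; such requests are handled entirely by TopKSamplingLayer, which now covers the merged top-k + top-p case. Why doesn't top_p sampling simply run after top_k sampling instead?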
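One way to see why a separate top-p pass is unnecessary: once the top-k candidates are selected, the top-p cut can be applied to that short list in the same step. The sketch below does exactly that on a plain probability vector. The function name `topKTopPCandidates` is hypothetical, and measuring the top-p threshold against the total mass of the k candidates (rather than the full vocabulary) is an assumption of this sketch, not a statement of how the TensorRT-LLM kernel computes it.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Returns the token indices that survive top-k followed by top-p filtering.
// Assumption: the top-p threshold is p times the total probability of the
// k kept tokens, so p = 1.0f keeps all k candidates (plain top-k).
std::vector<int> topKTopPCandidates(const std::vector<float>& probs, int k, float p) {
    assert(k > 0 && p > 0.0f && p <= 1.0f);
    k = std::min<int>(k, static_cast<int>(probs.size()));

    // Order token indices by descending probability; only the first k matter.
    std::vector<int> idx(probs.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](int a, int b) { return probs[a] > probs[b]; });
    idx.resize(k);

    float topKMass = 0.0f;
    for (int i : idx) topKMass += probs[i];

    // Keep the smallest prefix whose cumulative probability reaches the threshold.
    std::vector<int> kept;
    float cumulative = 0.0f;
    for (int i : idx) {
        kept.push_back(i);
        cumulative += probs[i];
        if (cumulative >= p * topKMass) break;
    }
    return kept;
}
```

With `probs = {0.05f, 0.5f, 0.3f, 0.15f}` and `k = 3`, `p = 0.8f` keeps tokens `{1, 2}` and `p = 0.9f` keeps `{1, 2, 3}`. Doing both cuts in one layer avoids a second pass over the vocabulary, which is consistent with the merge described in the comment above, though the actual motivation is not stated in the quoted code.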