Open zggg1p opened 8 months ago
What is the difference between top_k=1 and greedy decoding, and why should they be run as separate experiments?
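To make the question concrete, here is a minimal sketch of the two settings as I understand them. The helper names `greedy_pick` and `top_k_sample` are hypothetical and not taken from this repository or any library. Greedy decoding takes the argmax of the logits. Top-k sampling keeps the k largest logits, renormalizes, and samples. With k=1 only one candidate survives, so it is sampled with probability 1. Whether a particular framework routes both settings through the same code path (tie-breaking, numerics) is not something this sketch shows.

```python
import math
import random


def greedy_pick(logits):
    """Greedy decoding: return the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_sample(logits, k, temperature=1.0, rng=random):
    """Top-k sampling: keep the k largest logits, renormalize, then sample."""
    # Indices of the k largest logits (ties keep their original order).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature-scaled softmax weights over the surviving candidates.
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


# Hypothetical next-token logits, for illustration only.
logits = [0.1, 2.5, -1.0, 2.4]
greedy_token = greedy_pick(logits)
top1_tokens = {top_k_sample(logits, k=1) for _ in range(100)}
```

In this sketch `top1_tokens` only ever contains `greedy_token`, so any difference between the two settings in practice would have to come from the implementation rather than from the definitions.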