Since there is some randomness in the sampling, the model can sometimes sample complete nonsense tokens, particularly when the temperature is higher. A method called nucleus sampling restricts sampling to the smallest set of highest-probability tokens whose cumulative probability mass reaches top-p.
Here's an example:
temperature=2 and max-tokens=100:
<user>: Tell me a story.
<assistant>: A long time ago, there was a magical village situated in
the enchanted forest called Greenhaven. This little town of passionate
zoologists inspired wonder amongst the five sculpted fox installations
-- Florence, Leonard Squabs-shaft Mirfox Caligon-on Hor’dory-am
Neyfabam Possfatipy Musitight Funoy Aradilio Blossbar Busheadid Boubio
Lynnomo Epurdy-be Tangorst Dencre-che Homdo, Justbil Hiageoor Pieplex,
Lotathon Ludiphos
I constrained this to 100 maximum tokens because it otherwise continues producing garbage ad nauseam.
temperature=2 max-tokens=100 top-p=0.95:
<user>: Tell me a story.
<assistant>: Once upon a time, there was a small town called Riverdale
nestled deep in the heart of a forest. The people of Riverdale were
simple and lived peaceful lives until one day a dark shadow descended
upon the town. It started with small things, people started
disappearing, and then the livestock began to be killed one by one.
Soon the town was living in fear, afraid of who or what was behind the
dark shadow that had descended upon them. Every day, more people
disappeared, and the towns
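The combination of temperature scaling and nucleus sampling described above can be sketched as follows. This is a minimal illustration, not any particular library's implementation; the function name `sample_token` and the use of NumPy are my own choices:

```python
import numpy as np

def sample_token(logits, temperature=1.0, top_p=1.0):
    """Sample one token id from raw logits with temperature and top-p."""
    # Scale logits by temperature, then apply a numerically stable softmax.
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Nucleus (top-p) filtering: keep the smallest set of tokens,
    # in descending probability order, whose cumulative mass reaches top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    kept = order[:cutoff]

    # Renormalize over the kept tokens and sample from them.
    kept_probs = probs[kept] / probs[kept].sum()
    return int(np.random.choice(kept, p=kept_probs))
```

With top_p=1.0 every token stays in play and only the temperature matters; with top_p=0.95 the long tail of improbable tokens — the source of the garbage above — is cut off before sampling, even at temperature=2.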