markoarnauto opened 1 month ago
The idea is to test whether sampling strategies like majority-vote, tree-of-thought, graph-of-thought, r-star, and self-discover can be improved with special deployments (an LLM that is good at reasoning + a reward LLM).
@eva-jagodic in case you're thinking of:
adjust inference quality by sampling
By sampling (generating multiple responses with varying temperature) and then filtering (e.g. picking the majority answer), the quality of a model can be improved almost indefinitely. Even millions of samples can help. Since many sampling strategies run in parallel, this would be a neat extension.
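To make the sample-then-filter idea concrete, here is a minimal sketch of majority voting. The `sample_answer` function is a hypothetical stand-in for a real LLM call (it just simulates a model that answers correctly more often than not); in practice you would replace it with an API call at the given temperature. The independent samples could be generated in parallel.

```python
from collections import Counter
import random

def sample_answer(temperature: float) -> str:
    # Hypothetical stand-in for an LLM call: correct answer "42" with a
    # probability that drops as temperature rises; wrong answers are spread
    # across several alternatives.
    p_correct = max(0.05, 0.9 - 0.4 * temperature)
    if random.random() < p_correct:
        return "42"
    return random.choice(["41", "43", "7"])

def majority_vote(n_samples: int, temperature: float = 0.7) -> str:
    # Draw n_samples independent answers (embarrassingly parallel in a real
    # deployment) and return the most common one.
    answers = [sample_answer(temperature) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

random.seed(0)
print(majority_vote(101))
```

Even when a single sample is right only ~62% of the time, the majority over 101 samples is right almost always, since the wrong answers split their votes; this is the sense in which more samples buy more quality.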
The user could simply set the number of samples and thereby adjust the quality to their liking. The 'no-limits' feature especially makes dedicated inference well suited for this.
There are even more sophisticated sampling schemes like r-star from Microsoft, though not all of them can be parallelized. Some of them are able to beat gpt zero (although I forget where I saw this). @eva-jagodic interesting, don't you think?