Test dataset of questions to score reasoning

dave1010 / tree-of-thought-prompting

Using Tree-of-Thought Prompting to boost ChatGPT's reasoning

MIT License


Test dataset of questions to score reasoning #4

Open sapph1re opened 1 year ago

sapph1re commented 1 year ago

This does greatly improve prompting, although a single question may not be representative of the whole approach. To measure the suggested solutions properly, shall we create a test dataset of questions and evaluate the results we get from each prompt?

dave1010 commented 1 year ago

A test dataset would be a great idea.

Many frameworks for testing LLMs are available now, for example https://github.com/openai/human-eval
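
As a rough illustration of what such a test could look like, here is a minimal sketch that scores several prompt templates against a small question set. Everything in it is an assumption made for illustration: the questions, the template wording, the `score_prompts` helper, and the `fake_model` stub (which stands in for a real LLM call and says nothing about how any prompt actually performs). Answers are scored by a simple case-insensitive substring match, which is only one possible metric.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical test set: (question, substring expected in a correct answer).
DATASET: List[Tuple[str, str]] = [
    ("What is 2 + 3?", "5"),
    ("What is the capital of France?", "Paris"),
]

# Hypothetical prompt templates to compare; {question} is filled in per item.
PROMPTS: Dict[str, str] = {
    "plain": "{question}",
    "tree_of_thought": (
        "Three experts each reason step by step, then agree on an answer.\n"
        "Question: {question}"
    ),
}


def score_prompts(
    model: Callable[[str], str],
    dataset: List[Tuple[str, str]],
    prompts: Dict[str, str],
) -> Dict[str, float]:
    """Return the fraction of questions answered correctly per prompt."""
    scores: Dict[str, float] = {}
    for name, template in prompts.items():
        correct = 0
        for question, expected in dataset:
            answer = model(template.format(question=question))
            # Naive check: the expected text appears somewhere in the answer.
            if expected.lower() in answer.lower():
                correct += 1
        scores[name] = correct / len(dataset)
    return scores


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call; replace with an actual API client."""
    if "France" in prompt:
        return "The capital is Paris."
    return "I am not sure."


if __name__ == "__main__":
    for name, accuracy in score_prompts(fake_model, DATASET, PROMPTS).items():
        print(f"{name}: {accuracy:.0%}")
```

Swapping `fake_model` for a function that calls a real model, and growing `DATASET`, would give a per-prompt accuracy that can be compared across prompt variants.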