A central, open resource for data and tools related to chain-of-thought reasoning in large language models. Developed @ Samwald research group: https://samwald.info/
Loading Kojima data for commonsense_qa and strategy_qa requires two different answer extractions: commonsense_qa answers are multiple-choice letters (A-E), while strategy_qa answers are yes/no.
Outcome:
Evaluation of strategy_qa train changed
from the wrong:
{'accuracy': {'None_kojima-01_kojima-A-E': ...}
to the right:
Evaluating strategy_qa train...
{'accuracy': {'None_kojima-01_kojima-yes-no': ...}
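The difference between the two extractions can be illustrated with a minimal sketch. The function `extract_answer`, the mapping `ANSWER_EXTRACTION_KEYS`, and the regular expressions are hypothetical and written only for illustration; they are not the repository's actual implementation. Only the key names `kojima-A-E` and `kojima-yes-no` come from the evaluation output above.

```python
import re

# Hypothetical mapping from dataset name to answer-extraction key.
# The key names are taken from the evaluation output; the mapping itself is an assumption.
ANSWER_EXTRACTION_KEYS = {
    "commonsense_qa": "kojima-A-E",
    "strategy_qa": "kojima-yes-no",
}


def extract_answer(text, extraction_key):
    """Illustrative extraction: return the first answer token of the expected type, or None."""
    if extraction_key == "kojima-A-E":
        # Multiple-choice datasets: look for a standalone capital letter A-E.
        match = re.search(r"\b([A-E])\b", text)
        return match.group(1) if match else None
    if extraction_key == "kojima-yes-no":
        # Boolean datasets: look for a standalone "yes" or "no", case-insensitive.
        match = re.search(r"\b(yes|no)\b", text, flags=re.IGNORECASE)
        return match.group(1).lower() if match else None
    raise ValueError(f"Unknown extraction key: {extraction_key}")


# The right key finds the answer for each dataset.
mc_answer = extract_answer("Therefore, the answer is (B).", ANSWER_EXTRACTION_KEYS["commonsense_qa"])
bool_answer = extract_answer("So the answer is Yes.", ANSWER_EXTRACTION_KEYS["strategy_qa"])

# The wrong key applied to a yes/no generation finds no letter choice.
mismatched = extract_answer("So the answer is yes.", "kojima-A-E")
```

In this sketch, a letter-choice extraction applied to a yes/no generation returns nothing, which is one way a mismatched extraction could distort the reported accuracy for strategy_qa.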