
A central, open resource for data and tools related to chain-of-thought reasoning in large language models. Developed at the Samwald research group: https://samwald.info/
MIT License

ThoughtSource⚡

A framework for the science of machine thinking

Datasets · Tutorial notebook · Installation guide · Dataset Annotator

ThoughtSource is a central, open resource and community centered on data and tools for chain-of-thought reasoning in large language models (Wei 2022). Our long-term goal is to enable trustworthy and robust reasoning in advanced AI systems for driving scientific research and medical practice.

[Figure: ThoughtSource overview]

📄 Pre-print: Ott et al. "ThoughtSource: A central hub for large language model reasoning data", arXiv, 2023

📄 Pre-print: Hebenstreit et al. "An automatically discovered chain-of-thought prompt generalizes to novel models and datasets", arXiv, 2023

Workflow

[Figures: ThoughtSource workflow overview]

Available datasets

Our dataloaders allow you to access the following datasets in a standardized chain-of-thought format. The dataloaders create objects in the Hugging Face 🤗 Datasets format. We (sometimes extensively) post-processed the source datasets in different ways to create more coherent reasoning chains.
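For orientation, a single example in a standardized chain-of-thought format might look roughly like the following Python dictionary. The field names and values here are illustrative assumptions for demonstration, not the authoritative ThoughtSource schema:

```python
# Illustrative sketch of one standardized chain-of-thought example.
# Field names are assumptions, not the exact ThoughtSource schema.
example = {
    "id": "worldtree-train-0001",  # hypothetical identifier
    "question": "Which process moves water from Earth's surface into the atmosphere?",
    "choices": ["condensation", "evaporation", "precipitation", "runoff"],
    "cot": [
        "Water at the surface absorbs heat energy.",
        "The heated water turns into vapor and rises into the air.",
    ],
    "answer": "evaporation",
}

# A dataloader yields many such records, which Hugging Face Datasets
# can store column-wise for efficient filtering and mapping.
assert example["answer"] in example["choices"]
```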


Datasets can be browsed online through the Dataset Viewer 🔎


General question answering

Scientific / medical question answering

Math word problems

Collections of datasets

For quick and economical formative evaluation of CoT reasoning, we combined random examples from the above datasets into collections.
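Combining random examples into a collection can be sketched in plain Python as below. The function and variable names are my own and the dataset contents are dummies; this is a generic illustration of seeded random sampling, not the ThoughtSource implementation:

```python
import random

def make_collection(datasets, n_per_dataset, seed=42):
    """Draw a fixed number of random examples from each dataset.

    `datasets` maps a dataset name to its list of examples. This is a
    generic sketch, not the ThoughtSource implementation.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible collection
    collection = []
    for name, examples in datasets.items():
        for ex in rng.sample(examples, n_per_dataset):
            collection.append({"dataset": name, "example": ex})
    return collection

# Dummy stand-ins for real datasets
datasets = {
    "worldtree": [f"wt-{i}" for i in range(100)],
    "med_qa": [f"mq-{i}" for i in range(100)],
}
subset = make_collection(datasets, n_per_dataset=5)
print(len(subset))  # 5 examples from each of the 2 datasets -> 10
```

Fixing the random seed makes the sampled collection reproducible, which matters when comparing models on the same evaluation subset.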

We are working on collecting and generating additional datasets, and on further improving the quality of existing datasets (see dataset issues). We welcome suggestions for the inclusion of other datasets.

We welcome dataset contributions! 👉 Have a look at our contribution guide!

Annotator

[Figure: demonstration of the annotator tool]

The annotator highlights similarities between different generated reasoning chains, making it easier to spot strengths and weaknesses and to select the best results.


Use the web-based annotator 📝
To try out the annotator, simply type in your name and load this example file
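One simple way to surface similarities between two reasoning chains is to find the word n-grams they share. The sketch below uses this approach for illustration; the function name is my own, and the actual annotator's matching logic may differ:

```python
def shared_ngrams(chain_a, chain_b, n=3):
    """Return the word n-grams occurring in both reasoning chains.

    A minimal overlap-detection sketch; the real annotator's matching
    logic may be more sophisticated.
    """
    def ngrams(text):
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    return ngrams(chain_a) & ngrams(chain_b)

a = "the answer is evaporation because water turns into vapor"
b = "water turns into vapor so the answer is evaporation"
overlap = shared_ngrams(a, b)
print(sorted(" ".join(g) for g in overlap))
```

The shared n-grams mark the spans a highlighting tool could color identically across chains.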



Installation and code structure

Installation

Execute the following in a terminal, line by line:

git clone git@github.com:OpenBioLink/ThoughtSource.git
cd ThoughtSource
# install pip and virtualenv
sudo apt install python3-pip
sudo apt install python3-venv
# create and activate virtual environment
python3 -m venv venv
source ./venv/bin/activate
# install requirements and API packages
pip install -e "./libs/cot[api]"

Applications

Libraries

from cot import Collection

# 1) Dataset loading and selecting a random sample
collection = Collection(["worldtree"], verbose=False)
collection = collection.select(split="train", number_samples=10)

# 2) Language Model generates chains of thought and then extracts answers
config={
    "instruction_keys": ['qa-01'], # "Answer the following question through step-by-step reasoning."
    "cot_trigger_keys": ['kojima-01'], # "Answer: Let's think step by step."
    "answer_extraction_keys": ['kojima-A-D'], # "Therefore, among A through D, the answer is"
    "api_service": "huggingface_hub",
    "engine": "google/flan-t5-xl",
    "warn": False,
    "verbose": False,
}
collection.generate(config=config)

# 3) Performance evaluation
collection.evaluate()
# {'accuracy': {'qa-01_kojima-01_kojima-A-D': 0.6}}
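Conceptually, evaluation extracts a final answer from each model completion and scores the fraction that match the gold answers. The sketch below illustrates this with my own helper names and a deliberately simple regex; ThoughtSource's actual extraction is configurable and more robust:

```python
import re

def extract_answer(completion, options="ABCD"):
    """Pull the last standalone option letter from a model completion.

    Taking the last match skips letters that appear earlier in trigger
    phrases such as "among A through D". A simplified sketch only.
    """
    matches = re.findall(rf"\b([{options}])\b", completion)
    return matches[-1] if matches else None

def accuracy(predictions, golds):
    """Fraction of predictions that equal the gold answers."""
    correct = sum(p == g for p, g in zip(predictions, golds))
    return correct / len(golds)

completions = [
    "Therefore, among A through D, the answer is B.",
    "Let's think step by step... so the answer is C",
]
preds = [extract_answer(c) for c in completions]
print(preds)                        # ['B', 'C']
print(accuracy(preds, ["B", "D"]))  # one of two correct -> 0.5
```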

👉 See the tutorial notebook for more code examples.


Citation

@misc{https://doi.org/10.48550/arxiv.2301.11596,
  doi = {10.48550/ARXIV.2301.11596},
  url = {https://arxiv.org/abs/2301.11596},
  author = {Ott, Simon and Hebenstreit, Konstantin and Liévin, Valentin and Hother, Christoffer Egeberg and Moradi, Milad and Mayrhauser, Maximilian and Praas, Robert and Winther, Ole and Samwald, Matthias},
  keywords = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), FOS: Computer and information sciences},
  title = {ThoughtSource: A central hub for large language model reasoning data},
  publisher = {arXiv},
  year = {2023}, 
  copyright = {Creative Commons Attribution 4.0 International}
}

Versioning

All updates/changes to datasets are explicitly mentioned in bold.

1.0.0 (2023-07-11)
- Released the ThoughtSource_33 collection with 60 reasoning chains for each item: `Collection.load_thoughtsource_33()`
- Added an option for creating chained commands
- Added the chat option of GPT models
- Added filtering functions for already generated chains of thought
- Added new datasets: **MMLU** (six medical subsets) and an open-ended question version of **MedQA**

0.0.5 (2023-03-10)
- Added a function to select which generated CoTs to keep after loading: `collection.select_generated_cots(author="thoughtsource")`

0.0.4 (2023-03-08)
- Improved the evaluation function
- Added a function to load the ThoughtSource_100 collection: `Collection.load_thoughtsource_100()`

0.0.3 (2023-02-24)
- Released the ThoughtSource_100 collection with reasoning chains from GPT (text-davinci-003), flan-t5-xxl, and Cohere's command-xl

0.0.2 (2023-02-15)
- Updated the annotator tool for the correct data schema (this might cause errors when loading old datasets from JSON files)
- **Pubmed_qa**: Included "LONG_ANSWER" from the origin schema as "cot" in the ThoughtSource schema

0.0.1 (2023-02-01)
- Initial release after the Twitter announcement of the project