
🧬 EvoPrompt

This is the official implementation of the paper Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers

πŸ“ƒ Abstract

Large Language Models (LLMs) excel in various tasks, but they rely on carefully crafted prompts that often demand substantial human effort. To automate this process, in this paper, we propose a novel framework for discrete prompt optimization, called EvoPrompt, which borrows the idea of evolutionary algorithms (EAs) as they exhibit good performance and fast convergence. To enable EAs to work on discrete prompts, which are natural language expressions that need to be coherent and human-readable, we connect LLMs with EAs. This approach allows us to simultaneously leverage the powerful language processing capabilities of LLMs and the efficient optimization performance of EAs. Specifically, abstaining from any gradients or parameters, EvoPrompt starts from a population of prompts and iteratively generates new prompts with LLMs based on the evolutionary operators, improving the population based on the development set. We optimize prompts for both closed- and open-source LLMs including GPT-3.5 and Alpaca, on 31 datasets covering language understanding, generation tasks, as well as BIG-Bench Hard (BBH) tasks. EvoPrompt significantly outperforms human-engineered prompts and existing methods for automatic prompt generation (e.g., up to 25% on BBH). Furthermore, EvoPrompt demonstrates that connecting LLMs with EAs creates synergies, which could inspire further research on the combination of LLMs and conventional algorithms.

πŸš€ Quick Start

βš™οΈ Preparation

  1. Environment setup: pip install -r requirements.txt
  2. Data download: the test data for the language understanding task can be found here. Put the test file in the folder ./data/cls/{dataset_name}. For the BBH datasets, download them from the CoT-hub repo and put them in the folder BBH/data/{dataset_name}.
  3. OpenAI API key: add your OpenAI API key and other related settings to the file auth.yaml (a loading sketch follows this list).
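
A minimal sketch of loading such a config, assuming illustrative field names (api_key, base_url) that may differ from the actual auth.yaml schema:

import yaml  # pip install pyyaml

# Load the OpenAI credentials from auth.yaml; the field names below
# (api_key, base_url) are illustrative assumptions, not the repo's actual schema.
with open("auth.yaml") as f:
    auth = yaml.safe_load(f)

api_key = auth["api_key"]        # hypothetical field
base_url = auth.get("base_url")  # hypothetical field, e.g. an Azure endpoint
print("Loaded key ending in", api_key[-4:])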

β™» Evolution

We instantiate two evolutionary algorithms, GA (genetic algorithm) and DE (differential evolution), to evolve upon the initial population. Evolve your prompts using the following commands:

Set the --llm_type argument to choose text-davinci-003, gpt-3.5-turbo, or gpt-4 as the LLM used for evolution.

# understanding task on Alpaca
bash scripts/cls/run_ga_alpaca.sh  # Genetic algorithm
bash scripts/cls/run_de_alpaca.sh  # Differential evolution

# simplification task on Alpaca
bash scripts/sim/run_de_alpaca.sh
bash scripts/sim/run_ga_alpaca.sh

# summarization task on Alpaca
bash scripts/sum/run_de_alpaca.sh
bash scripts/sum/run_ga_alpaca.sh

# for BBH tasks
cd BBH
bash scripts/run_de_cot.sh  # DE 
bash scripts/run_ga_cot.sh  # GA

πŸ€” Inference

To evaluate a single instruction, run the following commands; set the --content argument to the specific prompt whose performance you want to evaluate.

bash scripts/cls/eval_single_alpaca.sh  # understanding task on alpaca
bash scripts/sim/eval_single_alpaca.sh  # simplification
bash scripts/sum/eval_single_alpaca.sh  # summarization

# BBH
cd BBH
bash scripts/eval.sh  # few-shot evaluation

πŸ“Œ Notes

Note that our framework uses two language models: one for evolution (argument --llm_type) and one for task implementation (argument --language_model).

πŸ’‘Tips for Usage

The number of iterations and the population size affect the performance of EvoPrompt, and there is a trade-off between cost and performance. For relatively simple tasks, a population size of 10 and 10 iterations (or even fewer) are enough, while complex tasks require a larger, more diverse population.
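
As a rough, back-of-the-envelope illustration of this trade-off (the repo's exact call accounting may differ):

# Rough cost model: each iteration generates new candidate prompts, and each
# candidate is scored on the dev set. The accounting is illustrative only.
population_size = 10   # prompts kept in the population
iterations = 10        # evolutionary steps
dev_set_size = 200     # dev examples scored per candidate prompt

evolution_calls = population_size * iterations                  # prompt-generation queries
evaluation_calls = population_size * iterations * dev_set_size  # task-model queries
print(f"~{evolution_calls} evolution calls, ~{evaluation_calls} evaluation calls")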

πŸ”’ Arguments

You may need to set the following arguments to customize your own configuration. The key arguments referenced in this README are --llm_type (the LLM used for evolution), --language_model (the LLM that performs the task), and --content (a single prompt to evaluate).

πŸ“Ž Framework

The EvoPrompt pipeline consists of three main steps, which each algorithm instantiates with slight differences: (1) initialize a population of prompts, (2) generate new prompts with the LLM acting as the evolutionary operator, and (3) evaluate the candidates on the development set and update the population.
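
A minimal sketch of this loop, assuming hypothetical helpers llm_evolve (the LLM applies the evolutionary operators) and score_on_dev (development-set scoring) that stand in for the repo's actual components:

def evo_prompt(initial_prompts, llm_evolve, score_on_dev, iterations=10):
    """Generic EvoPrompt loop: evolve a population of prompts, keep the best.

    llm_evolve(prompts) -> a new candidate prompt (LLM applies the operators)
    score_on_dev(p)     -> float score of prompt p on the development set
    Both callables are hypothetical stand-ins for the repo's components.
    """
    population = {p: score_on_dev(p) for p in initial_prompts}  # step 1: initialize
    for _ in range(iterations):
        child = llm_evolve(list(population))      # step 2: LLM generates a new prompt
        population[child] = score_on_dev(child)   # step 3: evaluate on the dev set
        worst = min(population, key=population.get)
        del population[worst]                     # keep the population size fixed
    return max(population, key=population.get)    # best prompt found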

🧬 Genetic Algorithm
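
The GA variant selects two parent prompts with probability proportional to their scores and asks the LLM to cross them over and mutate the result. A simplified sketch; the instruction wording below is a paraphrase (the actual templates live in data/template_ga.py) and llm is a hypothetical completion function:

import random

def ga_step(population, scores, llm):
    """One GA step: fitness-proportional selection, then LLM-based crossover
    and mutation expressed as a natural-language instruction."""
    parents = random.choices(population, weights=scores, k=2)  # roulette-wheel selection
    instruction = (
        "Cross over the following two prompts and mutate the result:\n"
        f"Prompt 1: {parents[0]}\nPrompt 2: {parents[1]}\nNew prompt:"
    )
    return llm(instruction)  # the LLM acts as the evolutionary operator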

🧬 Differential Evolution
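
The DE variant updates each prompt using the difference between two other population members, combines it with the current best prompt, and keeps the child only if it scores better (greedy selection). Again a simplified sketch with paraphrased instruction wording (the actual templates are in data/template_de.py):

import random

def de_step(population, scores, best_prompt, llm, score_on_dev):
    """One DE step over the whole population; llm and score_on_dev are
    hypothetical stand-ins for the repo's LLM client and evaluator."""
    for i, current in enumerate(population):
        a, b = random.sample([p for p in population if p != current], 2)
        instruction = (
            "Identify the differing parts of Prompt A and Prompt B, mutate them,\n"
            "combine the result with Prompt C, then cross over with the current prompt.\n"
            f"Prompt A: {a}\nPrompt B: {b}\nPrompt C (current best): {best_prompt}\n"
            f"Current prompt: {current}\nNew prompt:"
        )
        child = llm(instruction)
        child_score = score_on_dev(child)
        if child_score > scores[i]:  # greedy selection: keep the better prompt
            population[i], scores[i] = child, child_score
    return population, scores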

🌳 Code Structure

.
β”œβ”€β”€ args.py
β”œβ”€β”€ auth.yaml
β”œβ”€β”€ BBH  # code for BBH tasks
β”œβ”€β”€ data  # dataset, templates used
β”‚   β”œβ”€β”€ cls
β”‚   β”œβ”€β”€ sim
β”‚   β”œβ”€β”€ sum
β”‚   β”œβ”€β”€ template_de.py  # templates of prompt evolution by DE
β”‚   β”œβ”€β”€ template_ga.py  # templates of prompt evolution by GA
β”‚   β”œβ”€β”€ template_v2.json  # templates for task implementation
β”‚   └── templates.py  # wrapper
β”œβ”€β”€ dataset.py  # dataset class
β”œβ”€β”€ evaluator.py  # evaluators on different tasks
β”œβ”€β”€ evoluter.py  # DE, GA, APE
β”œβ”€β”€ evolution.py  # DE, GA, APE
β”œβ”€β”€ get_result.py
β”œβ”€β”€ infer.py  # main file for inference
β”œβ”€β”€ llm_client.py  # LLM query
β”œβ”€β”€ metrics.py  # metric calculation
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ run.py  # main file for evolution
β”œβ”€β”€ scripts  # scripts to run the code
└── utils.py  # auxiliary functions

🧩 Possible Extension

β˜•οΈ Citation

If you find this repository helpful, please consider citing our paper:

@article{guo2023connecting,
  title={Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers},
  author={Guo, Qingyan and Wang, Rui and Guo, Junliang and Li, Bei and Song, Kaitao and Tan, Xu and Liu, Guoqing and Bian, Jiang and Yang, Yujiu},
  journal={arXiv preprint arXiv:2309.08532},
  year={2023}
}

Acknowledgements

Our codebase is based on the following repos. Thanks for open-sourcing!

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.