
LlamaAcademy: Teaching Llamas How to Code

Teach GPTs to read API documentation using LLaMA, LoRA, and Langchain.

Wouldn't it be great if GPTs could learn about new APIs? With LlamaAcademy you can teach GPTs to call Stripe, Notion, or even your own product's API. Instead of hosting API documentation, you can host an API implementation! Just point LlamaAcademy at your API docs, run the script, and -- shazam! -- a new LLaMA model will be created for you. You can host that model on your server, and users can call your bespoke mini-GPT to write their API glue.

Seriously?

Well, sort of. LlamaAcademy is experimental -- we haven't gotten it to consistently generate great code (yet). We'd love help with that, if you're into that sort of thing.

Demo: A Llama That Learned Notion's API

https://user-images.githubusercontent.com/51882888/232329429-c7aadc40-8251-41f3-b4bb-9ac41ac2c6f8.mp4

How it works

LlamaAcademy is a pipeline that combines three steps: crawling the API documentation, generating synthetic training data with GPT-3.5 and GPT-4, and fine-tuning Vicuna-13B on that synthetic data.

(Diagram: LlamaAcademy data generation pipeline.)
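To make the data-generation stage concrete, here is an illustrative sketch assuming the pre-1.0 openai Python client; the function name and prompt wording are hypothetical, not the repository's actual code.

# Illustrative data-generation sketch. generate_task and the prompt
# wording are hypothetical; uses the pre-1.0 openai client.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

def generate_task(doc_chunk: str, engine: str = "gpt-4") -> str:
    """Turn one chunk of crawled API documentation into a synthetic
    instruction/code training example."""
    response = openai.ChatCompletion.create(
        model=engine,
        messages=[
            {"role": "system",
             "content": "You write instruction/code pairs for fine-tuning code models."},
            {"role": "user",
             "content": "Given this API documentation, write a user request and "
                        "the Python code that fulfills it:\n\n" + doc_chunk},
        ],
        temperature=0.7,
    )
    return response["choices"][0]["message"]["content"]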

Installation

You need to install Firefox and elinks, then install the necessary Python dependencies. You also need to set your OPENAI_API_KEY.

sudo apt-get install firefox elinks
conda env create --file=environment.yaml
conda env config vars set OPENAI_API_KEY=YOUR_API_KEY

Usage

LlamaAcademy keeps the interface simple: every user-facing hyper-parameter is set in a configuration file.

GENERATE: True # Turn off if you don't want to generate the data
API_DOCS: https://developers.notion.com/reference 
DEPTH_CRAWLING: 1 # Use 0 if your API docs are one long, non-hierarchical page (for example polygon.io). Higher depths follow child links and can take much longer if your site has many pages.
SUMMARIZE_DOCS: True
MICRO_BATCH_SIZE: 3  
BATCH_SIZE: 12
EPOCHS: 4  
LEARNING_RATE: 3e-4  
WARMUP_STEPS: 5
CUTOFF_LEN: 2048 
LORA_R: 8
LORA_ALPHA: 16
LORA_DROPOUT: 0.05
OPENAI_ENGINE: "gpt-4"
NUM_PROMPT_INSTRUCTIONS: 3
NUM_TASKS_TO_GENERATE: 200 # Recommended number of examples
DATA_PATH: "assets/"
OUTPUT_DIR: "output/lora-vicuna-api-notion"
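The config is plain YAML, so you can inspect or tweak it programmatically. A generic PyYAML sketch (the repository's own config loader may differ):

# Generic PyYAML sketch; the repository's own config handling may differ.
import yaml

with open("configs/vicuna_13b.yaml") as f:
    config = yaml.safe_load(f)

print(config["OPENAI_ENGINE"])  # gpt-4
# BATCH_SIZE / MICRO_BATCH_SIZE typically implies the gradient-accumulation steps:
print(config["BATCH_SIZE"] // config["MICRO_BATCH_SIZE"])  # 12 // 3 = 4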

To start the fine-tuning process, run:

CUDA_VISIBLE_DEVICES=0 python3 main.py --config configs/vicuna_13b.yaml
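For intuition, the LORA_* values above map onto a Hugging Face peft adapter roughly as follows; the target modules are an assumption (a common choice for LLaMA-family models), not taken from the repository.

# Sketch of how the LORA_* hyper-parameters become a peft adapter.
# target_modules is an assumption (common for LLaMA-family models).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("jeffwan/vicuna-13b")
adapter = LoraConfig(
    r=8,                 # LORA_R
    lora_alpha=16,       # LORA_ALPHA
    lora_dropout=0.05,   # LORA_DROPOUT
    target_modules=["q_proj", "v_proj"],  # assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, adapter)
model.print_trainable_parameters()  # only the small adapter matrices train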

After training, export the LoRA model to Hugging Face weights:

python3 export_hf.py --base_model jeffwan/vicuna-13b --model_folder output/lora-vicuna-api-notion
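With peft, exporting a LoRA adapter to plain Hugging Face weights amounts to merging it into the base model. A sketch of that step (not necessarily what export_hf.py does internally; the output path is illustrative):

# Merge a LoRA adapter into base weights with peft -- a sketch, not
# necessarily export_hf.py's internals; the output path is illustrative.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("jeffwan/vicuna-13b")
model = PeftModel.from_pretrained(base, "output/lora-vicuna-api-notion")
merged = model.merge_and_unload()  # fold adapter deltas into the base weights

merged.save_pretrained("output/merged-vicuna-api-notion")
AutoTokenizer.from_pretrained("jeffwan/vicuna-13b").save_pretrained(
    "output/merged-vicuna-api-notion")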

To run inference with LangChain:

python3 inference.py --model_folder output/lora-vicuna-api-notion
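For reference, minimal LangChain wiring around a local Hugging Face model looks roughly like this (classic 2023-era langchain API; the model path and prompt are illustrative, not inference.py's actual contents):

# Minimal LangChain wiring sketch (classic 2023-era API); the model path
# and prompt are illustrative, not inference.py's actual contents.
from transformers import pipeline
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

generate = pipeline(
    "text-generation",
    model="output/merged-vicuna-api-notion",  # merged weights from the export step
    max_new_tokens=512,
)
llm = HuggingFacePipeline(pipeline=generate)

prompt = PromptTemplate(
    input_variables=["request"],
    template="Write Python code that calls the Notion API to: {request}\n",
)
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(request="create a page titled 'Meeting notes'"))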

Hardware requirements

This code was tested on a single RTX A6000 instance on vast.ai (approximately $0.60/hour). Peak VRAM usage is 27.8 GB, so any GPU with more than 30 GB of VRAM should be safe for fine-tuning. With 100 examples, fine-tuning finishes in about 20 minutes and data generation completes in about an hour (most of that time goes to GPT-4 generation and to crawling, since screen scraping is quite slow).

Plan

Code Files

This repository provides the following folders and files: