
Llama-X


Llama-X: Open Academic Research on Improving LLaMA to SOTA LLM

This is the repo for Llama-X, an open academic research project that aims to progressively improve LLaMA to a SOTA LLM.

The project will follow open academic research principles.

📣 Please join us on Discord if you are interested in Llama-X.

Contents

  1. News

  2. Ten main research areas

  3. Llama-X Model Version

  4. Llama-X Evaluation

  5. Llama-X Paper List

  6. Usage

  7. How to contribute

News

We have completed training of the first version of our model (Llama-X 3.0.1, 7B). You can try it on the demo page; the data, code, and model weights at different scales will be released in this repo later.

Ten main research areas

[1]. Research on Instruction Tuning

[2]. Research on RLHF & RLAIF

[3]. Research on Data Quality

[4]. Research on Long Context Transformer

[5]. Research on Multi-modal (text + image) Modeling

[6]. Research on Multilingual Modeling

[7]. Research on Efficient Infrastructure and Optimization

[8]. Research on Evaluation

[9]. Research on Interpretability

[10]. Research on LLMs for Actions

Llama-X Model Version

| Llama-X | Baseline | Performance |
|---------|----------|-------------|
| 3.0.0 (LLaMA) | GPT-3 | Outperform |
| 3.1.0 | text-davinci-001 | Comparable |
| 3.2.0 | text-davinci-002 | Comparable |
| 3.3.0 | text-davinci-003 | Comparable |
| 3.5.0 | gpt-35-turbo | Comparable |
| 3.6.0 | GPT-4 | 80% Avg.Gap |
| 3.7.0 | GPT-4 | 60% Avg.Gap |
| 3.8.0 | GPT-4 | 40% Avg.Gap |
| 3.9.0 | GPT-4 | 20% Avg.Gap |
| 4.0.0 | GPT-4 | Comparable |
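
This README does not define "Avg.Gap" precisely. As a rough illustration only, assuming it means the average remaining score gap to GPT-4 across the automatic evaluation benchmarks, it could be computed as in the sketch below (the function name and all scores are placeholders, not real results):

```python
# Illustrative only: one possible reading of "X% Avg.Gap", assuming it is the
# mean remaining gap to GPT-4, averaged over the evaluation benchmarks.
def avg_gap_to_gpt4(model_scores: dict, gpt4_scores: dict) -> float:
    """Return the average fraction of the GPT-4 score still missing (0.0 = comparable)."""
    gaps = [(gpt4_scores[b] - model_scores[b]) / gpt4_scores[b] for b in gpt4_scores]
    return sum(gaps) / len(gaps)

# Placeholder scores, not measured results:
gpt4 = {"MMLU": 0.86, "GSM-8K": 0.92}
llama_x = {"MMLU": 0.52, "GSM-8K": 0.37}
print(f"{avg_gap_to_gpt4(llama_x, gpt4):.0%} average gap to GPT-4")  # ~50%
```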

We are currently focusing on research areas [1] and [3] above, and will publish the first version of our model (Llama-X 3.0.1) and the accompanying paper.

Llama-X Evaluation

Each new version of the Llama-X model should significantly outperform (by more than +1%) the current version on automatic evaluation across all of the following Type-A benchmarks. Additional evaluation on the Type-B benchmarks will be added from version 3.6.0 onward (a sketch of this release check appears after the table):

| Type | Benchmark |
|------|-----------|
| A | MMLU |
| A | HumanEval |
| A | GSM-8K |
| A | NaturalQuestions |
| A | TruthfulQA |
| B | Leetcode |
| B | GRE |
| B | AP |
| B | MMLU-Multilingual |
| B | Visual Inputs (TBD) |
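
As a minimal sketch of the release rule above, the following Python snippet checks that a candidate version beats the current version on every Type-A benchmark; it assumes "+>1%" means more than one percentage point, and the function name and scores are hypothetical:

```python
# Hypothetical illustration of the release rule: a new Llama-X version must beat
# the current version by more than 1 percentage point on every Type-A benchmark.
TYPE_A = ["MMLU", "HumanEval", "GSM-8K", "NaturalQuestions", "TruthfulQA"]

def passes_release_gate(current: dict, candidate: dict, margin: float = 1.0) -> bool:
    """Return True if `candidate` outperforms `current` by more than `margin`
    percentage points on every Type-A benchmark."""
    return all(candidate[b] - current[b] > margin for b in TYPE_A)

# Example with made-up scores (percent accuracy):
current = {"MMLU": 44.1, "HumanEval": 10.4, "GSM-8K": 18.9, "NaturalQuestions": 24.2, "TruthfulQA": 20.3}
candidate = {"MMLU": 51.2, "HumanEval": 12.8, "GSM-8K": 22.6, "NaturalQuestions": 35.0, "TruthfulQA": 21.4}
print(passes_release_gate(current, candidate))  # True
```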

Results:

| Model | MMLU | TruthfulQA | GSM-8K | NaturalQuestions |
|-------|------|------------|--------|------------------|
| InstructGPT davinci v2 (175B)^ | 0.57 | 0.62 | 0.35 | 0.389 |
| Llama-X 3.0.1 (7B) | 0.4412 | 0.2032 | 0.1887 | 0.2422 |
| Llama-i (7B) | 0.5121 | 0.2142 | 0.2259 | 0.3499 |

^ The results of InstructGPT davinci v2 (175B) are copied from the Stanford CRFM benchmark.

Llama-X Paper List

  1. LLaMA: Open and Efficient Foundation Language Models.

Usage

| LLaMA | Batch Size | V100s | Time (h) |
|-------|------------|-------|----------|
| 7B    | 64         | 8     | 1.00     |
| 13B   | 32         | 8     | 2.00     |

Batch inference: a minimal illustrative sketch is shown below.
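
The repo's own inference script is not reproduced here. As a minimal sketch, assuming the released weights load with the Hugging Face Transformers Llama classes referenced in the acknowledgements below, batched greedy generation could look like this (the model path, prompts, and generation settings are hypothetical):

```python
# Minimal batch-inference sketch using Hugging Face Transformers.
# Assumption: Llama-X weights are in the standard HF Llama format.
# The path "path/to/llama-x-3.0.1-7b" is a placeholder.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "path/to/llama-x-3.0.1-7b"
tokenizer = LlamaTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token   # Llama tokenizers ship without a pad token
tokenizer.padding_side = "left"             # left-pad so generation starts right after each prompt
model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, device_map="auto")
model.eval()

prompts = [
    "Explain the difference between supervised fine-tuning and RLHF.",
    "Write a Python function that reverses a string.",
]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

for output in outputs:
    print(tokenizer.decode(output, skip_special_tokens=True))
```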

To Do



<h2 id="contribute">How to contribute</h2>

Developers can become contributors by contributing helpful code, data, papers, computing resources, etc.

1. Code: including algorithm implementations, training optimization, inference optimization, and model deployment.

2. Data: Every [research area](#research-areas) and [version iteration](#model) requires high-quality data, including instruction-answer, pre-training, multi-modal, multilingual, and user-feedback data, etc.

3. Paper: We will maintain a [Llama-X Paper List](#paper). Academic papers that use Llama-X as the base model and whose methods are optimized, fully tested, and significantly improve the model can be checked into the list.

4. Computing resources: We hope to accelerate model iteration by coordinating spare computing capacity from developers and non-profit sponsorship from universities and enterprises.

<h2 id="communication">How to communicate with us</h2>

1. Github Issues

2. Email: llama-x@mail.com

3. Discord: <a href="https://discord.gg/2etwhe6GvU"><img alt="Join us on Discord" src="https://img.shields.io/discord/823813159592001537?color=5865F2&logo=discord&logoColor=white"></a>

## Thanks

This project has been inspired by multiple open source projects:

[Meta AI LLaMA](https://arxiv.org/abs/2302.13971v1)

[Huggingface Transformers Llama](https://github.com/huggingface/transformers/tree/main/src/transformers/models/llama)

[Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html) and [Alpaca-LoRA](https://github.com/tloen/alpaca-lora)

## Disclaimer

The use of resources (e.g., code, data, and model weights) related to this project is limited to academic research and is prohibited for commercial purposes. Content generated by any Llama-X model is affected by factors such as randomness and uncontrollability, and this project cannot guarantee its accuracy. This project assumes no legal responsibility for model outputs, nor any liability for losses that may arise from the use of the related resources and output results.