TrevorAshby / CodeRLHF

Model List #4

Open TrevorAshby opened 9 months ago

TrevorAshby commented 9 months ago

Select a set of models to use in the project. Each model will be fine-tuned, architecturally modified (e.g., replacing the last layer to build a reward model), and trained with RLHF.
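To make the "replace the last layer" step concrete, here is a minimal, framework-free sketch. It is a toy, not the project's implementation: `TinyBackbone`, `Linear`, `LanguageModel`, and `to_reward_model` are hypothetical names, the weights are random, and scoring from the last token's hidden state is an assumption (one common choice, not a decision recorded in this issue).

```python
import random


class TinyBackbone:
    """Hypothetical stand-in for a pretrained transformer body:
    maps each token id to a hidden vector."""

    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.table = [
            [rng.uniform(-1.0, 1.0) for _ in range(hidden_size)]
            for _ in range(vocab_size)
        ]

    def __call__(self, token_ids):
        return [self.table[t] for t in token_ids]


class Linear:
    """Plain linear layer without bias: out_features rows of in_features weights."""

    def __init__(self, in_features, out_features, seed=0):
        rng = random.Random(seed)
        self.weight = [
            [rng.uniform(-0.1, 0.1) for _ in range(in_features)]
            for _ in range(out_features)
        ]

    def __call__(self, vector):
        return [sum(w * x for w, x in zip(row, vector)) for row in self.weight]


class LanguageModel:
    """Backbone plus an LM head that yields vocab-sized logits per token."""

    def __init__(self, backbone, lm_head):
        self.backbone = backbone
        self.lm_head = lm_head

    def __call__(self, token_ids):
        return [self.lm_head(h) for h in self.backbone(token_ids)]


def to_reward_model(lm, hidden_size):
    """Reuse the backbone and swap the vocab-sized head for a 1-output head.

    Assumption: the reward is read from the last token's hidden state.
    """
    reward_head = Linear(hidden_size, 1, seed=1)

    def score(token_ids):
        hidden = lm.backbone(token_ids)
        return reward_head(hidden[-1])[0]  # one scalar per sequence

    return score


VOCAB, HIDDEN = 10, 4
lm = LanguageModel(TinyBackbone(VOCAB, HIDDEN), Linear(HIDDEN, VOCAB))
reward_model = to_reward_model(lm, HIDDEN)

logits = lm([1, 2, 3])            # 3 tokens, each with VOCAB logits
reward = reward_model([1, 2, 3])  # a single float for the whole sequence
```

The point of the sketch is the shape change: the language model emits one logit vector per token, while the reward model emits one scalar per sequence and keeps the same backbone.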

TrevorAshby commented 9 months ago

Project Models

| Model | Parameters | Link |
| --- | --- | --- |
| OpenLLaMA v2 | 3b | https://huggingface.co/openlm-research/open_llama_3b_v2 |
| Falcon | 7b | https://huggingface.co/tiiuae/falcon-7b |
| MPT | xx | xx |
| Flan-T5 | 3b | https://huggingface.co/google/flan-t5-xl |
| Vicuna | xx | xx |

Evaluation Models

| Model | Parameters | Link |
| --- | --- | --- |
| StarCoder | xx | xx |
| StarChat-β | xx | xx |
| Salesforce CodeGen | xx | xx |

faustotnc commented 9 months ago

Some other models available at HuggingFace: https://huggingface.co/blog/os-llms#licensing

It would be interesting to see how our RLHF model compares with other code generation models such as StarCoder, Salesforce CodeGen, and StarChat-$\beta$. Keep in mind, however, that StarCoder and Salesforce CodeGen are base models trained for code completion rather than instruction following (although the StarCoder documentation notes that such a model can be turned into a technical assistant using a Tech Assistant Prompt), whereas StarChat-$\beta$ is the only instruction-tuned model of the three.
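For the comparison, prompting a base model as an assistant amounts to prepending a dialogue-style prefix so that the model's continuation reads as an answer. The sketch below only shows that general idea. It does not reproduce the actual Tech Assistant Prompt: the function name `build_assistant_prompt`, the `Human:`/`Assistant:` labels, and the example texts are all assumptions for illustration.

```python
def build_assistant_prompt(system_text, examples, question):
    """Assemble a few-shot dialogue prefix for a base (completion-only) model.

    system_text: short description of the assistant's behaviour.
    examples:    list of (user_message, assistant_message) pairs.
    question:    the new user request the model should answer.

    The speaker labels are hypothetical; a real prompt may use other markers.
    """
    parts = [system_text.strip(), ""]
    for user_msg, assistant_msg in examples:
        parts.append(f"Human: {user_msg}")
        parts.append(f"Assistant: {assistant_msg}")
        parts.append("")
    parts.append(f"Human: {question}")
    # End on an open assistant turn so the model continues with the answer.
    parts.append("Assistant:")
    return "\n".join(parts)


# Illustrative inputs only; none of this text comes from a real prompt.
prompt = build_assistant_prompt(
    system_text="Below are dialogues between a user and a helpful coding assistant.",
    examples=[
        ("How do I reverse a list in Python?", "Use my_list[::-1] or my_list.reverse()."),
    ],
    question="Write a function that checks whether a string is a palindrome.",
)
```

The resulting `prompt` string would be passed to the base model as ordinary input text, which keeps the evaluation setup identical across base and instruction-tuned models apart from the prefix.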