Open TrevorAshby opened 9 months ago
Model | Parameters | link |
---|---|---|
OpenLLaMA v2 | 3b | https://huggingface.co/openlm-research/open_llama_3b_v2 |
Falcon | 7b | https://huggingface.co/tiiuae/falcon-7b |
MPT | xx | xx |
Flan-T5 | 3b | https://huggingface.co/google/flan-t5-xl |
Vicuna | xx | xx |

Model | Parameters | link |
---|---|---|
StarCoder | xx | xx |
StarChat-β | xx | xx |
Salesforce CodeGen | xx | xx |
Some other models available at HuggingFace: https://huggingface.co/blog/os-llms#licensing
It would be interesting to see how our RLHF model compares to other code generation models such as StarCoder, Salesforce CodeGen, and StarChat-$\beta$. However, we would need to keep in mind that StarCoder and Salesforce CodeGen are plain autoregressive (completion) models (although they note we can turn them into technical assistants using a Tech Assistant Prompt), whereas StarChat-$\beta$ is the only instruction-tuned model of the three.
Select a series of models to be used in the project. Each model will be fine-tuned, architecturally modified (i.e., its last layer replaced to build a reward model), and then trained with RLHF.
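
As a rough illustration of the "replace the last layer" step, here is a minimal PyTorch sketch. It uses a toy stand-in model rather than any of the checkpoints listed above, and the names `TinyCausalLM`, `to_reward_model`, and `sequence_reward` are hypothetical, introduced only for this example. The idea is to swap the vocabulary projection for a single-output linear layer so the network emits one scalar per token, and then read the scalar at the final token as the sequence reward.

```python
import torch
import torch.nn as nn


class TinyCausalLM(nn.Module):
    """Toy stand-in for a pretrained language model (hypothetical, for illustration)."""

    def __init__(self, vocab_size: int = 100, hidden_size: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.body = nn.Linear(hidden_size, hidden_size)
        # Last layer: projects hidden states to vocabulary logits.
        self.head = nn.Linear(hidden_size, vocab_size)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = torch.tanh(self.body(self.embed(input_ids)))
        # Output shape: (batch, seq_len, head.out_features)
        return self.head(hidden)


def to_reward_model(lm: TinyCausalLM) -> TinyCausalLM:
    """Replace the vocabulary head with a freshly initialized scalar head, in place."""
    lm.head = nn.Linear(lm.head.in_features, 1)
    return lm


def sequence_reward(model: TinyCausalLM, input_ids: torch.Tensor) -> torch.Tensor:
    """Score each sequence by the scalar output at its last token.

    Assumes unpadded inputs; with padding, index the last real token instead.
    """
    per_token = model(input_ids).squeeze(-1)  # (batch, seq_len)
    return per_token[:, -1]                   # (batch,)


lm = TinyCausalLM()
ids = torch.randint(0, 100, (2, 5))           # 2 sequences of 5 token ids
logits_shape = tuple(lm(ids).shape)           # vocabulary logits before the swap
reward_model = to_reward_model(lm)
rewards = sequence_reward(reward_model, ids)  # one scalar per sequence
```

With a real checkpoint the same pattern would apply to whatever attribute holds the output projection, which differs between architectures, so the attribute name `head` here should be treated as an assumption. The new head starts untrained and would need to be fit on preference data before its scores mean anything.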