-
Great repo!
Could you add example notebooks on using PPO and DPO for RL fine-tuning of LLMs on top of SFT models?
Thanks
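
For reference (not part of the original request): a minimal sketch of the kind of DPO recipe being asked for, using Hugging Face TRL on top of an already fine-tuned SFT model. The model name and preference dataset are placeholders taken from TRL's docs, and keyword names (e.g. `processing_class` vs. the older `tokenizer`) vary across TRL versions.

```python
# Sketch of a DPO fine-tuning run with TRL; assumes a recent TRL version.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2-0.5B-Instruct"  # placeholder: use your SFT checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference data with "prompt"/"chosen"/"rejected"-style columns.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

args = DPOConfig(output_dir="dpo-out", beta=0.1)  # beta controls the implicit KL penalty
trainer = DPOTrainer(
    model=model,                 # the SFT model to optimize
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older TRL versions call this `tokenizer`
)
trainer.train()
```

A PPO notebook would follow the same shape but add a reward model and a rollout loop (TRL's `PPOTrainer`).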
-
Hi Hengyuan,
I want my agent to apply rl_search and belief fine-tuning at test time. I've successfully gotten rl_search working by running [this python file](https://github.com/facebookres…
-
Hi,
I get the following error when running submitPT.py:
```
(RL-GraphINVENT-env) user@Pramod:~$ python submitPT.py
* Creating dataset directory output_chembl25_500k/train/
* Creating model subdirectory output_chembl25…
```
-
Nice work! I have a question, though. Since I'm not very familiar with reinforcement learning, I wonder which part of this is RL? In Section 3.3.2 (fine-tuning), "Update the model $P(G, S)$ on the fine-tuning set $D…
-
# 🎉 Open Call for Contributions to the LLaMA Recipes Repository
Hey there! 👋
We are excited to open up our repository for open-source contributions and can't wait to see what recipes you come up…
-
Hello,
is there code for fine-tuning navigation (PACMAN) for EQA with RL? Are there plans to implement this feature? For now, I can only see scripts for training PACMAN with imitation learning...
…
-
- [ ] [LlamaGym/README.md at main · KhoomeiK/LlamaGym](https://github.com/KhoomeiK/LlamaGym/blob/main/README.md?plain=1)
DESCRIPTION:
Fine-tune LL…
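
Illustrative sketch (not LlamaGym's actual API, which the truncated README above would describe): the generic loop such a library automates, where an LLM picks actions in a Gym-style environment and (prompt, action, reward) tuples are collected for a later policy-gradient update. `llm_generate` is a hypothetical stand-in for your model's generation call.

```python
# Generic online-RL loop for an LLM agent; illustrative only.
import gymnasium as gym

env = gym.make("Blackjack-v1")  # a small, easily text-described environment
trajectory = []

obs, info = env.reset()
done = False
while not done:
    prompt = f"Observation: {obs}. Reply with an action id: 0 (stick) or 1 (hit)."
    response = llm_generate(prompt)    # hypothetical LLM call
    action = int(response.strip()[0])  # naive parse; real code must validate
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    trajectory.append((prompt, action, reward))

# `trajectory` would then feed a PPO-style weight update on the LLM.
```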
-
## Issue
The task is to formulate non-SFT attacks in order to test the robustness of a defence solution.
The attack will need to run against an arbitrary domain, such as medical advice.
Types of Attac…
-
Hi Jason,
I followed the steps.
Step 1 - Supervised Fine-tuning generated "/checkpoints/supervised_llama/", containing the folders:
```
checkpoint-2000
checkpoint-3000
checkpoint-4000
final_checkp…
```
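
Side note (an assumption, not from the original report): if those folders are standard Hugging Face `Trainer` checkpoints, each should load directly with the usual `from_pretrained` call; the path below is illustrative.

```python
# Quick sanity check that a Step 1 checkpoint loads; path is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "/checkpoints/supervised_llama/checkpoint-4000"
model = AutoModelForCausalLM.from_pretrained(ckpt)
tokenizer = AutoTokenizer.from_pretrained(ckpt)
print(model.config.model_type)
```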
-
Hi,
Thanks for the wonderful work! I have a question regarding the hyperparameters in the paper. Are the default hyperparameters stored in `config.locomotion` the same as those used in Figures 2, 5,…