-
Great repo!
Could you add example notebooks on using PPO and DPO for RL fine-tuning of LLMs on top of SFT models?
Thanks
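
For reference (not part of the original request): a minimal sketch of the kind of DPO recipe being asked for, using Hugging Face TRL on top of an already fine-tuned SFT model. The model name and preference dataset are placeholders taken from TRL's docs, and keyword names (e.g. `processing_class` vs. the older `tokenizer`) vary across TRL versions.

```python
# Sketch of a DPO fine-tuning run with TRL; assumes a recent TRL version.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2-0.5B-Instruct"  # placeholder: use your SFT checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference data with "prompt"/"chosen"/"rejected"-style columns.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

args = DPOConfig(output_dir="dpo-out", beta=0.1)  # beta controls the implicit KL penalty
trainer = DPOTrainer(
    model=model,                 # the SFT model to optimize
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older TRL versions call this `tokenizer`
)
trainer.train()
```

A PPO notebook would follow the same shape but add a reward model and a rollout loop (TRL's `PPOTrainer`).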
-
Hi Hengyuan,
I want my agent to apply rl_search and belief fine-tuning at test time. I've successfully gotten rl_search working by running [this python file](https://github.com/facebookres…
-
Hi,
I get the following error when running submitPT.py:
```
(RL-GraphINVENT-env) user@Pramod:~$ python submitPT.py
* Creating dataset directory output_chembl25_500k/train/
* Creating model subdirectory output_chembl25…
```
-
Nice work! I have a question, though. Since I'm not very familiar with reinforcement learning, I wonder which part of this is RL? In Section 3.3.2 (fine-tuning), "Update the model $P(G, S)$ on the fine-tuning set $D…
-
# 🎉 Open Call for Contributions to the LLaMA Recipes Repository
Hey there! 👋
We are excited to open up our repository for open-source contributions and can't wait to see what recipes you come up…
-
Hello,
is there code for fine-tuning navigation (PACMAN) for EQA with RL? Are there plans to implement this feature? For now, I can only see scripts for training PACMAN with imitation learning...
…
-
- [ ] [LlamaGym/README.md at main · KhoomeiK/LlamaGym](https://github.com/KhoomeiK/LlamaGym/blob/main/README.md?plain=1)
DESCRIPTION:
Fine-tune LL…
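
Illustrative sketch (not LlamaGym's actual API, which the truncated README above would describe): the generic loop such a library automates, where an LLM picks actions in a Gym-style environment and (prompt, action, reward) tuples are collected for a later policy-gradient update. `llm_generate` is a hypothetical stand-in for your model's generation call.

```python
# Generic online-RL loop for an LLM agent; illustrative only.
import gymnasium as gym

env = gym.make("Blackjack-v1")  # a small, easily text-described environment
trajectory = []

obs, info = env.reset()
done = False
while not done:
    prompt = f"Observation: {obs}. Reply with an action id: 0 (stick) or 1 (hit)."
    response = llm_generate(prompt)    # hypothetical LLM call
    action = int(response.strip()[0])  # naive parse; real code must validate
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    trajectory.append((prompt, action, reward))

# `trajectory` would then feed a PPO-style weight update on the LLM.
```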
-
## Issue
The task is to formulate non-SFT attacks in order to test the robustness of a defence solution.
The attack will need to run against an arbitrary domain, such as medical advice.
Types of Attac…
-
Hi Jason,
I followed the steps.
Step 1 - Supervised Fine-tuning generated "/checkpoints/supervised_llama/", containing the folders:
```
checkpoint-2000
checkpoint-3000
checkpoint-4000
final_checkp…
```
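
Side note (an assumption, not from the original report): if those folders are standard Hugging Face `Trainer` checkpoints, each should load directly with the usual `from_pretrained` call; the path below is illustrative.

```python
# Quick sanity check that a Step 1 checkpoint loads; path is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "/checkpoints/supervised_llama/checkpoint-4000"
model = AutoModelForCausalLM.from_pretrained(ckpt)
tokenizer = AutoTokenizer.from_pretrained(ckpt)
print(model.config.model_type)
```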
-
Hi,
Thanks for the wonderful work! I have a question regarding the hyperparameters in the paper. Are the default hyperparameters stored in `config.locomotion` the same as those used in Figures 2, 5,…