-
![image](https://github.com/user-attachments/assets/4a219c5f-226d-49be-973e-86fc384d0bf6)
![image](https://github.com/user-attachments/assets/7fdab145-e875-4a46-8164-67a460d63b9a)
I tried running yo…
-
### Describe the bug
The wandb sweep agent blocks Isaac Sim's thread before app.update() is called, which causes the process to hang forever.
### Steps to reproduce
```python
# Copyright (c)…
-
Hello! I tried an experiment using the llama2 13b model and got a CONNECTION ERROR.
**RL script**
> python -m lamorel_launcher.launch --config-path /home/xxx/Grounding_LLMs_with_online_RL/lamorel…
-
[paper](https://arxiv.org/abs/2305.18290)
## TL;DR
- **I read this because:** to build background knowledge
- **task :** RL
- **problem :** TRPO also requires training a separate reward model, which becomes very costly as models grow larger
- **idea :** rewa…
-
After DDPG training finishes, i.e., after __prune_rl(), shouldn't self.create_pruner() be called again? Without it, the new pruning seems to be applied on top of the RL stage's final compress, which doesn't look right. Re-creating the pruner via create_pruner() seems better. Can anyone confirm whether this is the case?
![屏幕快照 2019-03-21 下午8 56 15](https://user-images.gith…
-
AutoModelForCausalLM has no class for chatglm — how did you solve this?
-
With the proliferation of models and model variants it becomes more important to track assessment dates and model versions.
So far we've been able to treat model families as one, because it rarely …
-
I was able to fine-tune with a modified version of example 2 with the following action head:
```python
config["model"]["heads"]["action"] = ModuleSpec.create(
    L1ActionHead,
    pred_horizon=9,
…
-
Hi, thanks for open-sourcing your amazing work!
I have been trying to reproduce the RL fine-tuned results reported in the paper, but unfortunately, I am encountering some issues. Here is a brief o…
-
Thank you for your great work. I'm interested in reproducing your results.
```bash
python run.py --exp-config ./configs/experiments/XGX.yaml --run-type train
```
However, I encountered an issue …