-
Hi! I tried LoRA fine-tuning with a Llama 2 model. However, when I use peft==0.9.0 the loss is always NaN, while with peft==0.3.0 the loss is normal. I'm curious whether there are significant differences in LoRA betwee…
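For context, a minimal sketch of the setup being described, not the poster's actual script (model name, target modules, and hyperparameters are assumptions):

```python
# Minimal LoRA setup with peft; a sketch of the reported scenario.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # assumed model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float32,  # fp32 here; half precision is a common source of NaN losses
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # assumed target projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```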
-
Hello! I have a problem starting the PPO_finetuning example with only 1 machine and 1 GPU.
However, I successfully started the examples in https://github.com/flowersteam/Grounding_LLMs_with_online_RL with lamorel…
-
Hi, I wonder how to load a trained T5 policy and then fine-tune it on unseen tasks. I can find the parameters of llm_module in PPOUpdater, but I can't find how to call load_state_dict() on _llm_module. …
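A generic PyTorch pattern for this kind of save/restore, as a sketch (the helper names are hypothetical; how PPOUpdater exposes `_llm_module` is an assumption):

```python
# Generic save/restore of policy weights via PyTorch state dicts.
# save_policy/load_policy are hypothetical helpers, not part of any library.
import torch
from torch import nn

def save_policy(llm_module: nn.Module, path: str) -> None:
    # Persist only the model weights (no optimizer state).
    torch.save(llm_module.state_dict(), path)

def load_policy(llm_module: nn.Module, path: str) -> None:
    # Restore the weights in place before fine-tuning on unseen tasks.
    llm_module.load_state_dict(torch.load(path, map_location="cpu"))
```

Here `llm_module` would be whatever the updater holds, e.g. its `_llm_module` attribute, assuming it is a plain `nn.Module`.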
-
### Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
### Is there an existing answer for this in the FAQ? …
-
**Describe the bug**
Hi everybody, I'm training a LLaMA model in step 3 using DeepSpeed-Chat. In version 0.10.1, it raised the following error ([see the logs below](https://github.com/microsoft/DeepSp…
-
Weibo content highlights
-
### Model/Pipeline/Scheduler description
The amazing @dome272 published a new diffusion model called Wuerstchen: https://github.com/dome272/Wuerstchen
I think it would be a very nice addition t…
-
https://huggingface.co/blog/rlhf
### Background
In the section on the third step of the process, it is written:
- What multiple organizations seem to have gotten to work is **fine-tuning some…
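For concreteness, the kind of PPO fine-tuning step the post describes can be sketched with trl's classic `PPOTrainer` API (model choice, hyperparameters, and the constant reward are assumptions for illustration):

```python
# Sketch of one RLHF PPO step with trl's classic PPOTrainer API.
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# Tiny model and hyperparameters chosen for illustration only.
config = PPOConfig(model_name="gpt2", learning_rate=1.41e-5, batch_size=1, mini_batch_size=1)
model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)  # frozen copy, anchors the KL penalty
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

query = tokenizer("How do I sort a list in Python?", return_tensors="pt").input_ids[0]
response = ppo_trainer.generate(query, max_new_tokens=32, return_prompt=False)[0]

# In real RLHF the reward comes from a trained reward model; a constant stands in here.
stats = ppo_trainer.step([query], [response], [torch.tensor(1.0)])
```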
-
Hello, I am currently trying to reproduce the evaluation results in the [original StackLLaMA blog post](https://huggingface.co/blog/stackllama) after fine-tuning LLaMA with PPO, but am struggling to d…
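For anyone in the same spot, one way to load the tuned weights back for evaluation, as a sketch (paths are placeholders; the blog's recipe trains LoRA adapters, so the adapter layout is an assumption):

```python
# Load a PPO-tuned LoRA adapter on top of the base model for evaluation.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("path/to/llama-base")    # placeholder path
model = PeftModel.from_pretrained(base, "path/to/ppo-lora-adapter")  # placeholder path
model = model.merge_and_unload()  # fold LoRA weights into the base for plain generation
tokenizer = AutoTokenizer.from_pretrained("path/to/llama-base")

inputs = tokenizer("Question: How do I reverse a string in Python?\n\nAnswer: ", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```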