-
This project is great and the dataset is unique. To help the community, it would be a great idea to support PEFT training on this dataset. Also, there's a chance to increase the training to …
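As a rough illustration of why PEFT support matters, here is a minimal sketch of the LoRA-style parameter-count arithmetic (numbers are illustrative, not taken from this project): instead of training a full weight matrix, only a low-rank factored update is trained.

```python
# Hedged sketch: why PEFT methods such as LoRA shrink the trainable-parameter
# count. The dimensions and rank below are illustrative assumptions.

def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable params when W (d_out x d_in) is frozen and the update
    is factored as B @ A, with B: (d_out x rank) and A: (rank x d_in)."""
    return d_out * rank + rank * d_in

full = 4096 * 4096                                  # full fine-tuning of one matrix
lora = lora_trainable_params(4096, 4096, rank=8)    # low-rank update only
print(full, lora, full // lora)                     # LoRA trains ~1/256 of the params here
```

In practice this is what libraries like Hugging Face `peft` automate across all target modules; the sketch only shows the per-matrix arithmetic.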
-
Here is the error I encountered. It seems that `self._total_batch_size` is `None`, but I don't know the reason:
```
File "/path/model_training/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py", lin…
```
-
Use the Llama 2 model and train on the latest, more efficient open datasets (e.g. SlimPajama vs. RedPajama)?
Just the base model; then maybe the Open-Assistant team can apply RLHF to it.
-
Hello! Has anyone encountered the following bug when using zero_stage3 for Llama 2?
```
step3_rlhf_finetuning/rlhf_engine.py:61 in __init__ │
│ …
```
-
Do you use LoRA for the step3 RLHF benchmarks in https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/BenckmarkSetting.md ? Or are you …
-
And how do I install alpaca-rlhf?
-
### Please ask your question
Will training methods for the PPO and reward models be developed later?
-
**System Info:**
Memory: 500 GB
GPU: 8 × A100 80 GB
Question:
**Why does initializing DeepSpeedRLHFEngine with multiple GPUs use much more memory than with a single GPU?**
**Reproduce:**
Copy mode…
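One plausible explanation (an assumption, not a confirmed diagnosis of this issue): if every rank on a node materializes a full copy of the model weights during engine init, before ZeRO-3 partitions them, peak memory during init scales with the number of ranks. A back-of-envelope sketch, with an assumed model size and fp16 weights:

```python
# Hedged back-of-envelope sketch (assumptions, not measurements): per-node
# peak memory during init if each rank holds a full weight copy before
# ZeRO-3 partitioning kicks in.

def init_peak_gib(n_params: float, n_ranks: int, bytes_per_param: int = 2) -> float:
    """Peak memory in GiB if each of n_ranks holds a full copy of the
    weights (bytes_per_param = 2 assumes fp16) during initialization."""
    return n_params * bytes_per_param * n_ranks / 2**30

# Illustrative 7B-parameter model (an assumed size, not from the report):
print(round(init_peak_gib(7e9, 1), 1))  # single-GPU init
print(round(init_peak_gib(7e9, 8), 1))  # 8 ranks on one node
```

If this is the cause, loading with a ZeRO-3-aware initialization path (so weights are partitioned as they are created, rather than replicated per rank) would be the usual mitigation.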
-
# URL
- https://arxiv.org/abs/2203.02155
# Affiliations
- Long Ouyang, N/A
- Jeff Wu, N/A
- Xu Jiang, N/A
- Diogo Almeida, N/A
- Carroll L. Wainwright, N/A
- Pamela Mishkin, N/A
- Chong …
-
The following error occurred while running cell 10 in **6. Tune language model using PPO with our preference model**.
After adding `__init__.py` to `/content/trlx/examples/summarize_rlhf/reward_model…
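For context on the `__init__.py` step, here is a self-contained sketch of what adding that file does: it marks a directory as a regular Python package (rather than an implicit namespace package), which some tools and relative imports require. The package and module names below are hypothetical, not the actual trlx layout.

```python
# Hedged sketch with hypothetical names: making a directory importable
# as a regular package by adding __init__.py.
import importlib
import sys
import tempfile
from pathlib import Path

pkg_root = Path(tempfile.mkdtemp())
pkg_dir = pkg_root / "reward_model_pkg"      # hypothetical package name
pkg_dir.mkdir()
(pkg_dir / "reward_model.py").write_text("SCORE = 42\n")
(pkg_dir / "__init__.py").touch()            # mark directory as a regular package

sys.path.insert(0, str(pkg_root))            # make pkg_root importable
mod = importlib.import_module("reward_model_pkg.reward_model")
print(mod.SCORE)                             # 42
```

Note that on Python 3 the import would also resolve without `__init__.py` via PEP 420 namespace packages, but adding the file is the conventional fix when a library or notebook expects a regular package.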