-
my `~/.cache/huggingface/accelerate/default_config.yaml` is:
```yaml
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MULTI_GPU
downcast_bf16: 'no'
dynamo_config: {}
fsdp_config: …
```
-
Having read the paper, the Baichuan2 chat version went through an RLHF pipeline and collected data similar to hh_rlhf. Are there any plans to open-source the RLHF data and the training framework? Or could a portion of the reward-model training data be released first?
-
# LoRA: Low-Rank Adaptation of Large Language Models
Starting from a large pre-trained model, LoRA stores the task-specific fine-tuning update in pairs of low-rank matrices; thanks to the low intrinsic dimension, $r=4$ is enough.
Pros:
- Parallelization does not hurt speed, and the task-specific information is relatively small.
- The method is largely insensitive to hyperparameters.
Also:
- For the model…
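The low-rank update described above can be sketched as follows. This is a minimal NumPy illustration (hypothetical dimensions, not tied to any particular framework): the frozen weight `W0` is augmented by a product `A @ B` of two rank-$r$ matrices, and only `A` and `B` are trained.

```python
import numpy as np

d_in, d_out, r = 64, 64, 4  # low intrinsic dimension r=4, as in the notes above

rng = np.random.default_rng(0)
W0 = rng.standard_normal((d_in, d_out))    # frozen pre-trained weight
A = rng.standard_normal((d_in, r)) * 0.01  # trainable, small random init
B = np.zeros((r, d_out))                   # trainable, zero init: update starts at zero

def lora_forward(x, scale=1.0):
    # y = x @ W0 + scale * (x @ A) @ B; only A and B change during fine-tuning
    return x @ W0 + scale * (x @ A) @ B

x = rng.standard_normal((2, d_in))
# With B initialized to zero, the LoRA path contributes nothing at the start:
assert np.allclose(lora_forward(x), x @ W0)
# Task-specific parameters: 2*d*r = 512, versus 4096 in W0 itself
print(A.size + B.size, W0.size)
```

Storing only `A` and `B` per task is what makes the task-specific footprint small: here 512 trainable values against 4096 frozen ones, a gap that widens as the layer dimensions grow.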
-
Error Info:
```
File "/data/rooter_use/conda/envs/llama-env39/lib/python3.9/site-packages/deepspeed/runtime/hybrid_engine.py", line 398, in step
  actor_loss, critic_loss = trainer.train_rlhf(exp_da…
```
-
When fine-tuning qwen1.5-14B with swift, the initial run works fine, but resuming training from a checkpoint fails. The error message is as follows:
```bash
[INFO:swift] Setting model.config.use_cache: False
[WARNING:modelscope] Reusing dataset dataset_builder (/home/devops/.cache/modelscope/hub/d…
```
-
### SFT data
1. Started the SFT stage with publicly available instruction tuning data ([Chung et al., 2022](https://arxiv.org/pdf/2210.11416))
2. Fewer but high-quality examples > millions of data but low …
-
### 🚀 The feature, motivation and pitch
PPO and a number of other LLM fine-tuning techniques require autoregressive generation as part of the training process. When using vLLM to speed up the autor…
-
```bash
(gh_Vicuna-LoRA-RLHF-PyTorch) amd00@asus00:~/llm_dev/Vicuna-LoRA-RLHF-PyTorch$ python supervised_finetune.py --data_path './data/merge_sample.json' --output_path 'lora-Vicuna' --model_path './weight…
```
-
(presumably due to model distillation or RLHF...)
-
Thanks to the authors for generously open-sourcing this. The official README says the Chinese reward model is based on open-chinese-llama-7b, but the later step-by-step instructions say: python merge_weight_zh.py recover --path_raw decapoda-research/llama-7b-hf --path_diff ./models/moss-rlhf-reward-model-7B-z…