-
my `~/.cache/huggingface/accelerate/default_config.yaml` is:
```yaml
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MULTI_GPU
downcast_bf16: 'no'
dynamo_config: {}
fsdp_config: …
```
-
Having read the paper, the Baichuan2 chat version went through an RLHF pipeline and collected data similar to hh_rlhf. Are there any plans to open-source the RLHF data and the training framework? Or could a portion of the reward-model training data be released first?
-
# LoRA: Low-Rank Adaptation of Large Language Models
Starting from a large pre-trained model, LoRA stores the task-specific fine-tuning update in pairs of low-rank matrices; thanks to the low intrinsic dimension, $r=4$ is enough.
Pros:
- Parallelization does not hurt speed, and the task-specific information is relatively small.
- The method is largely insensitive to hyperparameters.
Also:
- For the model…
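The low-rank update described above can be sketched as follows. This is a minimal NumPy illustration (hypothetical dimensions, not tied to any particular framework): the frozen weight `W0` is augmented by a product `A @ B` of two rank-$r$ matrices, and only `A` and `B` are trained.

```python
import numpy as np

d_in, d_out, r = 64, 64, 4  # low intrinsic dimension r=4, as in the notes above

rng = np.random.default_rng(0)
W0 = rng.standard_normal((d_in, d_out))    # frozen pre-trained weight
A = rng.standard_normal((d_in, r)) * 0.01  # trainable, small random init
B = np.zeros((r, d_out))                   # trainable, zero init: update starts at zero

def lora_forward(x, scale=1.0):
    # y = x @ W0 + scale * (x @ A) @ B; only A and B change during fine-tuning
    return x @ W0 + scale * (x @ A) @ B

x = rng.standard_normal((2, d_in))
# With B initialized to zero, the LoRA path contributes nothing at the start:
assert np.allclose(lora_forward(x), x @ W0)
# Task-specific parameters: 2*d*r = 512, versus 4096 in W0 itself
print(A.size + B.size, W0.size)
```

Storing only `A` and `B` per task is what makes the task-specific footprint small: here 512 trainable values against 4096 frozen ones, a gap that widens as the layer dimensions grow.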
-
Error Info:
```
File "/data/rooter_use/conda/envs/llama-env39/lib/python3.9/site-packages/deepspeed/runtime/hybrid_engine.py", line 398, in step
  actor_loss, critic_loss = trainer.train_rlhf(exp_da…
```
-
When fine-tuning qwen1.5-14B with swift, the initial run works fine, but resuming training from a checkpoint fails. The error message is as follows:
```bash
[INFO:swift] Setting model.config.use_cache: False
[WARNING:modelscope] Reusing dataset dataset_builder (/home/devops/.cache/modelscope/hub/d…
```
-
### SFT data
1. Started the SFT stage with publicly available instruction tuning data ([Chung et al., 2022](https://arxiv.org/pdf/2210.11416))
2. Fewer but high-quality examples > millions of data but low …
-
### 🚀 The feature, motivation and pitch
PPO and a number of other LLM fine-tuning techniques require autoregressive generation as part of the training process. When using vLLM to speed up the autor…
-
```bash
(gh_Vicuna-LoRA-RLHF-PyTorch) amd00@asus00:~/llm_dev/Vicuna-LoRA-RLHF-PyTorch$ python supervised_finetune.py --data_path './data/merge_sample.json' --output_path 'lora-Vicuna' --model_path './weight…
```
-
(presumably due to model distillation or RLHF...)
-
Thanks to the authors for generously open-sourcing this. The official README says the Chinese reward model is based on open-chinese-llama-7b, but the later step-by-step instructions say: python merge_weight_zh.py recover --path_raw decapoda-research/llama-7b-hf --path_diff ./models/moss-rlhf-reward-model-7B-z…