-
Hi everyone, when running pretraining.py to pretrain chatglm3, I get this error:
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
How can I fix this?
It fails at step 134 every time; earlier training runs were always…
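In PyTorch, a `device-side assert` is often triggered by an index that is out of range for a CUDA kernel, for example a token id that is not smaller than the embedding table size, or a label outside the valid class range (other than the loss's ignore index). The report does not establish that this is the cause here. However, a failure at the same step every time is consistent with one specific bad batch. Running with the environment variable `CUDA_LAUNCH_BLOCKING=1` usually makes the traceback point closer to the failing op. The sketch below is a hypothetical helper, not part of MedicalGPT. It scans tokenized sequences on the CPU, where errors are readable. It assumes you pass plain lists of ints and take `vocab_size` from the model's embedding, for example `model.get_input_embeddings().num_embeddings`.

```python
from typing import Iterable, List, Optional, Tuple


def find_out_of_range_ids(
    sequences: Iterable[List[int]],
    vocab_size: int,
    ignore_index: Optional[int] = None,
) -> List[Tuple[int, int, int]]:
    """Return (sequence_index, position, token_id) for every id outside [0, vocab_size).

    `ignore_index` (e.g. -100 for labels) is treated as valid, since the
    cross-entropy loss skips it instead of indexing with it.
    """
    bad = []
    for seq_idx, ids in enumerate(sequences):
        for pos, tok in enumerate(ids):
            if ignore_index is not None and tok == ignore_index:
                continue  # masked label position, not an index into the vocab
            if tok < 0 or tok >= vocab_size:
                bad.append((seq_idx, pos, tok))
    return bad


if __name__ == "__main__":
    # Hypothetical toy data: a 10-token vocabulary where id 12 is out of range.
    print(find_out_of_range_ids([[1, 2, 3], [4, 12, 5]], vocab_size=10))
    # Labels that use -100 as the ignore index are accepted.
    print(find_out_of_range_ids([[-100, 2, 3]], vocab_size=10, ignore_index=-100))
```

Run the check over both `input_ids` and `labels` of the samples that make up the failing step. An empty result for every batch rules out this particular cause.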
-
```
Traceback (most recent call last):
  File "F:\xiazai\MedicalGPT-main\pretraining.py", line 781, in <module>
    main()
  File "F:\xiazai\MedicalGPT-main\pretraining.py", line 722, in main
    trainer = Sa…
```
-
```
Traceback (most recent call last):
  File "F:\xiazai\MedicalGPT-main\reward_modeling.py", line 653, in <module>
    main()
  File "F:\xiazai\MedicalGPT-main\reward_modeling.py", line 447, in main
    mode…
```
-
### Describe the Question
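Could you provide an end-to-end example that covers training and inference for each stage, from pretraining to SFT to RLHF, chaining them together? For example: after pretraining, run an inference test; once the results look OK, move on to SFT, then run another inference test, and so on. This would make it easier for everyone to discuss this…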
-
How can I fix this problem that appears when training PPO? Switching to different versions of peft did not help.
-
Without changing anything, running the command below gives the following result:
```
(songyao_MedicalGPT) root@jkha-W580-G20:/software/MedicalGPT# CUDA_VISIBLE_DEVICES=0,1 python orpo_training.py \
    --model_type auto \
    --model_name_or_path Qwen/Qwen1.5-0.5B-Chat…
```
-
### Describe the Question
I used the following command:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python pretraining.py \
    --model_type chatglm \
    --mod…
```
-
```python
from transformers import Trainer
from transformers.trainer import TRAINING_ARGS_NAME
import torch
from typing import Any, List, Union, Optional, Dict
import os
from torch.utils.data import Datase…
```
-
The learning rate is zero, and the chosen and rejected values that follow are all None.
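The report does not say which scheduler or warmup settings were used, so the following is only one possible explanation for the zero learning rate. It does not address the None chosen/rejected values. With a linear warmup followed by linear decay, the learning-rate multiplier is exactly 0 at step 0, when warmup is enabled, and again at the final step. A logged value of 0 can therefore come from the schedule itself rather than from a bug. The function below is a hypothetical, stand-alone sketch with the same shape as the linear warmup/decay schedule commonly used with `transformers`. It is not MedicalGPT's code.

```python
def linear_warmup_decay_lr(
    step: int, base_lr: float, warmup_steps: int, total_steps: int
) -> float:
    """Learning rate at `step` for linear warmup to `base_lr`, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup phase: ramps from 0 at step 0 up to base_lr at step == warmup_steps.
        scale = step / max(1, warmup_steps)
    else:
        # Decay phase: falls from base_lr to 0 at step == total_steps, clamped at 0.
        scale = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * scale


if __name__ == "__main__":
    for s in (0, 5, 10, 55, 100):
        print(s, linear_warmup_decay_lr(s, base_lr=1e-4, warmup_steps=10, total_steps=100))
```

If the learning rate stays at 0 well beyond the first steps, the schedule is unlikely to be the explanation. In that case, check whether the effective number of training steps is very small, for example because the dataset was filtered down to almost nothing.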
-
### Describe the bug
![Xnip2024-04-27_10-02-44](https://github.com/shibing624/MedicalGPT/assets/7911286/9f5b4130-3dda-4cb7-a3f6-360124858815)
![Xnip2024-04-27_10-03-48](https://github.com/shibing624…