-
8-GPU machine, using four of the GPUs (4, 5, 6, 7).
Default installed opendelta == 0.3.0
torch==2.0.0
transformers==4.28.1
Full-parameter fine-tuning fails with the following error:
TypeError: CheckpointBlock._named_members() got an unexpected keyword argument 'remove_duplicate'
ERR…
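
One plausible reading of the `TypeError` above (an assumption, not confirmed in this report): torch 2.0 forwards a `remove_duplicate` keyword to `Module._named_members`, while an override written against the older signature does not accept it. The sketch below reproduces that pattern in plain Python, without torch or bmtrain; `BaseModule`, `OldStyleBlock`, and `CompatBlock` are hypothetical stand-ins, not the libraries' real classes.

```python
class BaseModule:
    """Hypothetical stand-in for a framework base class (not torch.nn.Module)."""

    def _named_members(self, prefix="", recurse=True, remove_duplicate=True):
        return [(prefix + "weight", 1.0)]

    def named_parameters(self, prefix="", recurse=True, remove_duplicate=True):
        # The newer base class forwards an extra keyword to the hook method.
        return self._named_members(
            prefix=prefix, recurse=recurse, remove_duplicate=remove_duplicate
        )


class OldStyleBlock(BaseModule):
    """Override written against the older signature (no remove_duplicate)."""

    def _named_members(self, prefix="", recurse=True):
        return [(prefix + "weight", 1.0)]


class CompatBlock(BaseModule):
    """Override that tolerates keywords added by newer base classes."""

    def _named_members(self, prefix="", recurse=True, **kwargs):
        return [(prefix + "weight", 1.0)]


def try_named_parameters(block):
    """Return the parameter list, or the TypeError message if the call fails."""
    try:
        return block.named_parameters()
    except TypeError as exc:
        return str(exc)


old_result = try_named_parameters(OldStyleBlock())
new_result = try_named_parameters(CompatBlock())
print(old_result)   # message mentions the unexpected 'remove_duplicate' keyword
print(new_result)
```

If this is indeed the cause, the usual options are to install a library release whose override matches the installed torch, or to pin torch to a version the library was written against. Which versions apply here is not stated in the report.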
-
**Run:**
`bash dist_trigger_docker.sh hostfile Aquila-chat.yaml aquila-7b aquila_experiment`
Error:
```
[INFO] bmtrain_mgpu.sh: hostfile configfile model_name exp_name exp_version
bmtrain_mgpu.s…
```
-
/home/hw/miniconda3/envs/D-Bot/lib/python3.10/site-packages/bmtrain/synchronize.py:14: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as…
-
### System Info
OS macos
flagai 1.8.2
transformers 4.34.0
The error is as follows:
[2023-10-13 13:25:52,131] [INFO] [logger.py:85:log_dist] [Rank -1] Unsupported bmtrain
Loading checkpo…
-
### Is your feature request related to a problem? Please describe.
torch 2.0 is stable
### Describe the solution you'd like
torch 2.0 is stable
### Describe alternatives you've considered
_No res…
-
### Description
Pretraining the Aquila-7B model in
FlagAI/examples/Aquila/Aquila-pretrain.
Environment: single machine with 2 GPUs; the hostfile is set to 10.2.170.111 slots=2
Command used: bash local_trigger_docker.sh hostfile Aquila-pretrain.yaml Aquila-7B aq…
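
The hostfile entry quoted above follows a `host slots=N` layout. As a minimal sketch of how such a file could be sanity-checked before launching, the snippet below parses that layout. `parse_hostfile` is a hypothetical helper, not part of FlagAI, and the format handling is an assumption based only on the single line shown in this report.

```python
def parse_hostfile(text):
    """Parse 'host slots=N' lines into (host, slots) pairs.

    Hypothetical helper: assumes one host per line and allows '#' comments.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # skip blank and comment-only lines
        parts = line.split()
        host = parts[0]
        slots = None
        for field in parts[1:]:
            key, _, value = field.partition("=")
            if key == "slots":
                slots = int(value)
        if slots is None:
            raise ValueError(f"missing slots=N in line: {raw!r}")
        entries.append((host, slots))
    return entries


hosts = parse_hostfile("10.2.170.111 slots=2\n")
total_slots = sum(slots for _, slots in hosts)
print(hosts, total_slots)
```

Comparing `total_slots` with the number of GPUs actually visible on the machine is a quick way to rule out a hostfile mismatch before digging into the launcher script.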
-
### Description
Model loading is too slow: it takes 4 minutes to load the model.
### Alternatives
[2023-07-05 14:14:27,615] [INFO] [logger.py:85:log_dist] [Rank -1] Unsupported bmtrain
******************** lm aquila-7b
***************use ca…
-
PyTorch 1.13.1
CUDA Version: 11.2
Building wheel for bmtrain (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /data/private/hebingxiang/miniconda3/bin/python -…
-
Installation fails with CUDA 11.7 and torch==1.13.1 on Ubuntu 22.04. How can this be resolved? Is it a version compatibility issue?
Collecting bmtrain
Downloading bmtrain-0.2.2.tar.gz (58 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.7/58.7 kB 432.6 kB/s eta 0…
-
(cpm) D:\GitHub\BMTrain>python setup.py install
running install
C:\ProgramData\Anaconda3\envs\cpm\lib\site-packages\setuptools\command\install.py:37: SetuptoolsDeprecationWarning: setup.py install i…