Total train epochs 10 | Total train iters 286497 |
building Enc-Dec model ...
number of parameters on model parallel rank 1: 5543798784
number of parameters on model parallel rank 0: 5543798784
Traceback (most recent call last):
  File "/mnt/finetune_cpm2.py", line 808, in <module>
    main()
  File "/mnt/finetune_cpm2.py", line 791, in main
    model, optimizer, lr_scheduler = setup_model_and_optimizer(args, tokenizer.vocab_size, ds_config, prompt_config)
  File "/mnt/utils.py", line 213, in setup_model_and_optimizer
    optimizer = get_optimizer(model, args, prompt_config)
  File "/mnt/utils.py", line 163, in get_optimizer
    optimizer = Adam(param_groups,
  File "/opt/conda/lib/python3.8/site-packages/apex/optimizers/fused_adam.py", line 79, in __init__
    raise RuntimeError('apex.optimizers.FusedAdam requires cuda extensions')
RuntimeError: apex.optimizers.FusedAdam requires cuda extensions
Hello, I am trying to run the code on 2 NVIDIA A100-PCIE-40GB GPUs, using the provided image environment directly. Loading FusedAdam keeps failing with the error shown above; reinstalling apex did not fix it, and I have not found a solution yet.
Is it possible to run on 2 NVIDIA A100-PCIE-40GB GPUs? Does the apex environment in the image need any adjustment? Thanks.
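For what it's worth, this RuntimeError is typically raised when apex was installed without its compiled C++/CUDA extensions (e.g. a plain `pip install apex` or `python setup.py install` without the `--cpp_ext --cuda_ext` build options). A minimal diagnostic sketch, assuming a standard apex install, is to try importing the compiled extension modules directly; note that the module names (`amp_C`, `fused_adam_cuda`) vary across apex versions, so this is a best-effort check rather than a definitive test:

```python
import importlib


def apex_cuda_ext_available():
    """Best-effort check for apex's compiled CUDA extension modules.

    amp_C / fused_adam_cuda are only present when apex was built with
    the --cpp_ext --cuda_ext options; FusedAdam raises
    "requires cuda extensions" when it cannot find them.
    Returns True if at least one extension module imports cleanly.
    """
    for mod in ("amp_C", "fused_adam_cuda"):
        try:
            importlib.import_module(mod)
            return True
        except ImportError:
            continue
    return False


if __name__ == "__main__":
    if apex_cuda_ext_available():
        print("apex CUDA extensions found")
    else:
        # Rebuilding from source with the extension flags usually fixes it:
        #   git clone https://github.com/NVIDIA/apex && cd apex
        #   pip install -v --no-cache-dir \
        #       --global-option="--cpp_ext" --global-option="--cuda_ext" ./
        print("apex CUDA extensions missing; rebuild apex with "
              "--cpp_ext --cuda_ext")
```

If the extensions were built inside the image for an older GPU architecture, they may also need to be rebuilt for the A100 (compute capability 8.0), e.g. with `TORCH_CUDA_ARCH_LIST="8.0"` set during the build.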