OpenMOSS / MOSS

An open-source tool-augmented conversational language model from Fudan University
https://txsun1997.github.io/blogs/moss.html
Apache License 2.0

Fine-tuning is extremely slow after switching to fp8 #355

Open lhtpluto opened 1 year ago

lhtpluto commented 1 year ago

In finetune_moss.py I made the following change: accelerator = Accelerator(mixed_precision='fp8')
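Before requesting fp8, it may be worth confirming that the GPU can actually run it. The snippet below is a minimal sketch, not part of MOSS or Accelerate: `pick_mixed_precision` and `detect_capability` are hypothetical helper names, and the thresholds assume that FP8 tensor cores start at CUDA compute capability 8.9 (Ada) and 9.0 (Hopper), with bf16 available from 8.0 (Ampere).

```python
def pick_mixed_precision(capability):
    """Map a CUDA compute capability (major, minor) to a mixed-precision mode.

    Assumption: fp8 needs capability >= 8.9, bf16 needs >= 8.0;
    anything older falls back to fp16.
    """
    cap = tuple(capability)
    if cap >= (8, 9):
        return "fp8"
    if cap >= (8, 0):
        return "bf16"
    return "fp16"


def detect_capability():
    """Return the capability of CUDA device 0, or None if torch/CUDA is missing."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = detect_capability()
    mode = pick_mixed_precision(cap) if cap is not None else "no"
    print(f"compute capability: {cap}, suggested mixed_precision: {mode}")
    # In finetune_moss.py the result could then be passed on, e.g.:
    # accelerator = Accelerator(mixed_precision=mode)
```

If the reported capability is below 8.9, the fp8 setting cannot use hardware FP8 kernels, which would be one possible explanation for a slowdown.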

The environment is NVIDIA's container nvcr.io/nvidia/pytorch:23.06-py3 (https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch).

Because the GPU does not have enough memory, I am using DeepSpeed with CPU offload.

I modified sft.yaml as follows:

command_file: null
commands: null
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: null
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp8
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_name: null
tpu_zone: null
use_cpu: false
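To make the interaction between the settings above easier to see, here is a small sketch that lists which of them place work or data on the host. `offload_notes` is a hypothetical helper, not an Accelerate or DeepSpeed API, and the config is hand-copied into a dict rather than parsed from sft.yaml. The notes it prints are hints about where to look, under the assumption that fp8 can only speed up GPU matrix multiplications; they are not a diagnosis.

```python
def offload_notes(cfg):
    """Collect notes about settings that may limit any fp8 speedup."""
    ds = cfg.get("deepspeed_config") or {}
    notes = []
    if cfg.get("mixed_precision") == "fp8":
        notes.append("mixed_precision=fp8: only GPU matmuls can benefit")
    # With CPU offload, optimizer state and/or parameters live in host memory.
    for key in ("offload_optimizer_device", "offload_param_device"):
        if ds.get(key) == "cpu":
            notes.append(f"{key}=cpu: host memory and PCIe transfers are involved")
    if ds.get("zero_stage") == 3:
        notes.append("zero_stage=3: parameters are partitioned and gathered on demand")
    return notes


# Relevant subset of the sft.yaml shown above, copied by hand.
cfg = {
    "mixed_precision": "fp8",
    "deepspeed_config": {
        "offload_optimizer_device": "cpu",
        "offload_param_device": "cpu",
        "zero_stage": 3,
    },
}

for note in offload_notes(cfg):
    print("-", note)
```

If host-side work dominates each step, changing the GPU compute precision would not be expected to shorten the step by much, so measuring the step time with and without offload is the more reliable check.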

After switching fine-tuning to fp8, training became slower. What could be causing this?
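To put a number on "slower", the same training step can be timed under each setting (for example fp8 versus bf16, or with and without CPU offload). The following is a minimal sketch using only the standard library; `mean_step_time` is a hypothetical helper and `dummy_step` is a stand-in for one real training step. When timing GPU work, `torch.cuda.synchronize` could be passed as `sync` so that queued kernels are included in the measurement.

```python
import time


def mean_step_time(step_fn, warmup=3, steps=10, sync=None):
    """Return the mean wall-clock seconds per call of step_fn.

    warmup calls are executed first and excluded from the timing.
    sync, if given, is called before starting and before stopping the clock.
    """
    for _ in range(warmup):
        step_fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(steps):
        step_fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / steps


if __name__ == "__main__":
    # Stand-in workload; replace with one forward/backward/optimizer step.
    def dummy_step():
        sum(i * i for i in range(10_000))

    print(f"{mean_step_time(dummy_step):.6f} s/step")
```

Running this once per configuration, with everything else unchanged, shows which single change accounts for the slowdown.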

DeepSpeed v0.9.5 FP8 unittest for H100 by @jomayeri in https://github.com/microsoft/DeepSpeed/pull/3731

Could it be that, once DeepSpeed offloads to the CPU, the slowdown is caused by the CPU not supporting fp8? My CPU is an Intel® Xeon® w9-3495X Processor.