Open lhtpluto opened 1 year ago
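In finetune_moss.py I made the following change: accelerator = Accelerator(mixed_precision='fp8')
The environment is NVIDIA's container nvcr.io/nvidia/pytorch:23.06-py3 https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch
Because the GPU does not have enough memory, I use DeepSpeed with CPU offload.
I modified sft.yaml as follows: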
command_file: null
commands: null
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: null
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp8
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_name: null
tpu_zone: null
use_cpu: false
After switching fine-tuning to FP8, training became slower. What could be causing this?
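To put a number on "slower", it may help to time a fixed number of training steps under each precision setting and compare the medians. The helper below is a minimal sketch, not part of finetune_moss.py; `median_step_time` and `step_fn` are hypothetical names, and `step_fn` is assumed to run exactly one forward/backward/optimizer step.

```python
import time
from statistics import median


def median_step_time(step_fn, warmup=3, steps=10):
    """Return the median wall-clock duration (seconds) of one call to step_fn.

    step_fn is a hypothetical zero-argument callable that performs one
    training step. On a GPU it should end with a device synchronization
    (for example torch.cuda.synchronize()), otherwise only the kernel
    launch time is measured.
    """
    # Warm-up iterations are excluded: the first steps often include
    # one-time costs such as memory allocation and kernel autotuning.
    for _ in range(warmup):
        step_fn()

    durations = []
    for _ in range(steps):
        start = time.perf_counter()
        step_fn()
        durations.append(time.perf_counter() - start)
    return median(durations)
```

Running it once with `mixed_precision: fp8` and once with another setting such as `bf16`, with everything else in sft.yaml unchanged, would show how large the difference actually is.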
DeepSpeed v0.9.5 FP8 unittest for H100 by @jomayeri in https://github.com/microsoft/DeepSpeed/pull/3731
Could it be that, with DeepSpeed offloading to the CPU, the slowdown is caused by the CPU not supporting FP8? My CPU is an Intel® Xeon® w9-3495X Processor.
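Besides the CPU, one thing worth ruling out is whether the GPU itself has FP8 tensor cores, since the GPU model is not stated here. The sketch below assumes that FP8 hardware support starts at CUDA compute capability 8.9 (Ada) and 9.0 (Hopper, e.g. H100); `supports_fp8` and `describe_local_gpu` are hypothetical helper names, and PyTorch is imported only if it is installed.

```python
def supports_fp8(major: int, minor: int) -> bool:
    """Return True if the compute capability is at least 8.9.

    Assumption: FP8 tensor cores are available on Ada (sm_89) and
    Hopper (sm_90) or newer; older GPUs such as A100 (sm_80) lack them.
    """
    return (major, minor) >= (8, 9)


def describe_local_gpu() -> str:
    """Report whether the first visible CUDA device passes the check."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed; cannot query the GPU."
    if not torch.cuda.is_available():
        return "No CUDA device is visible."
    major, minor = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    verdict = "has" if supports_fp8(major, minor) else "lacks"
    return f"{name} (sm_{major}{minor}) {verdict} FP8 tensor cores"


if __name__ == "__main__":
    print(describe_local_gpu())
```

If the GPU passes this check, the CPU-offloaded part of ZeRO stage 3 is the next place to look: the offloaded optimizer step and the parameter transfers between host and device are unlikely to get faster from FP8 matrix multiplications on the GPU.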