Open aldialimucaj opened 2 months ago
I did a small fine-tuning run and the process finished without errors. The output model is far too small though, and it contains no weights. These are the output files:

```
config.json
generation_config.json
model.safetensors (around 250 MiB)
runs/
special_tokens_map.json
tokenizer.json
tokenizer_config.json
trainer_state.json
training_args.bin
```

I'm using the same command that you suggest:

```shell
deepspeed finetune_deepseekcoder.py \
    --model_name_or_path $MODEL_PATH \
    --data_path $DATA_PATH \
    --output_dir $OUTPUT_PATH \
    --num_train_epochs 3 \
    --model_max_length 1024 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 100 \
    --save_total_limit 100 \
    --learning_rate 2e-5 \
    --warmup_steps 10 \
    --logging_steps 1 \
    --lr_scheduler_type "cosine" \
    --gradient_checkpointing True \
    --report_to "tensorboard" \
    --deepspeed configs/ds_config_zero3.json \
    --bf16 True
```
Could you also give an example of how to use the output model?
If you use DeepSpeed ZeRO stage 3 to fine-tune the model, the weights you get are incomplete: ZeRO-3 shards the parameters across ranks, and the consolidated 16-bit checkpoint is only written if the config asks for it. My solution is to set `stage3_gather_16bit_weights_on_model_save` to `true` in the DeepSpeed config.
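For reference, the relevant fragment of `configs/ds_config_zero3.json` would then look something like this (other ZeRO-3 settings omitted; the key that controls weight gathering at save time is `stage3_gather_16bit_weights_on_model_save`):

```json
{
  "zero_optimization": {
    "stage": 3,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

With this enabled, the save step gathers the sharded parameters into a single 16-bit checkpoint. Alternatively, DeepSpeed writes a `zero_to_fp32.py` script into each checkpoint directory that can consolidate the shards into a full fp32 state dict after the fact.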
I probably ran into the same problem; is there any solution to this?