Hi @LangDaoAI,
Thanks for your attention!
The code will be released soon (within this week). Please stay tuned!
Best, Zhihong
Thanks for the reply and the hard work! I will close the issue.
@zhjohnchan
Hi Zhihong, has there been any progress on this issue?
Hi @LangDaoAI,
Thanks for your attention!
We have uploaded the training code (see here). Now you can train Phoenix. :-)
Best, Zhihong
@zhjohnchan Awesome!
A quick question: if I want to use LoRA, which option/parameter should I set in the following script?
Hi @LangDaoAI,
We also support LoRA. :-) Just add a parameter --lora True to use LoRA. In the meantime, you can turn off fsdp and increase the batch size, since the GPU memory requirement is low for LoRA.
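For concreteness, here is a minimal sketch of what the launch command could look like with LoRA enabled (placeholders follow the repository's example further down; the per-device batch size of 8 is only an illustrative value, and the FSDP-related flags are simply left out):
# sketch: LoRA run without FSDP, larger per-device batch size
torchrun \
--nnodes=1 \
--nproc_per_node=8 \
train.py \
--model_name_or_path ${model_name_or_path} \
--model_max_length ${model_max_length} \
--data_path ${data_path} \
--output_dir ${output_dir} \
--bf16 True \
--per_device_train_batch_size 8 \
--lora True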
Best, Zhihong
Sure, thanks!
@zhjohnchan A final question: is there any group or other way to communicate, e.g., WeChat?
Hi @LangDaoAI,
I have updated the code for LoRA training. Please pull the latest code.
The following is an example to use LoRA:
torchrun \
--nnodes=1 \
--nproc_per_node=8 \
--master_port=12375 \
train.py \
--model_name_or_path ${model_name_or_path} \
--model_max_length ${model_max_length} \
--data_path ${data_path} \
--output_dir ${output_dir} \
--bf16 True \
--num_train_epochs 3 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 32 \
--save_strategy "steps" \
--save_steps 500 \
--evaluation_strategy "no" \
--save_total_limit 3 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--gradient_checkpointing False \
--ddp_find_unused_parameters False \
--lora True
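For reference, a sketch of how the placeholder variables might be filled in before the torchrun command above (the paths and the max length here are hypothetical examples, not values recommended in this thread):
# hypothetical values for the placeholders used in the command above
model_name_or_path=/path/to/base_model   # local checkpoint dir or a Hugging Face model id
model_max_length=2048                    # assumed value; set it to whatever your data and GPUs allow
data_path=/path/to/train_data.json       # training data in the format expected by train.py
output_dir=./output/phoenix_lora         # where checkpoints will be written
These assignments would sit at the top of the same shell script as the torchrun command, so that the ${...} expansions resolve.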
Best, Zhihong
Thanks for the hard work! @zhjohnchan Does the team have a WeChat group, Slack, or another channel for discussion? I would like to join for further exchange!
Team,
When will the training code (pretraining, instruction fine-tuning, RLHF) be released?
Thanks!