KyujinHan / Sakura-SOLAR-DPO

Sakura-SOLAR-DPO: Merge, SFT, and DPO
115 stars 7 forks

When ref_model=None, training has no effect #4

Closed Minami-su closed 9 months ago

Minami-su commented 9 months ago

I trained my model, set the reference model to None, and observed no changes in the trained model compared to the original model during inference.

python Sakura_DPO.py \
    --base_model Qwen-14B-Chat \
    --data-path  distilabel-intel-orca-dpo-pairs.json \
    --output_dir distilabel-intel-orca-dpo-pairs \
    --num_epochs 1 \
    --batch_size 16 \
    --micro_batch_size 1 \
    --learning_rate 1e-6 \
    --lora_r 32 \
    --lora_alpha 32 \
    --lora_dropout 0.05 \
    --lr_scheduler 'linear' \
    --warmup_ratio 0.1 \
    --cutoff_len 768
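A quick way to confirm whether DPO updated anything is to diff the adapter weights before and after training. Below is a minimal, framework-free sketch of that check over name-to-weights mappings (the `lora_A`/`lora_B` names and values are illustrative; with peft you would compare the actual adapter state dicts). Note that peft initializes the LoRA `lora_B` matrices to zero, so if they are still all zeros after training, no update happened.

```python
def max_abs_diff(state_a, state_b):
    """Largest elementwise |difference| across two name -> weights mappings."""
    return max(
        abs(x - y)
        for name in state_a
        for x, y in zip(state_a[name], state_b[name])
    )

# Toy example (hypothetical values): lora_B starts at zero; if training
# worked, at least some entries should have moved away from zero.
before = {"lora_A": [0.3, -0.1], "lora_B": [0.0, 0.0]}
after  = {"lora_A": [0.3, -0.1], "lora_B": [0.0, 0.0]}  # unchanged
print(max_abs_diff(before, after))  # 0.0 -> the adapter did not change
```

A result of exactly 0.0 would point at the training loop (optimizer, gradients, adapter wiring) rather than at inference.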
KyujinHan commented 9 months ago

ref_model=None is meant for the case where the reference model is the same as the model you are training: with a PEFT/LoRA setup, the trainer uses the frozen base weights (adapters disabled) as the reference instead of loading a second copy of the model.

This reduces GPU memory allocation.

So I don't think the ref_model=None option is the cause.
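To see why ref_model=None should not, by itself, stop learning, here is a standalone sketch of the standard per-example DPO objective (not this repo's code; beta=0.1 is just an assumed value). At step 0 the LoRA policy equals the frozen base, so the policy and reference margins cancel, the loss is exactly log(2), and the gradient with respect to the policy log-probs is still nonzero.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (policy margin - ref margin))."""
    margin = ((policy_chosen_logp - policy_rejected_logp)
              - (ref_chosen_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# At initialization the policy matches the reference (hypothetical log-probs),
# so the margin is 0 and the loss is log(2) ~= 0.6931.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # 0.6931...

# Once the policy prefers the chosen response more than the reference does,
# the loss drops below log(2), so the objective does provide a learning signal.
print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

If the weights really do not move at all, the problem is more likely zero gradients reaching the adapter (e.g. the LoRA modules not being attached to the trained model) than the choice of reference.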

Minami-su commented 9 months ago

> ref_model=None is meant for the case where the reference model is the same as the model you are training.
>
> This reduces GPU memory allocation.
>
> So I don't think the ref_model=None option is the cause.

Okay, then maybe there's something wrong with the Qwen model code.

KyujinHan commented 9 months ago

Let me know if you have any other information to share!