Ucas-HaoranWei / Vary-toy

Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)

How to fine-tune Vary-tiny with LoRA or SFT? #2

Closed parap1uie-s closed 5 months ago

parap1uie-s commented 5 months ago

As the title says: can I re-train / fine-tune Vary-tiny (the new vision vocab) with SFT or LoRA on my private dataset?

Thanks in advance.

Ucas-HaoranWei commented 5 months ago

We have not released the Vary-tiny weights. You have two options:

  1. extract the new vision-vocab weights from Vary-toy and use them with a fresh OPT-125M (see the sketch below)
  2. if the demand for Vary-tiny is high, we will consider open-sourcing the weights
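For option 1, a minimal sketch of the extraction, assuming the vision-vocabulary encoder sits under a `model.vision_tower_high.` key prefix in the Vary-toy state dict (the prefix and file paths here are assumptions, not the official layout; inspect your checkpoint's keys first):

```python
import torch

# Hedged sketch: pull the reinforced vision-vocabulary encoder out of a
# Vary-toy checkpoint. The path and the key prefix below are assumptions;
# print ckpt.keys() to find the real prefix in your copy.
PREFIX = "model.vision_tower_high."

ckpt = torch.load("Vary-toy/pytorch_model.bin", map_location="cpu")
vision_vocab = {
    k[len(PREFIX):]: v for k, v in ckpt.items() if k.startswith(PREFIX)
}

torch.save(vision_vocab, "vary_vision_vocab.pth")
print(f"extracted {len(vision_vocab)} tensors")
```

The saved tensors could then be loaded into the vision tower of a new OPT-125M-based Vary-tiny before training.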
parap1uie-s commented 5 months ago

> We have not released the Vary-tiny weights. You have two options:
>
> 1. extract the new vision-vocab weights from Vary-toy and use them with a fresh OPT-125M
> 2. if the demand for Vary-tiny is high, we will consider open-sourcing the weights

It would be great if you could check the issue in the Vary repo.

The official Vary-tiny weights are not strictly necessary for us, and we do plan to re-train Vary-tiny following method 1.

However, the question concerns the new OPT-125M: the official OPT-125M has max_position_embeddings = 2048, which is not compatible with Vary's 4096-token training length.
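For reference, the 2048 cap can be checked directly from the released config (`facebook/opt-125m` on Hugging Face):

```python
from transformers import AutoConfig

# The official OPT-125M caps its learned position embeddings at 2048,
# short of the 4096-token sequences Vary trains with.
cfg = AutoConfig.from_pretrained("facebook/opt-125m")
print(cfg.max_position_embeddings)  # -> 2048
```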

So, how should I modify the official OPT-125M? Or has a Vary-version OPT-125M been released?

And, thank you for your innovative work!

Ucas-HaoranWei commented 5 months ago
  1. If you do not need such a long max_length for your task, reduce max_length when training Vary-tiny.
  2. If you need a 4096-length OPT-125M, you can interpolate the official OPT-125M's position embeddings to 4096 (see the sketch below).
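A minimal sketch of option 2, assuming the Hugging Face `facebook/opt-125m` checkpoint: OPT's learned position-embedding table carries 2 extra offset rows, so the interpolation targets 4096 + 2 positions. Interpolated learned embeddings usually need some fine-tuning to recover quality, so treat this as a starting point, not a drop-in fix:

```python
import torch.nn as nn
import torch.nn.functional as F
from transformers import OPTForCausalLM

model = OPTForCausalLM.from_pretrained("facebook/opt-125m")
pos = model.model.decoder.embed_positions          # nn.Embedding, [2050, 768]

new_max = 4096
new_rows = new_max + 2                             # keep OPT's 2-row offset
interp = F.interpolate(
    pos.weight.data.T.unsqueeze(0),                # [1, 768, 2050]
    size=new_rows, mode="linear", align_corners=True,
).squeeze(0).T                                     # [4098, 768]

pos.weight = nn.Parameter(interp)                  # swap in the longer table
pos.num_embeddings = new_rows
model.config.max_position_embeddings = new_max
model.save_pretrained("opt-125m-4096")             # reloads with the new limit
```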