Hi,

For the paper https://arxiv.org/pdf/2310.01218.pdf, the following is mentioned in the pretraining section:

"For efficiency, we first train SEED-LLaMA using LoRA [32] tuning and together optimize the parameters of the embedding layer and decoder head layer due to the added visual codes. We then merge the parameters of LoRA onto the LLM backbone and fine-tune all parameters except for the embedding layer."
But in the training steps, the part about fine-tuning all parameters except for the embedding layer is missing.
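To make clear which part of the recipe I mean, here is a minimal sketch of how I read that two-stage description, using transformers + peft. The model name, LoRA hyperparameters, and module names (embed_tokens, lm_head) are my own assumptions for illustration, not taken from the SEED-LLaMA code.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Assumed backbone; SEED-LLaMA's actual checkpoint and added visual codes are not reproduced here.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16)
# model.resize_token_embeddings(len(tokenizer))  # after adding the visual codes to the vocabulary

# Stage 1: LoRA tuning, with the embedding layer and decoder head trained in full
# because of the newly added visual codes.
lora_cfg = LoraConfig(
    r=16,                                         # assumed rank
    lora_alpha=32,                                # assumed scaling
    target_modules=["q_proj", "v_proj"],          # assumed LoRA target modules
    modules_to_save=["embed_tokens", "lm_head"],  # fully optimized alongside the LoRA adapters
)
peft_model = get_peft_model(model, lora_cfg)
# ... run stage-1 training on peft_model ...

# Merge the LoRA parameters onto the LLM backbone.
model = peft_model.merge_and_unload()

# Stage 2: fine-tune all parameters except the embedding layer.
for name, param in model.named_parameters():
    param.requires_grad = "embed_tokens" not in name
# ... run stage-2 training on model ...
```

My question is about this second stage: I don't see it reflected in the training steps.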