Yxxxb / VoCo-LLaMA

VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
https://yxxxb.github.io/VoCo-LLaMA-page/
Apache License 2.0

Training Objective #4

Closed by gordonhu608 3 months ago

gordonhu608 commented 3 months ago

Congratulations on a great work! I haven't gone over every line of code, but could you please point out where you implemented the KL divergence training objective for your model? Thank you!

Yxxxb commented 3 months ago

Hi,

Thank you for your interest in our work! The KL divergence training objective and compressed-model distillation are the ideas behind our construction of VoCo-LLaMA. In Equation 4 of the paper, the goal is to make the output distribution of VoCo-LLaMA approximate the output distribution of the original model $VLM_o$ (in the paper, we use LLaVA as an example). In the concrete implementation, we realize this training paradigm by inserting VoCo tokens and modifying the attention mask (sketched below), so the model only needs to be trained under the standard visual instruction tuning stage. The final loss and training objective are identical to those of LLaVA (visual instruction tuning).
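For intuition, here is a minimal, hypothetical sketch of that attention-mask trick. It is not the repo's actual code: the function name `build_voco_attention_mask`, the token ordering `[vision | VoCo | text]`, and the boolean convention (`True` = may attend) are all assumptions for illustration. The point is that text tokens are blocked from attending to the raw vision tokens, so all visual information must flow to them through the VoCo tokens.

```python
import torch

def build_voco_attention_mask(num_vision: int, num_voco: int, num_text: int) -> torch.Tensor:
    """Hypothetical sketch of the VoCo-style attention mask.

    Assumes the sequence is ordered [vision | VoCo | text].
    Returns a (total, total) boolean mask where True means
    "query position may attend to key position".
    """
    total = num_vision + num_voco + num_text
    # Start from a standard causal (lower-triangular) mask.
    mask = torch.tril(torch.ones(total, total, dtype=torch.bool))
    # Block text tokens from attending directly to vision tokens,
    # forcing visual information to pass through the VoCo tokens.
    text_start = num_vision + num_voco
    mask[text_start:, :num_vision] = False
    return mask

# Example: 4 vision tokens compressed into 1 VoCo token, 3 text tokens.
print(build_voco_attention_mask(4, 1, 3).int())
```

Under this masking, training with the ordinary next-token cross-entropy loss already pushes the compressed path (text attending only to VoCo tokens) to reproduce what the uncompressed model would output, which is the distillation idea behind Equation 4.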

By the way, the Matryoshka Query Transformer you proposed has some interesting ideas as well.

Best regards,

Xubing