Alpha-VLLM / LLaMA2-Accessory

An Open-source Toolkit for LLM Development
https://llama2-accessory.readthedocs.io/

Some Question about SPHINX. #102

Closed shipengai closed 7 months ago

shipengai commented 7 months ago
[Screenshot: loss-curve figure from the paper]
  1. There is an error in "The text-only loss corresponds to training only on training only RefinedWeb" — "training only" appears twice.
  2. Which dataset is used for the "text-only loss, w/o RefinedWeb" curve?
  3. Why does the paper say "We observe that the text-only loss grows if the model is not trained with RefinedWeb, showing that our joint-training scheme is effective in preserving the text-modeling capability while adapting for cross-modal understanding."?
shipengai commented 7 months ago

Also, the paper does not include ablation experiments comparing the performance gains of the Mixed visual embeddings and Mixed model weights methods. Do you plan to release these comparisons?

gaopengpjlab commented 7 months ago

This repo shows the effectiveness of Mixed model weights for LLM and MLLM finetuning: https://github.com/Alpha-VLLM/WeMix-LLM

Comprehensive ablation studies of Mixed visual embeddings and Mixed model weights will be released soon.

linziyi96 commented 7 months ago

We are sorry that this writing error was found only after the paper was made public on arXiv. To clarify the 4 curves:

  1. text-only loss, w/ RefinedWeb: The model is jointly trained on LAION-400M and RefinedWeb. The value on the curve is the next token prediction loss on RefinedWeb.
  2. text-only loss, w/o RefinedWeb: The model is trained only on LAION-400M. The value on the curve is the next token prediction loss on RefinedWeb (In this case RefinedWeb is only used for calculating the loss but not for model optimization).
  3. image-caption loss, w/ RefinedWeb: The model is jointly trained on LAION-400M and RefinedWeb. The value on the curve is the next token prediction loss on the captions of LAION-400M.
  4. image-caption loss, w/o RefinedWeb: The model is trained only on LAION-400M. The value on the curve is the next token prediction loss on the captions of LAION-400M.

The purpose of this experiment is to show the benefit of jointly training on a text-only dataset (RefinedWeb in this case) in addition to the image-captioning dataset. The growth of the RefinedWeb loss in the "w/o RefinedWeb" setting shows that text-modeling capability is compromised when the model is adapted on image-caption data alone.
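To make the setup concrete, here is a minimal sketch (not the actual SPHINX training code; function names and the `w_text` weight are illustrative assumptions) of the two configurations: both measure next-token prediction loss on RefinedWeb, but only the joint setting lets that loss contribute to optimization.

```python
import math

def next_token_loss(logits, targets):
    """Average next-token cross-entropy.

    logits: one list of vocabulary scores per position.
    targets: the ground-truth next-token id for each position.
    """
    total = 0.0
    for scores, tgt in zip(logits, targets):
        # Numerically stable log-sum-exp for the softmax normalizer.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[tgt]
    return total / len(targets)

def joint_loss(caption_batch, text_batch, w_text=1.0):
    """Joint objective: image-caption loss plus weighted text-only loss.

    With w_text=0 this corresponds to the "w/o RefinedWeb" setting: the
    text-only loss is still measurable on RefinedWeb, but contributes no
    gradient, so it can drift upward during caption-only training.
    """
    l_cap = next_token_loss(*caption_batch)   # LAION-400M captions
    l_txt = next_token_loss(*text_batch)      # RefinedWeb text
    return l_cap + w_text * l_txt
```

In both settings the reported "text-only loss" is `next_token_loss` evaluated on RefinedWeb; the only difference is whether that term is part of the optimized objective.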

gaopengpjlab commented 7 months ago

@shipengai Please check the ablation study of SPHINX.

[Screenshot: SPHINX ablation study table]