InternLM / xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
https://xtuner.readthedocs.io/zh-cn/latest/
Apache License 2.0

"<image>" is absent in "llava_instruct_150k_zh.jsonl" #771

Open · wusize opened this issue 1 month ago

wusize commented 1 month ago

Hi,

I noticed that llava_instruct_150k_zh.jsonl is used in the config that fine-tunes the Phi-3-based LLaVA on datasets from InternVL. However, the special token <image> is missing from this jsonl file. In the current LLaVA pipeline, image embeddings won't be inserted into the LLM's input sequence if this special token is absent.
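For reference, here is a minimal sketch of the check I ran, assuming the file follows the usual LLaVA-style format (one JSON object per line with a "conversations" list of turns); the path and field names are my assumptions, not taken from the xtuner loader:

```python
import json

# Count how many samples in the jsonl contain the "<image>" placeholder.
# Assumes the common LLaVA conversation format: each line is a JSON object
# with a "conversations" list whose turns carry "from" and "value" fields.
path = "llava_instruct_150k_zh.jsonl"  # adjust to the local path

total, with_image = 0, 0
with open(path, encoding="utf-8") as f:
    for line in f:
        sample = json.loads(line)
        total += 1
        if any("<image>" in turn.get("value", "")
               for turn in sample.get("conversations", [])):
            with_image += 1

print(f"{with_image}/{total} samples contain the <image> token")
```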

hhaAndroid commented 1 month ago

@wusize This is expected: the pipeline supports both text-only training and training with text and images.
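To illustrate the distinction, the two kinds of samples in a LLaVA-style jsonl might look like the sketch below. The field names and contents are hypothetical, following the common LLaVA convention rather than this dataset's exact schema:

```python
# Hypothetical image-text sample: carries an "image" field and the "<image>"
# placeholder in the human turn, so image embeddings get spliced into the
# token sequence during preprocessing.
image_sample = {
    "image": "path/to/example.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nDescribe this image."},
        {"from": "gpt", "value": "A dog is running across a grassy field."},
    ],
}

# Hypothetical text-only sample: no "image" field and no "<image>" token,
# so it is tokenized as an ordinary conversation with no image embeddings.
text_only_sample = {
    "conversations": [
        {"from": "human", "value": "Introduce yourself in one sentence."},
        {"from": "gpt", "value": "I am a multimodal assistant."},
    ],
}
```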