NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.
MIT License

LLaVa-NeXT/Fine_tune_LLaVaNeXT_on_a_custom_dataset_(with_PyTorch_Lightning).ipynb fails at training #461

Open 7AtAri opened 3 months ago

7AtAri commented 3 months ago

ValueError                                Traceback (most recent call last)
in <cell line: 19>()
     17 )
     18
---> 19 trainer.fit(model_module)

24 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/llava_next/modeling_llava_next.py in _merge_input_ids_with_image_features(self, image_features, feature_lens, inputs_embeds, input_ids, attention_mask, position_ids, labels, image_token_index, ignore_index)
    541 total_num_special_image_tokens = torch.sum(special_image_token_mask)
    542 if total_num_special_image_tokens != num_images:
--> 543     raise ValueError(
    544         f"Number of image tokens in input_ids ({total_num_special_image_tokens}) different from num_images ({num_images})."
    545     )

ValueError: Number of image tokens in input_ids (0) different from num_images (1).


This error appears only after fixing another error concerning the chat_template:

In the collate functions:

chat_template = (
    "{% if messages[0]['role'] == 'instruction' %}"
    "Instruction: {{- messages[0]['content'] }}\n"
    "{% set messages = messages[1:] %}"
    "{% endif %}"
    "{% for message in messages %}"
    "Question:"
    "{% for line in message['query'] %}"
    "{% if line['type'] == 'text' %}"
    "{{- line['text'] }}"
    "{% elif line['type'] == 'image' %}"
    "{{ '' }}"
    "{% endif %}"
    "{% endfor %}"
    "\n"
    "{% if 'answer' in message %}"
    "Short answer: "
    "{% for line in message['answer'] %}"
    "{% if line['type'] == 'text' %}"
    "{{- line['text'] }}"
    "{% elif line['type'] == 'image' %}"
    "{{ '' }}"
    "{% endif %}"
    "{% endfor %}"
    "\n"
    "{% endif %}"
    "\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "Short answer: "
    "{% endif %}"
)

text_prompt = processor.tokenizer.apply_chat_template(conversation, chat_template=chat_template, add_generation_prompt=True)
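
To narrow this down, it may help to check the rendered prompt before it reaches the model. The ValueError above is raised when the number of image tokens in input_ids does not match the number of images, so a plain string check on the prompt can catch a mismatch earlier. The sketch below is only an illustration: the helper names are hypothetical, and the placeholder string "<image>" is an assumption (it should be replaced with whatever image token your processor/tokenizer actually uses).

```python
# Hypothetical helpers, not part of the notebook or of transformers.
# ASSUMPTION: the image placeholder in the prompt is the literal string "<image>".

def count_image_placeholders(prompt: str, image_token: str = "<image>") -> int:
    """Return how many times the image placeholder occurs in a rendered prompt."""
    return prompt.count(image_token)


def check_prompt(prompt: str, num_images: int, image_token: str = "<image>") -> None:
    """Raise early if the prompt's placeholder count differs from the image count."""
    found = count_image_placeholders(prompt, image_token)
    if found != num_images:
        raise ValueError(
            f"Prompt contains {found} occurrence(s) of {image_token!r}, "
            f"but {num_images} image(s) were passed."
        )


if __name__ == "__main__":
    # A prompt whose image branch rendered an empty string has no placeholder.
    prompt_without = "Question:What is shown?\nShort answer: "
    prompt_with = "Question:<image>What is shown?\nShort answer: "

    print(count_image_placeholders(prompt_without))
    print(count_image_placeholders(prompt_with))
    check_prompt(prompt_with, num_images=1)  # passes silently
```

In the collate function this could be called as check_prompt(text_prompt, num_images=1) right after apply_chat_template. If the count is 0, the image branch of the template is a likely place to look, since it controls what string is emitted for each image entry.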

https://github.com/huggingface/transformers/issues/32303