Closed: cydiachen closed this issue 8 months ago.
Hi, we have uploaded the Qwen1.5 code. I am posting some of the key pieces here:
https://github.com/PKU-YuanGroup/MoE-LLaVA/blob/main/moellava/model/language_model/llava_qwen1_5_moe.py#L452-L472
https://github.com/PKU-YuanGroup/MoE-LLaVA/blob/main/moellava/train/train.py#L1401-L1409
https://github.com/PKU-YuanGroup/MoE-LLaVA/blob/main/moellava/model/builder.py#L382-L399
By the way, if you use `qwen2`, it is extremely easy to confuse it with `qwen`, so I added `1.5` to the model name as a double check. Therefore, please include both `qwen` and `1.5` in `--output_dir`.
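As a rough sketch of what that naming convention implies (the helper below and the example path are illustrative only, not code from the repo; the actual dispatch logic is at the builder.py link above):

```python
# Illustrative only: the loader identifies the model family from the
# checkpoint path, so --output_dir needs to contain both "qwen" and "1.5".
def validate_output_dir(output_dir: str) -> None:
    name = output_dir.lower()
    if "qwen" not in name or "1.5" not in name:
        raise ValueError(
            f"--output_dir '{output_dir}' should contain both 'qwen' and '1.5', "
            "e.g. checkpoints/llava-qwen1.5-1.8b-finetune"
        )

validate_output_dir("checkpoints/llava-qwen1.5-1.8b-finetune")  # passes
```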
Thank you for the code release. I have successfully pretrained and finetuned the model, but another error occurred when evaluating on TextVQA. I tried both H800 and A100 GPUs to rule out a hardware issue, but they produced the same error. I used to hit this error in my own implementation and got rid of it with some penalty settings, as with Qwen1; those are attached in my original issue.
moellava/eval/model_vqa_loader.py", line 113, in eval_model
output_ids = model.generate(
File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 1474, in generate
return self.greedy_search(
File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 2348, in greedy_search
next_tokens_scores = logits_processor(input_ids, next_token_logits)
File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 97, in __call__
scores = processor(input_ids, scores)
File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 336, in __call__
score = torch.where(score < 0, score * self.penalty, score / self.penalty)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
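For reference, the penalty workaround amounts to forcing the generation arguments that activate the RepetitionPenaltyLogitsProcessor back to neutral values. A minimal sketch (the kwargs below are standard transformers `generate()` arguments; the multimodal kwargs and their names are assumptions, not the repo's exact eval call):

```python
import torch

def greedy_generate(model, input_ids: torch.LongTensor, **mm_kwargs):
    """Sketch only: greedy decoding with the repetition penalty disabled.

    `mm_kwargs` stands in for the multimodal inputs (image tensors, etc.)
    that moellava/eval/model_vqa_loader.py normally forwards; their exact
    names here are an assumption, not the repo's API.
    """
    return model.generate(
        input_ids,
        do_sample=False,          # plain greedy search, matching the eval script
        repetition_penalty=1.0,   # 1.0 means RepetitionPenaltyLogitsProcessor is not added
        max_new_tokens=128,
        use_cache=True,
        **mm_kwargs,
    )
```

With `repetition_penalty=1.0` the processor in the trace is never added, which sidesteps the assert but not whatever produces the out-of-range token ids.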
Only error in TextVQA? What about SciQA or GQA?
GQA shares the same error.
model_vqa_loader.py", line 113, in eval_model
output_ids = model.generate(
File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 1474, in generate
return self.greedy_search(
File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 2348, in greedy_search
next_tokens_scores = logits_processor(input_ids, next_token_logits)
File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 97, in __call__
scores = processor(input_ids, scores)
File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 336, in __call__
score = torch.where(score < 0, score * self.penalty, score / self.penalty)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [31,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
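For what it's worth, that assertion points at an out-of-range index rather than a hardware fault: the repetition-penalty processor gathers next-token scores at the positions given by `input_ids`, so any token id that is negative or greater than or equal to the logits' vocab dimension trips exactly this check. A minimal CPU reproduction of the same failure (the numbers below are synthetic, not taken from this run):

```python
import torch

vocab_size = 151936                          # Qwen1.5 vocab size, as in the config below
scores = torch.randn(1, vocab_size)          # fake next-token logits
input_ids = torch.tensor([[151643, -200]])   # -200 is out of range (e.g. an unreplaced placeholder id)

# This mirrors the gather that the repetition-penalty processor performs on
# input_ids: on CUDA it surfaces as the asynchronous ScatterGatherKernel
# assert above, on CPU it raises a RuntimeError immediately.
torch.gather(scores, 1, input_ids)
```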
Sorry, I re-pulled the main-branch code and tested it locally. It works well without any errors. Could you check whether you are on the latest code?
I have validated the code version; it seems to be the latest code. I will retry it. Moreover, I am evaluating the model after stage 2, which is not yet a MoE model. Could that be the cause of the unexpected error?
Additional Information:
{
"_name_or_path": "Qwen1.5-1.8B-Chat",
"architectures": [
"LlavaQwen1_5ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151643,
"freeze_mm_mlp_adapter": false,
"hidden_act": "silu",
"hidden_size": 2048,
"image_aspect_ratio": "pad",
"image_projector_type": "mlp2x_gelu",
"initializer_range": 0.02,
"intermediate_size": 5504,
"max_position_embeddings": 32768,
"max_window_layers": 21,
"mm_hidden_size": 1024,
"mm_image_tower": "openai-clip-vit-large-patch14-336",
"mm_projector_lr": null,
"mm_use_im_patch_token": false,
"mm_use_im_start_end": false,
"mm_video_tower": null,
"mm_vision_select_feature": "patch",
"mm_vision_select_layer": -2,
"model_type": "llava_qwen1_5",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"num_key_value_heads": 16,
"pad_token_id": 151646,
"rms_norm_eps": 1e-06,
"rope_theta": 1000000.0,
"sliding_window": 32768,
"tie_word_embeddings": false,
"tokenizer_padding_side": "right",
"torch_dtype": "bfloat16",
"transformers_version": "4.37.0",
"tune_mm_mlp_adapter": false,
"use_cache": true,
"use_mm_proj": true,
"use_sliding_window": false,
"video_global_proj": false,
"video_projector_type": "linear",
"video_spatial_proj": false,
"video_temproal_proj": false,
"vocab_size": 151936
}
- This IS expected if you are initializing LlavaQwen1_5ForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlavaQwen1_5ForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
/root/anaconda3/envs/llava/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
0%| | 0/5000 [00:00<?, ?it/s]/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:392: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
warnings.warn(
/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:397: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `None` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
warnings.warn(
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
0%| | 0/5000 [00:01<?, ?it/s]
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [31,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
To support the latest NVIDIA GPUs, the main difference in my setup is the PyTorch version: I use PyTorch 2.1.0, built against CUDA 12.1. Could that cause the problem?
I might have found the error. Qwen1.5 corrected its tokenizer config on February 9th, but my initial Qwen model was downloaded on February 7th, so it has a mismatched tokenizer config.
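If it helps to verify that, one quick check is to load the old local snapshot and a fresh Hub copy side by side and compare the special tokens (the local path below is a placeholder):

```python
from transformers import AutoTokenizer

# Placeholder path for the snapshot downloaded on Feb 7 vs. a fresh Hub pull.
old_tok = AutoTokenizer.from_pretrained("/path/to/local/Qwen1.5-1.8B-Chat")
new_tok = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-1.8B-Chat")

for name in ("eos_token", "pad_token", "unk_token"):
    old = (getattr(old_tok, name), getattr(old_tok, name + "_id"))
    new = (getattr(new_tok, name), getattr(new_tok, name + "_id"))
    flag = "  <-- mismatch" if old != new else ""
    print(f"{name}: local={old} hub={new}{flag}")
```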
I think so. And does it work now?
I am retraining the network now; ETA about 6 hours. I will keep you updated on the latest status.
Unfortunately, it still produces the error. It seems the only remaining thing to check is the PyTorch version. This is a rather strange error.
- This IS expected if you are initializing LlavaQwen1_5ForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlavaQwen1_5ForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
/root/anaconda3/envs/llava/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
0%| | 0/5000 [00:00<?, ?it/s]/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:392: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
warnings.warn(
/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:397: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `None` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
warnings.warn(
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
0%| | 0/5000 [00:01<?, ?it/s]
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [31,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
Traceback (most recent call last):
File "/workspace/mount/code/VLM/MoE-LLaVA/moellava/eval/model_vqa_loader.py", line 174, in <module>
eval_model(args)
File "/workspace/mount/code/VLM/MoE-LLaVA/moellava/eval/model_vqa_loader.py", line 113, in eval_model
output_ids = model.generate(
File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 1474, in generate
return self.greedy_search(
File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 2348, in greedy_search
next_tokens_scores = logits_processor(input_ids, next_token_logits)
File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 97, in __call__
scores = processor(input_ids, scores)
File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 336, in __call__
score = torch.where(score < 0, score * self.penalty, score / self.penalty)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Could you share your config.json, tokenizer.json, and the other files? I will use your files for inference to check whether it is an environment problem or not.
added_tokens.json
{
"<|endoftext|>": 151643,
"<|extra_0|>": 151646,
"<|im_end|>": 151645,
"<|im_start|>": 151644
}
config.json
{
"_name_or_path": "model_zoo/Qwen1.5-1.8B-Chat",
"architectures": [
"LlavaQwen1_5ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"freeze_mm_mlp_adapter": false,
"hidden_act": "silu",
"hidden_size": 2048,
"image_aspect_ratio": "pad",
"image_projector_type": "mlp2x_gelu",
"initializer_range": 0.02,
"intermediate_size": 5504,
"max_position_embeddings": 32768,
"max_window_layers": 21,
"mm_hidden_size": 1024,
"mm_image_tower": "model_zoo/openai-clip-vit-large-patch14-336",
"mm_projector_lr": null,
"mm_use_im_patch_token": false,
"mm_use_im_start_end": false,
"mm_video_tower": null,
"mm_vision_select_feature": "patch",
"mm_vision_select_layer": -2,
"model_type": "llava_qwen1_5",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"num_key_value_heads": 16,
"pad_token_id": 151646,
"rms_norm_eps": 1e-06,
"rope_theta": 1000000.0,
"sliding_window": 32768,
"tie_word_embeddings": false,
"tokenizer_padding_side": "right",
"torch_dtype": "bfloat16",
"transformers_version": "4.37.0",
"tune_mm_mlp_adapter": false,
"use_cache": false,
"use_mm_proj": true,
"use_sliding_window": false,
"video_global_proj": false,
"video_projector_type": "linear",
"video_spatial_proj": false,
"video_temproal_proj": false,
"vocab_size": 151936
}
generation_config.json
{
"bos_token_id": 151643,
"do_sample": true,
"eos_token_id": [
151645,
151643
],
"repetition_penalty": 1.1,
"top_p": 0.8,
"transformers_version": "4.37.0"
}
tokenizer_config.json
{
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|extra_0|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>"
],
"bos_token": null,
"chat_template": "{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system\nYou are a helpful assistant<|im_end|>\n' }}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content']}}{% if (loop.last and add_generation_prompt) or not loop.last %}{{ '<|im_end|>' + '\n'}}{% endif %}{% endfor %}{% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}{{ '<|im_start|>assistant\n' }}{% endif %}",
"clean_up_tokenization_spaces": false,
"eos_token": "<|endoftext|>",
"errors": "replace",
"model_max_length": 2048,
"pad_token": "<|extra_0|>",
"padding_side": "right",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": "<|extra_0|>"
}
special_tokens_map.json
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>"
],
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": "<|extra_0|>",
"unk_token": {
"content": "<|extra_0|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
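For completeness, the chat_template and special tokens above can be exercised directly; a small sketch (the local path is a placeholder for my finetuned checkpoint directory):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("/path/to/finetuned-llava-qwen1.5-checkpoint")

messages = [{"role": "user", "content": "Describe the image."}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)                             # should wrap the turn in <|im_start|> / <|im_end|>
print(tok.eos_token, tok.eos_token_id)    # compare against special_tokens_map.json above
print(tok.pad_token, tok.pad_token_id)    # "<|extra_0|>" / 151646 per added_tokens.json
```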
These files are very different from mine in several places. Can you use Qwen/Qwen1.5-1.8B? Also pass `--save_steps 5` in the run command to quickly check whether these files change.
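For example, something along these lines (paths are placeholders) can diff an early checkpoint against the base model to see whether those files were rewritten during training:

```python
import filecmp
from pathlib import Path

# Placeholder paths: the base model directory vs. an early checkpoint
# produced with --save_steps 5.
base = Path("model_zoo/Qwen1.5-1.8B-Chat")
ckpt = Path("checkpoints/llava-qwen1.5-pretrain/checkpoint-5")

for fname in ("config.json", "generation_config.json", "tokenizer_config.json",
              "special_tokens_map.json", "added_tokens.json"):
    a, b = base / fname, ckpt / fname
    if not a.exists() or not b.exists():
        print(f"{fname}: missing on one side")
    else:
        same = filecmp.cmp(a, b, shallow=False)
        print(f"{fname}: {'unchanged' if same else 'CHANGED'}")
```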
Thanks a lot, I will try your solution. I checked the base Qwen1.5-1.8B (the non-Chat version), and its files are the same as yours. I will try it and keep you updated on the result.
```shell
RuntimeError: The size of tensor a (654) must match the size of tensor b (1307) at non-singleton dimension 3
```
May I consult you for more detailed instructions? Or could we work together to improve this? I reviewed the code and suspect the cause may be padding_side='right', which is not supported by Qwen1.5 with flash attention; the problem is also noted in the official implementation.
Solved. It seems the error lies in the Chat version of Qwen1.5. Thanks a lot!
Discussion
First of all, I wish you a happy Chinese New Year. I am currently catching up with your progress in integrating Qwen1.5 into this project. Since Qwen1.5 shares a similar structure with the Qwen1 models, I followed the Qwen1 template to integrate the code. I have succeeded in training and fine-tuning the model, but I ran into a problem when evaluating the models on TextVQA.
builder.py
The other code follows the Qwen1 settings.
Unfortunately, the code outputs the following error.
May I consult you for more detailed instructions? Or could we work together to improve this?
I reviewed the code and suspect the cause may be padding_side='right', which is not supported by Qwen1.5 with flash attention; the problem is also noted in the official implementation.
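If right padding is indeed the culprit, my understanding is that the usual workaround on the generation side is to pad on the left so the attention mask lines up during decoding. A minimal sketch under that assumption (the checkpoint path is a placeholder, and this is my guess at a fix, not the repo's method):

```python
from transformers import AutoTokenizer

# Placeholder path; the point is only the padding side used at inference time.
tok = AutoTokenizer.from_pretrained("/path/to/finetuned-llava-qwen1.5-checkpoint")
tok.padding_side = "left"            # left padding for batched generation
if tok.pad_token is None:
    tok.pad_token = tok.eos_token    # fall back if no pad token is defined

batch = tok(["What is shown in the image?", "Read the text in the image."],
            return_tensors="pt", padding=True)
# batch["input_ids"] and batch["attention_mask"] would then be passed to
# model.generate(...) together with the usual multimodal inputs.
print(batch["attention_mask"])
```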