PKU-YuanGroup / MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models
https://arxiv.org/abs/2401.15947
Apache License 2.0

[Discussion] Implementation of Qwen1.5 for the project #39

Closed cydiachen closed 9 months ago

cydiachen commented 9 months ago

Discussion

First, I wish you a happy Chinese New Year. I am catching up with your progress on integrating Qwen1.5 into this project. Since Qwen1.5 shares a similar structure with the Qwen1 models, I followed the Qwen1 template to integrate the code. I have succeeded in training and fine-tuning the model, but I ran into a problem when evaluating the models on TextVQA.

builder.py

                if 'qwen2' in model_base.lower():
                    model = LlavaQwen2ForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, config=lora_cfg_pretrained, **kwargs)
                    model.config.eos_token_id = tokenizer.eos_token_id
                    model.generation_config = GenerationConfig.from_pretrained(model_base, pad_token_id=tokenizer.pad_token_id)
                    # model.generation_config.repetition_penalty = None
                    model.generation_config.do_sample = False  # use greedy decoding
                    model.generation_config.repetition_penalty = 1.0  # disable repetition penalty

The rest of the code follows the Qwen1 settings.
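
For context, setting repetition_penalty to 1.0 disables the penalty entirely: as far as I can tell, transformers (4.37-era) only attaches the repetition-penalty logits processor when the penalty differs from 1.0. A minimal sketch of that check:

```python
from transformers import GenerationConfig, RepetitionPenaltyLogitsProcessor

# Paraphrase of the 4.37-era selection logic in transformers' generation
# utilities: the penalty processor (and its gather/scatter over input_ids)
# is only attached when repetition_penalty differs from 1.0.
gen_cfg = GenerationConfig(repetition_penalty=1.0)

processors = []
if gen_cfg.repetition_penalty is not None and gen_cfg.repetition_penalty != 1.0:
    processors.append(RepetitionPenaltyLogitsProcessor(penalty=gen_cfg.repetition_penalty))

print(processors)  # [] -> the penalty is effectively disabled
```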

Unfortunately, the code raises the following error:

    attention_mask = _prepare_4d_causal_attention_mask(
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py", line 307, in _prepare_4d_causal_attention_mask
    attention_mask = attn_mask_converter.to_4d(
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py", line 137, in to_4d
    expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min)
RuntimeError: The size of tensor a (654) must match the size of tensor b (1307) at non-singleton dimension 3

May I ask you for detailed instructions? Or could we work together to improve the performance?

I reviewed the code and suspect the cause may be right-side padding (padding_side='right'), which is not supported by Qwen1.5's flash-attention implementation; the same problem appears in the official implementation. A sketch of the usual workaround is below.
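
For reference, the usual workaround for flash-attention models that reject right padding is to pad on the left during generation. A minimal sketch, assuming the standard transformers tokenizer API:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-1.8B-Chat")
tokenizer.padding_side = "left"  # flash-attention generation expects left padding

# With left padding the attention mask marks pads at the start, keeping the
# causal mask and the padded sequence lengths consistent inside generate().
batch = tokenizer(["What is shown in the image?", "Read the text in the image."],
                  return_tensors="pt", padding=True)
print(batch["attention_mask"])
```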

LinB203 commented 9 months ago

Hi, we have uploaded the Qwen1.5 code. I am posting the key references here:

https://github.com/PKU-YuanGroup/MoE-LLaVA/blob/main/moellava/model/language_model/llava_qwen1_5_moe.py#L452-L472
https://github.com/PKU-YuanGroup/MoE-LLaVA/blob/main/moellava/train/train.py#L1401-L1409
https://github.com/PKU-YuanGroup/MoE-LLaVA/blob/main/moellava/model/builder.py#L382-L399

LinB203 commented 9 months ago

By the way, if you use qwen2 it is extremely easy to confuse it with qwen, so I added 1.5 to the model name as a DOUBLE CHECK.

Therefore, please include both qwen and 1.5 in --output_dir; a sketch of why is below.
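
For illustration, a hedged sketch of why both substrings matter; the exact condition in the repo's builder may differ, and the directory name here is hypothetical:

```python
# Name-based dispatch of the kind used in builder.py: the model class is
# picked from substrings of the checkpoint path, so a path missing "1.5"
# would silently fall through to the Qwen(1) branch.
model_path = "checkpoints/llava-qwen1.5-1.8b-finetune"  # hypothetical --output_dir

name = model_path.lower()
if "qwen" in name and "1.5" in name:
    print("loading LlavaQwen1_5ForCausalLM")
elif "qwen" in name:
    print("loading LlavaQwenForCausalLM")  # wrong class for a Qwen1.5 checkpoint
```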

cydiachen commented 9 months ago

Thank you for the code release. I have successfully pretrained and fine-tuned the model, but another error occurred when evaluating on TextVQA. I tried both H800 and A100 GPUs to rule out hardware errors, but they produce the same error. I used to hit this error in my own implementation, but I got rid of it with the Qwen1-style penalty settings attached in my original issue.

moellava/eval/model_vqa_loader.py", line 113, in eval_model
    output_ids = model.generate(
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 1474, in generate
    return self.greedy_search(
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 2348, in greedy_search
    next_tokens_scores = logits_processor(input_ids, next_token_logits)
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 97, in __call__
    scores = processor(input_ids, scores)
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 336, in __call__
    score = torch.where(score < 0, score * self.penalty, score / self.penalty)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
LinB203 commented 9 months ago

Does the error occur only on TextVQA? What about SciQA or GQA?

cydiachen commented 9 months ago

Does the error occur only on TextVQA? What about SciQA or GQA?

GQA shares the same error.

model_vqa_loader.py", line 113, in eval_model
    output_ids = model.generate(
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 1474, in generate
    return self.greedy_search(
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 2348, in greedy_search
    next_tokens_scores = logits_processor(input_ids, next_token_logits)
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 97, in __call__
    scores = processor(input_ids, scores)
  File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 336, in __call__
    score = torch.where(score < 0, score * self.penalty, score / self.penalty)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [31,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
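
For reference, this assertion is an out-of-bounds index inside a gather/scatter kernel. Judging from the traceback, the repetition-penalty processor gathers scores at the positions given by input_ids, so any id outside [0, vocab_size), for example a negative image-placeholder id such as LLaVA-style IMAGE_TOKEN_INDEX = -200, or an id from a mismatched tokenizer, triggers exactly this device-side assert. A minimal CPU reproduction (on CPU the error is raised synchronously; the processor body is paraphrased from the traceback):

```python
import torch

vocab_size = 151936
scores = torch.randn(1, vocab_size)                 # next-token logits
input_ids = torch.tensor([[151644, -200, 151645]])  # -200: hypothetical image-placeholder id

# Paraphrase of RepetitionPenaltyLogitsProcessor.__call__ from the traceback:
penalty = 1.1
score = torch.gather(scores, 1, input_ids)  # out-of-bounds RuntimeError on CPU,
                                            # device-side assert on CUDA
score = torch.where(score < 0, score * penalty, score / penalty)
scores.scatter_(1, input_ids, score)
```
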
LinB203 commented 9 months ago

Sorry, I re-pulled the main-branch code and tested it locally; it works without any errors. Could you check that you have the latest code?

cydiachen commented 9 months ago

Sorry, I re-pulled the main-branch code and tested it locally; it works without any errors. Could you check that you have the latest code?

I have verified the code version; it is the latest. I will retry. Also, I am evaluating the model after stage 2, which is not an MoE model. Could that cause the unexpected error?

cydiachen commented 9 months ago

Additional Information:

  1. Stage-2 Config
    {
    "_name_or_path": "Qwen1.5-1.8B-Chat",
    "architectures": [
    "LlavaQwen1_5ForCausalLM"
    ],
    "attention_dropout": 0.0,
    "bos_token_id": 151643,
    "eos_token_id": 151643,
    "freeze_mm_mlp_adapter": false,
    "hidden_act": "silu",
    "hidden_size": 2048,
    "image_aspect_ratio": "pad",
    "image_projector_type": "mlp2x_gelu",
    "initializer_range": 0.02,
    "intermediate_size": 5504,
    "max_position_embeddings": 32768,
    "max_window_layers": 21,
    "mm_hidden_size": 1024,
    "mm_image_tower": "openai-clip-vit-large-patch14-336",
    "mm_projector_lr": null,
    "mm_use_im_patch_token": false,
    "mm_use_im_start_end": false,
    "mm_video_tower": null,
    "mm_vision_select_feature": "patch",
    "mm_vision_select_layer": -2,
    "model_type": "llava_qwen1_5",
    "num_attention_heads": 16,
    "num_hidden_layers": 24,
    "num_key_value_heads": 16,
    "pad_token_id": 151646,
    "rms_norm_eps": 1e-06,
    "rope_theta": 1000000.0,
    "sliding_window": 32768,
    "tie_word_embeddings": false,
    "tokenizer_padding_side": "right",
    "torch_dtype": "bfloat16",
    "transformers_version": "4.37.0",
    "tune_mm_mlp_adapter": false,
    "use_cache": true,
    "use_mm_proj": true,
    "use_sliding_window": false,
    "video_global_proj": false,
    "video_projector_type": "linear",
    "video_spatial_proj": false,
    "video_temproal_proj": false,
    "vocab_size": 151936
    }
  2. Detailed error info printed before the error is raised:
    - This IS expected if you are initializing LlavaQwen1_5ForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
    - This IS NOT expected if you are initializing LlavaQwen1_5ForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    /root/anaconda3/envs/llava/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
    return self.fget.__get__(instance, owner)()
    0%|                                                                                                                                                                | 0/5000 [00:00<?, ?it/s]/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:392: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
    warnings.warn(
    /root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:397: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `None` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
    warnings.warn(
    The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
    Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
    0%|                                                                                                                                                                | 0/5000 [00:01<?, ?it/s]
    ../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [31,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
cydiachen commented 9 months ago

To support the latest NVIDIA GPUs, the main difference lies in the PyTorch version: mine is 2.1.0, built with CUDA 12.1. Could that cause the problem?

cydiachen commented 9 months ago

Sorry, I re-pulled the main-branch code and tested it locally; it works without any errors. Could you check that you have the latest code?

I might have found the error. Qwen1.5 corrected its tokenizer config on February 9th, but my initial Qwen model was downloaded on February 7th, so it has a mismatched tokenizer config. [screenshot attached]
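
One way to check is to re-fetch the current tokenizer_config.json from the Hub and diff it against the locally cached copy (standard huggingface_hub API; the repo id is the one used in this thread):

```python
from huggingface_hub import hf_hub_download

# Download the up-to-date tokenizer config to diff against the copy cached
# on February 7th.
fresh = hf_hub_download("Qwen/Qwen1.5-1.8B-Chat", "tokenizer_config.json")
print(fresh)  # path to the freshly fetched file
```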

LinB203 commented 9 months ago

I think so. And does it work now?

cydiachen commented 9 months ago

I think so. And does it work now?

Retraining the network now; ETA about 6 hours. I will keep you updated.

cydiachen commented 9 months ago

I think so. And does it work now?

Unfortunately, it still produces the error. It seems the only remaining thing to check is the PyTorch version. This is a relatively strange error.

- This IS expected if you are initializing LlavaQwen1_5ForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlavaQwen1_5ForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
/root/anaconda3/envs/llava/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
0%|                                                                                                                                                                                                                 | 0/5000 [00:00<?, ?it/s]/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:392: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
warnings.warn(
/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:397: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `None` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
warnings.warn(
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
0%|                                                                                                                                                                                                                 | 0/5000 [00:01<?, ?it/s]
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [31,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
Traceback (most recent call last):
File "/workspace/mount/code/VLM/MoE-LLaVA/moellava/eval/model_vqa_loader.py", line 174, in <module>
eval_model(args)
File "/workspace/mount/code/VLM/MoE-LLaVA/moellava/eval/model_vqa_loader.py", line 113, in eval_model
output_ids = model.generate(
File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 1474, in generate
return self.greedy_search(
File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 2348, in greedy_search
next_tokens_scores = logits_processor(input_ids, next_token_logits)
File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 97, in __call__
scores = processor(input_ids, scores)
File "/root/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 336, in __call__
score = torch.where(score < 0, score * self.penalty, score / self.penalty)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
LinB203 commented 9 months ago

Could you share your config.json, tokenizer.json, and the other files? I will run inference with your files to check whether or not it is an environment problem.

cydiachen commented 9 months ago

Could you share your config.json, tokenizer.json, and the other files? I will run inference with your files to check whether or not it is an environment problem.

added_tokens.json

{
  "<|endoftext|>": 151643,
  "<|extra_0|>": 151646,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644
}

config.json

{
  "_name_or_path": "model_zoo/Qwen1.5-1.8B-Chat",
  "architectures": [
    "LlavaQwen1_5ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "freeze_mm_mlp_adapter": false,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "image_aspect_ratio": "pad",
  "image_projector_type": "mlp2x_gelu",
  "initializer_range": 0.02,
  "intermediate_size": 5504,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "mm_hidden_size": 1024,
  "mm_image_tower": "model_zoo/openai-clip-vit-large-patch14-336",
  "mm_projector_lr": null,
  "mm_use_im_patch_token": false,
  "mm_use_im_start_end": false,
  "mm_video_tower": null,
  "mm_vision_select_feature": "patch",
  "mm_vision_select_layer": -2,
  "model_type": "llava_qwen1_5",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 16,
  "pad_token_id": 151646,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "tokenizer_padding_side": "right",
  "torch_dtype": "bfloat16",
  "transformers_version": "4.37.0",
  "tune_mm_mlp_adapter": false,
  "use_cache": false,
  "use_mm_proj": true,
  "use_sliding_window": false,
  "video_global_proj": false,
  "video_projector_type": "linear",
  "video_spatial_proj": false,
  "video_temproal_proj": false,
  "vocab_size": 151936
}

generation_config.json

{
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "repetition_penalty": 1.1,
  "top_p": 0.8,
  "transformers_version": "4.37.0"
}
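
(Note the repetition_penalty of 1.1 here: as with the snippet in the opening comment, any value other than 1.0 keeps the repetition-penalty logits processor active even under greedy search, and that processor is exactly the frame where the tracebacks above crash.)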

tokenizer_config.json

{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|extra_0|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>"
  ],
  "bos_token": null,
  "chat_template": "{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system\nYou are a helpful assistant<|im_end|>\n' }}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content']}}{% if (loop.last and add_generation_prompt) or not loop.last %}{{ '<|im_end|>' + '\n'}}{% endif %}{% endfor %}{% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}{{ '<|im_start|>assistant\n' }}{% endif %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|endoftext|>",
  "errors": "replace",
  "model_max_length": 2048,
  "pad_token": "<|extra_0|>",
  "padding_side": "right",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": "<|extra_0|>"
}
cydiachen commented 9 months ago

special_tokens_map.json

{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>"
  ],
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<|extra_0|>",
  "unk_token": {
    "content": "<|extra_0|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
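
Worth noting: in these files tokenizer_config.json sets eos_token to <|endoftext|> while special_tokens_map.json sets it to <|im_end|>, and the stage-2 config above records eos_token_id 151643 where the newer config.json uses 151645. A quick consistency check over a checkpoint (a minimal sketch; the path is hypothetical):

```python
from transformers import AutoConfig, AutoTokenizer

path = "checkpoints/llava-qwen1.5-1.8b-finetune"  # hypothetical local checkpoint

tok = AutoTokenizer.from_pretrained(path)
cfg = AutoConfig.from_pretrained(path)

# The eos/pad ids the tokenizer resolves should agree with the model config;
# a mismatch here reflects the pre-/post-February-9 discrepancy above.
print("tokenizer eos:", tok.eos_token, tok.eos_token_id)
print("config    eos:", cfg.eos_token_id)
print("tokenizer pad:", tok.pad_token, tok.pad_token_id)
print("config    pad:", cfg.pad_token_id)
```
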
LinB203 commented 9 months ago

These files are very different from mine in the places shown below. Can you use Qwen/Qwen1.5-1.8B? Pass --save_steps 5 in the run command to quickly check whether these files change.

[screenshots of the differing files attached]

cydiachen commented 9 months ago

Thanks a lot, I will try your solution. I checked Qwen1.5-1.8B without the chat suffix, and its files match yours. I will try it and keep you updated on the result.

cydiachen commented 9 months ago
```shell
RuntimeError: The size of tensor a (654) must match the size of tensor b (1307) at non-singleton dimension 3
```

May I ask you for detailed instructions? Or could we work together to improve the performance? I reviewed the code and suspect the cause may be right-side padding, which is not supported by Qwen1.5's flash-attention implementation.

Solved. It seems the error lies in the chat version of Qwen1.5. Thanks a lot.