THUDM / VisualGLM-6B

Chinese and English multimodal conversational language model
Apache License 2.0

The code for changing the image size to 128 errors out! #296

Closed chenjingcheng closed 10 months ago

chenjingcheng commented 10 months ago

Hello, I ran a test following your code and it fails. The log is as follows:

Traceback (most recent call last):
  File "/home/aifont/disk4T/aiproject/visualGLM-6B/finetune_visualglm.py", line 184, in <module>
    new_prop = ViTProperty(new_image_size, old_prop.patch_size, old_prop.pre_len, old_prop.post_len)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/sat/model/official/vit_model.py", line 30, in __init__
    assert isinstance(image_size, Iterable) and len(image_size) == 2
AssertionError

Thanks!

You can try adding the following after the model is returned by from_pretrained:

new_image_size = 128
from sat.model.official.vit_model import ViTProperty
old_prop = model.get_mixin("eva").model.vit.transformer.property
new_prop = ViTProperty(new_image_size, old_prop.patch_size, old_prop.pre_len, old_prop.post_len)
model.get_mixin("eva").model.vit.get_mixin("pos_embedding").reinit(property=new_prop)
args.eva_args["image_size"] = new_image_size

Originally posted by @1049451037 in https://github.com/THUDM/VisualGLM-6B/issues/291#issuecomment-1767494875

chenjingcheng commented 10 months ago

I modified the code as follows:

model_type = './thudm/visualglm-6b'
model, args = FineTuneVisualGLMModel.from_pretrained(model_type, args)

Change the image size to 128:

# ---------------------
new_image_size = 128
from sat.model.official.vit_model import ViTProperty
old_prop = model.get_mixin("eva").model.vit.transformer.property
new_prop = ViTProperty(new_image_size, old_prop.patch_size, old_prop.pre_len, old_prop.post_len)
model.get_mixin("eva").model.vit.get_mixin("pos_embedding").reinit(property=new_prop)
args.eva_args["image_size"] = new_image_size
# -------------------------
1049451037 commented 10 months ago

new_image_size = [128, 128]
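
The assertion quoted in your first log is the reason: ViTProperty requires image_size to be a (height, width) pair, not a bare int. A minimal sketch of that check (paraphrasing the assert from the traceback; check_image_size is just an illustrative name):

from collections.abc import Iterable

def check_image_size(image_size):
    # Mirrors the assert at sat/model/official/vit_model.py line 30:
    # a scalar like 128 is not an Iterable of length 2, so it trips the assert.
    assert isinstance(image_size, Iterable) and len(image_size) == 2

check_image_size([128, 128])  # passes
check_image_size(128)         # AssertionError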

chenjingcheng commented 10 months ago

Got it, thanks!

chenjingcheng commented 10 months ago

Traceback (most recent call last):
  File "/home/aifont/disk4T/aiproject/visualGLM-6B/finetune_visualglm.py", line 185, in <module>
    model.get_mixin("eva").model.vit.get_mixin("pos_embedding").reinit(property=new_prop)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/sat/model/official/vit_model.py", line 85, in reinit
    image_weight = F.interpolate(image_weight, size=property.grid_size, mode='bicubic', align_corners=False).permute(0, 2, 3, 1).reshape(property.num_patches, -1)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/torch/nn/functional.py", line 4028, in interpolate
    return torch._C._nn.upsample_bicubic2d(input, output_size, align_corners, scale_factors)
RuntimeError: "compute_indices_weights_cubic" not implemented for 'Half'
[2023-10-24 10:51:39,522] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 7777
[2023-10-24 10:51:39,523] [ERROR] [launch.py:321:sigkill_handler] ['/home/aifont/anaconda3/envs/vglm/bin/python', '-u', 'finetune_visualglm.py', '--local_rank=0', '--experiment-name', 'finetune-visualglm-6b', '--model-parallel-size', '1', '--mode', 'finetune', '--train-iters', '400', '--resume-dataloader', '--max_source_length', '64', '--max_target_length', '256', '--lora_rank', '10', '--layer_range', '0', '14', '--pre_seq_len', '4', '--train-data', './dataset.json', '--valid-data', './dataset.json', '--distributed-backend', 'nccl', '--lr-decay-style', 'cosine', '--warmup', '.02', '--checkpoint-activations', '--save-interval', '300', '--eval-interval', '10000', '--save', './checkpoints', '--split', '1', '--eval-iters', '10', '--eval-batch-size', '8', '--zero-stage', '1', '--lr', '0.0001', '--batch-size', '4', '--skip-init', '--fp16', '--use_lora'] exits with return code = 1

My data came from different places: part was downloaded automatically (when running the demo), and part was downloaded from the Tsinghua cloud drive. There is also an mp_rank_00_model_states.pt, downloaded from xray, which corresponds to the 300 directory. For the "latest" error, I saw similar questions online about where to download it. Now this bug appears. Thank you very much!

1049451037 commented 10 months ago

I'm not really sure what you are trying to do... You've downloaded different things from so many places, and I don't see how they could be made to work together...

chenjingcheng commented 10 months ago

Hello, I'm working with industry-specific data, and 128 images are enough for me. As for the data sources, I followed your README:

1. Running the demo code downloads some files automatically, with contents as follows:

~/.sat_models/
cogvlm-chat/  cogvlm-chat.zip  cogvlm-chat.zip.9Ca67A9E  visualglm-6b/  visualglm-6b.zip  cogvlm-chat.lock  cogvlm-chat.zip.0e4eab3E  cogvlm-grounding-generalist.lock  visualglm-6b.lock

2. To fine-tune on my own data I need the model weights; the .bin files were downloaded from the Tsinghua cloud (linked in the README):

ls thudm/visualglm-6b/
300  ice_text.model  modeling_chatglm.py  pytorch_model-00003-of-00005.bin  pytorch_model.bin.index.json  tokenization_chatglm.py  config.json  latest  pytorch_model-00001-of-00005.bin  pytorch_model-00004-of-00005.bin  quantization.py  tokenizer_config.json  configuration_chatglm.py  model_config.json  pytorch_model-00002-of-00005.bin  pytorch_model-00005-of-00005.bin  README.md  visual.py

The non-.bin files were downloaded from Hugging Face.

3. During fine-tuning, "latest" could not be found. I found online that I should download the file corresponding to the 300 directory from xray: mp_rank_00_model_states.pt

4. There also seems to be a config file, likewise copied from elsewhere.

The current error is in the added code that changes the image size to 128x128:

model.get_mixin("eva").model.vit.get_mixin("pos_embedding").reinit(property=new_prop)

Thank you very much for your support!

1049451037 commented 10 months ago

Model files from different places cannot be pieced together arbitrarily. It's certain not to work.

1049451037 commented 10 months ago

Different models have different formats; even if the filenames match after piecing them together, it still won't work.

chenjingcheng commented 10 months ago

Thank you for your reply. I downloaded the files following your README and ran your code unmodified. Why does it look for the "latest" file, and where do I find it? Do the parameters in finetune.sh need to be changed? Thanks.

chenjingcheng commented 10 months ago

The Tsinghua cloud download only contains the .bin files, so I only went looking for the other files after the run errored out about them being missing. Could you provide a complete directory where all the corresponding data can be downloaded? Thanks.

1049451037 commented 10 months ago

The .bin files on the Tsinghua cloud are the Hugging Face version, which cannot be used for fine-tuning. For fine-tuning you can only use the sat version, and the sat version can only be downloaded automatically by the program; it is not on the Tsinghua cloud.
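
Concretely, a sketch of the working path (an assumption based on how the demo populated ~/.sat_models for you: passing a model name instead of a local path lets sat fetch the fine-tunable checkpoint itself):

model_type = 'visualglm-6b'  # a model name, not a local ./thudm/... path
model, args = FineTuneVisualGLMModel.from_pretrained(model_type, args)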

1049451037 commented 10 months ago

As for the "latest" file, it will be there once the automatic download succeeds.
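
For context, "latest" is part of the standard sat/DeepSpeed checkpoint layout (reconstructed here from the filenames in this thread): a small text file naming the newest checkpoint directory, with the weights inside that directory:

visualglm-6b/
  model_config.json
  latest                        # text file containing the tag, e.g. "300"
  300/
    mp_rank_00_model_states.pt  # the actual model states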

chenjingcheng commented 10 months ago

All right, I'll start over from scratch. Thank you very much for the support!

chenjingcheng commented 10 months ago

@1049451037 Hello, I ran the test again with a completely fresh copy of the code. Without the image-size-128 additions everything is OK. With the 128 code added:

new_image_size = [128, 128]
from sat.model.official.vit_model import ViTProperty
old_prop = model.get_mixin("eva").model.vit.transformer.property
new_prop = ViTProperty(new_image_size, old_prop.patch_size, old_prop.pre_len, old_prop.post_len)
model.get_mixin("eva").model.vit.get_mixin("pos_embedding").reinit(property=new_prop)
args.eva_args["image_size"] = new_image_size

it errors out:

    model.get_mixin("eva").model.vit.get_mixin("pos_embedding").reinit(property=new_prop)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/sat/model/official/vit_model.py", line 85, in reinit
    image_weight = F.interpolate(image_weight, size=property.grid_size, mode='bicubic', align_corners=False).permute(0, 2, 3, 1).reshape(property.num_patches, -1)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/torch/nn/functional.py", line 4028, in interpolate
    return torch._C._nn.upsample_bicubic2d(input, output_size, align_corners, scale_factors)
RuntimeError: "compute_indices_weights_cubic" not implemented for 'Half'

1049451037 commented 10 months ago

There's not much we can do about this; your CUDA or torch version may be too old, so the operator behind this interpolation is not implemented.

Try upgrading torch or CUDA.
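
For reference, the failure can be reproduced outside the training script. A minimal sketch (my reading of the traceback: with --fp16 the position-embedding weights are half precision, and bicubic interpolation has no half-precision kernel on CPU), including the usual upcast workaround:

import torch
import torch.nn.functional as F

w = torch.randn(1, 3, 17, 17, dtype=torch.half)  # fp16 weights, as with --fp16

# On CPU this raises:
# RuntimeError: "compute_indices_weights_cubic" not implemented for 'Half'
# F.interpolate(w, size=(9, 9), mode='bicubic', align_corners=False)

# Upcasting to float32 for the interpolation and casting back avoids it:
out = F.interpolate(w.float(), size=(9, 9), mode='bicubic',
                    align_corners=False).to(w.dtype)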

chenjingcheng commented 10 months ago

Environment: torch 2.1.0, CUDA 11.8, GPU: 4090. Do I need to upgrade CUDA to 12 or above? Thanks!

chenjingcheng commented 10 months ago

I upgraded CUDA to 12.1 and hit exactly the same bug. Because the system is Ubuntu 18.04, it cannot be upgraded to 12.3; I'll do the upgrade to 12.3 over the next two days and then test again. Thank you very much for your support.

1049451037 commented 10 months ago
git clone https://github.com/THUDM/SwissArmyTransformer
cd SwissArmyTransformer
pip install .

Try installing the latest sat from GitHub. I've just fixed this problem.
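
(A quick way to confirm which build is now in use is pip show SwissArmyTransformer, the PyPI package name, which prints the installed version and location.)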

chenjingcheng commented 10 months ago

Thanks a lot! The previous error is gone, but now I get the following one:

Traceback (most recent call last):
  File "/home/aifont/disk4T/aiproject/visualGLM-6B/finetune_visualglm.py", line 260, in <module>
    training_main(args, model_cls=model, forward_step_function=forward_step, create_dataset_function=create_dataset_function, collate_fn=data_collator)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/sat/training/deepspeed_training.py", line 130, in training_main
    iteration, skipped = train(model, optimizer,
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/sat/training/deepspeed_training.py", line 299, in train
    lm_loss, skipped_iter, metrics = train_step(train_data_iterator,
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/sat/training/deepspeed_training.py", line 373, in train_step
    forward_ret = forward_step(data_iterator, model, args, timers, **kwargs)
  File "/home/aifont/disk4T/aiproject/visualGLM-6B/finetune_visualglm.py", line 87, in forward_step
    logits = model(input_ids=tokens, image=image, pre_image=pre_image)[0]
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1807, in forward
    loss = self.module(*inputs, **kwargs)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/sat/model/official/chatglm_model.py", line 190, in forward
    return super().forward(input_ids=input_ids, attention_mask=attention_mask, position_ids=position_ids, past_key_values=past_key_values, **kwargs)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/sat/model/base_model.py", line 137, in forward
    return self.transformer(*args, **kwargs)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/sat/model/transformer.py", line 511, in forward
    hidden_states = self.hooks['word_embedding_forward'](input_ids, output_cross_layer=output_cross_layer, **kw_args)
  File "/home/aifont/disk4T/aiproject/visualGLM-6B/model/visualglm.py", line 23, in word_embedding_forward
    image_emb = self.model(**kw_args)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/aifont/disk4T/aiproject/visualGLM-6B/model/blip2.py", line 65, in forward
    enc = self.vit(image)[0]
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/aifont/disk4T/aiproject/visualGLM-6B/model/blip2.py", line 29, in forward
    return super().forward(input_ids=input_ids, position_ids=None, attention_mask=attention_mask, image=image)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/sat/model/base_model.py", line 137, in forward
    return self.transformer(*args, **kwargs)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/sat/model/transformer.py", line 522, in forward
    hidden_states = hidden_states + position_embeddings
RuntimeError: The size of tensor a (257) must match the size of tensor b (82) at non-singleton dimension 1

If I don't add the 128 change, fine-tuning with the default code is OK. Thanks!

1049451037 commented 10 months ago

This is because the size of the images you are actually feeding in is not 128.

chenjingcheng commented 10 months ago

Hello, thank you for your reply. I checked the image data and it is definitely 128x128. I even modified the code to force 128x128:

image = processor(Image.open(item['img']).resize((128,128)).convert('RGB'))

The error is the same.

1049451037 commented 10 months ago

The input image size is determined by this line:

https://github.com/THUDM/VisualGLM-6B/blob/f4429a009ee533b76e8757dce6917fbf0b0408f9/finetune_visualglm.py#L158
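
Pre-resizing in the dataset doesn't help, because the image processor constructed at that line resizes every image back to its own fixed size before it reaches the ViT. The numbers in the traceback fit this reading: with patch size 14, a 224 input gives 16x16 + 1 = 257 tokens, while the reinitialized position embedding expects 9x9 + 1 = 82 tokens for a 128 input. A sketch of the change that matters (assuming the processor there is BlipImageEvalProcessor taking an image_size argument, with 224 as the previous value):

image_processor = BlipImageEvalProcessor(128)  # was 224; must match the new ViTProperty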

chenjingcheng commented 10 months ago

The input image size is determined by this line:

https://github.com/THUDM/VisualGLM-6B/blob/f4429a009ee533b76e8757dce6917fbf0b0408f9/finetune_visualglm.py#L158

After changing that to 128 it works. Thank you very much for your patient guidance!