THUDM / VisualGLM-6B

Chinese and English multimodal conversational language model | 多模态中英双语对话语言模型
Apache License 2.0

RuntimeError: "compute_indices_weights_cubic" not implemented for 'Half' #297

Closed. chenjingcheng closed this issue 10 months ago

chenjingcheng commented 10 months ago

Traceback (most recent call last):
  File "/home/aifont/disk4T/aiproject/visualGLM-6B/finetune_visualglm.py", line 185, in <module>
    model.get_mixin("eva").model.vit.get_mixin("pos_embedding").reinit(property=new_prop)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/sat/model/official/vit_model.py", line 85, in reinit
    image_weight = F.interpolate(image_weight, size=property.grid_size, mode='bicubic', align_corners=False).permute(0, 2, 3, 1).reshape(property.num_patches, -1)
  File "/home/aifont/anaconda3/envs/vglm/lib/python3.10/site-packages/torch/nn/functional.py", line 4028, in interpolate
    return torch._C._nn.upsample_bicubic2d(input, output_size, align_corners, scale_factors)
RuntimeError: "compute_indices_weights_cubic" not implemented for 'Half'
[2023-10-24 10:51:39,522] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 7777
[2023-10-24 10:51:39,523] [ERROR] [launch.py:321:sigkill_handler] ['/home/aifont/anaconda3/envs/vglm/bin/python', '-u', 'finetune_visualglm.py', '--local_rank=0', '--experiment-name', 'finetune-visualglm-6b', '--model-parallel-size', '1', '--mode', 'finetune', '--train-iters', '400', '--resume-dataloader', '--max_source_length', '64', '--max_target_length', '256', '--lora_rank', '10', '--layer_range', '0', '14', '--pre_seq_len', '4', '--train-data', './dataset.json', '--valid-data', './dataset.json', '--distributed-backend', 'nccl', '--lr-decay-style', 'cosine', '--warmup', '.02', '--checkpoint-activations', '--save-interval', '300', '--eval-interval', '10000', '--save', './checkpoints', '--split', '1', '--eval-iters', '10', '--eval-batch-size', '8', '--zero-stage', '1', '--lr', '0.0001', '--batch-size', '4', '--skip-init', '--fp16', '--use_lora'] exits with return code = 1

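My files come from different places: some were downloaded automatically (by running the demo), and some were downloaded from the Tsinghua cloud drive. There is also mp_rank_00_model_states.pt, which I downloaded from xray; it belongs to the 300 directory. I first got an error about latest, and found out where to download it from a similar issue online. Now this bug appears. Thank you very much!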

chenjingcheng commented 10 months ago

mp_rank_00_model_states.pt was downloaded from the xray project's Hugging Face page.
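
The traceback above shows F.interpolate being called with mode='bicubic' on a float16 ('Half') tensor, which the PyTorch build in use does not support. One possible workaround, sketched here and not confirmed as the fix adopted in this thread, is to upcast the tensor to float32 for the interpolation and cast the result back afterwards. The helper name interpolate_bicubic_safe and the grid sizes below are hypothetical and are not part of sat or VisualGLM-6B.

```python
import torch
import torch.nn.functional as F


def interpolate_bicubic_safe(weight: torch.Tensor, size: tuple[int, int]) -> torch.Tensor:
    """Bicubic resize that never runs the kernel in float16 (hypothetical helper)."""
    orig_dtype = weight.dtype
    # Upcast to float32: some PyTorch builds have no bicubic kernel for 'Half'.
    resized = F.interpolate(
        weight.float(), size=size, mode="bicubic", align_corners=False
    )
    # Cast back so the rest of an fp16 model sees the dtype it expects.
    return resized.to(orig_dtype)


# Assumed toy shapes: a 16x16 grid of 32-dim position embeddings resized to 24x24.
old_grid, new_grid, dim = 16, 24, 32
pos = torch.randn(old_grid * old_grid, dim).half()

# Same layout as in the traceback: (1, dim, H, W) in, (num_patches, dim) out.
image_weight = pos.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
out = interpolate_bicubic_safe(image_weight, (new_grid, new_grid))
out = out.permute(0, 2, 3, 1).reshape(new_grid * new_grid, dim)
print(out.shape, out.dtype)
```

Applying the same idea to the reinit call in vit_model.py would mean editing the installed sat package, so treat it as a local patch; whether dropping --fp16 or upgrading PyTorch also avoids the error depends on the environment.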