Finetune error: RuntimeError: FIND was unable to find an engine to execute this computation

yt7589 commented 1 year ago

I had download the latest version of VisualGLM-6B. I used the following commands to setup the development environment:

conda create -n glm python=3.9
conda activate glm
git clone https://github.com/THUDM/VisualGLM-6B.git
cd VisualGLM-6B
pip install -i https://mirrors.aliyun.com/pypi/simple/ -r requirements.txt
# edit finetune/finetune_visualglm.sh to set NUM_GPUS_PER_WORKER=2 which is the number of GPU in my server
unzip fewshot-data.zip
bash finetune/finetune_visualglm.sh

It reported errors as below:

Traceback (most recent call last):
  File "/media/zjkj/2t/yantao/VisualGLM-6B/finetune_visualglm.py", line 188, in <module>
    training_main(args, model_cls=model, forward_step_function=forward_step, create_dataset_function=create_dataset_function, collate_fn=data_collator)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/training/deepspeed_training.py", line 130, in training_main
    iteration, skipped = train(model, optimizer,
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/training/deepspeed_training.py", line 274, in train
    lm_loss, skipped_iter, metrics = train_step(train_data_iterator,
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/training/deepspeed_training.py", line 348, in train_step
    forward_ret = forward_step(data_iterator, model, args, timers, **kwargs)
  File "/media/zjkj/2t/yantao/VisualGLM-6B/finetune_visualglm.py", line 84, in forward_step
    logits = model(input_ids=tokens, image=image, pre_image=pre_image)[0]
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1724, in forward
    loss = self.module(*inputs, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/model/official/chatglm_model.py", line 192, in forward
    return super().forward(input_ids=input_ids, attention_mask=attention_mask, position_ids=position_ids, past_key_values=past_key_values, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/model/base_model.py", line 144, in forward
    return self.transformer(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/model/transformer.py", line 451, in forward
    hidden_states = self.hooks['word_embedding_forward'](input_ids, output_cross_layer=output_cross_layer, **kw_args)
  File "/media/zjkj/2t/yantao/VisualGLM-6B/model/visualglm.py", line 20, in word_embedding_forward
    image_emb = self.model(**kw_args)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/zjkj/2t/yantao/VisualGLM-6B/model/blip2.py", line 65, in forward
    enc = self.vit(image)[0]
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/zjkj/2t/yantao/VisualGLM-6B/model/blip2.py", line 29, in forward
    return super().forward(input_ids=input_ids, position_ids=None, attention_mask=attention_mask, image=image)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/model/base_model.py", line 144, in forward
    return self.transformer(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/model/transformer.py", line 451, in forward
    hidden_states = self.hooks['word_embedding_forward'](input_ids, output_cross_layer=output_cross_layer, **kw_args)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/model/official/vit_model.py", line 55, in word_embedding_forward
    embeddings = self.proj(images)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: FIND was unable to find an engine to execute this computation

Please note that I found the version of my pytorch is 2.0. Dose VisualGLM-6B have something wrong with Pytorch 2.0?

Sleepychord commented 1 year ago

Could you try reinstall pytorch , which matches your CUDA version? I think 2.0 is okay but 13.1 supports more cuda versions.

yt7589 commented 1 year ago

@Sleepychord Thanks. Installed the right version (11.7) of CUDA solved my problem.

THUDM / VisualGLM-6B

Finetune error: RuntimeError: FIND was unable to find an engine to execute this computation #52