This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).
How to fix the RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx` #18
When I train on two GPUs (1080 Ti ×2), this error occurs.
The configuration is CUDA 11.1, PyTorch 1.8.1, torchvision 0.9.1, Python 3.8.3.
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")
Warning: apex was installed without --cpp_ext. Falling back to Python flatten and unflatten.
Training (X / X Steps) (loss=X.X): 0%|| 0/749 [00:00<?, ?it/s]Warning: apex was installed without --cpp_ext. Falling back to Python flatten and unflatten.
Training (X / X Steps) (loss=X.X): 0%|| 0/749 [00:42<?, ?it/s]
Traceback (most recent call last):
File "train.py", line 400, in <module>
main()
File "train.py", line 397, in main
train(args, model)
File "train.py", line 226, in train
loss, logits = model(x, y)
File "/home/lirunze/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lirunze/anaconda3/lib/python3.8/site-packages/apex-0.1-py3.8.egg/apex/parallel/distributed.py", line 560, in forward
result = self.module(*inputs, **kwargs)
File "/home/lirunze/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lirunze/anaconda3/lib/python3.8/site-packages/apex-0.1-py3.8.egg/apex/amp/_initialize.py", line 196, in new_fwd
output = old_fwd(*applier(args, input_caster),
File "/home/lirunze/xh/project/git/trans-fg_-i2-t/models/modeling.py", line 305, in forward
part_logits = self.part_head(part_tokens[:, 0])
File "/home/lirunze/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lirunze/anaconda3/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 94, in forward
return F.linear(input, self.weight, self.bias)
File "/home/lirunze/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 1753, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)`
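One way to narrow this down (a sketch of my own, not part of the repo) is to rerun the failing operation in isolation: the traceback ends in an fp16 `F.linear` call on the part head, which goes through `cublasGemmEx`. If this minimal reproduction also fails, the problem lies in the half-precision GEMM path on this GPU/CUDA combination (the 1080 Ti is a Pascal card with no tensor cores) rather than in the TransFG code. The sizes 768 and 200 are assumptions standing in for the hidden dimension and number of classes.

```python
import torch
import torch.nn.functional as F

def check_gemm(device="cuda:0", dtype=torch.float16):
    """Run the same kind of half-precision linear layer that fails above.

    A hypothetical diagnostic: if this raises CUBLAS_STATUS_EXECUTION_FAILED
    too, the fp16 GEMM itself is broken on this setup.
    """
    x = torch.randn(8, 768, device=device, dtype=dtype)    # stand-in for part_tokens[:, 0]
    w = torch.randn(200, 768, device=device, dtype=dtype)  # stand-in for part_head weight
    b = torch.randn(200, device=device, dtype=dtype)       # stand-in for part_head bias
    out = F.linear(x, w, b)  # dispatches to cublasGemmEx on CUDA
    if device.startswith("cuda"):
        torch.cuda.synchronize()  # surface asynchronous CUDA errors here, not later
    return tuple(out.shape)
```

Running the same check with `device="cpu"` and `dtype=torch.float32` should always succeed and confirms the shapes are sane; note also that CUDA errors are reported asynchronously, so setting `CUDA_LAUNCH_BLOCKING=1` before rerunning training can make the traceback point at the kernel that actually failed.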
Could you analyze this problem? Thank you!
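Separately, the warnings at the top of the log show that apex was installed without `--cuda_ext --cpp_ext`, so the fused fp16 kernels fall back to pure Python. A small probe (my own sketch, not from the repo) checks whether the compiled extensions are actually present:

```python
import importlib.util

def apex_compiled_extensions_present():
    # 'amp_C' (the module named in the ImportError above) and
    # 'fused_layer_norm_cuda' only exist when apex is built with
    # --cuda_ext --cpp_ext; if either is missing, the Python fallback is in use.
    return (importlib.util.find_spec("amp_C") is not None
            and importlib.util.find_spec("fused_layer_norm_cuda") is not None)
```

If this returns False, reinstalling apex from source with the extension flags (per apex's own README) removes the fallback warnings, though it may not by itself fix the cuBLAS failure.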
Thanks for your work and for sharing your code!