NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License

scaled_upper_triang_masked_softmax_cuda: undefined symbol #1677

Open TheGravityZero opened 1 year ago

TheGravityZero commented 1 year ago

Hello, I have this issue:

  File "<frozen importlib._bootstrap>", line 583, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1043, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: .../megatron/fused_kernels/build/scaled_upper_triang_masked_softmax_cuda.so: undefined symbol: _ZN3c106detail19maybe_wrap_dim_slowEllb

My configuration:

apex                     0.1
torch                    1.11.0+cu113
torchaudio               0.11.0+cu113
torchfile                0.1.0
torchvision              0.12.0+cu113
ubuntu                   18.04
gcc                      7.5.0

I would be grateful for any help!

crcrpar commented 1 year ago

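Did you upgrade PyTorch after installing apex? An undefined symbol that appears to come from ATen/c10, like this one, is often caused by PyTorch being reinstalled after apex was built. In that case the compiled extension no longer matches the installed PyTorch libraries, and we'd usually have to recompile apex.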
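To check whether a missing symbol belongs to PyTorch's C++ libraries, a minimal sketch like the one below can pull the symbol out of the error message and decode its namespace path. The function names are hypothetical helpers, not part of apex or PyTorch. The decoder is an assumption-laden simplification: it handles only the plain `_ZN<len><name>...E` nested-name form of the Itanium C++ mangling and ignores templates, qualifiers, and substitutions.

```python
import re

# Top-level namespaces assumed here to indicate a PyTorch C++ symbol.
PYTORCH_NAMESPACES = {"c10", "at", "torch"}


def extract_undefined_symbol(message):
    """Return the mangled name from an 'undefined symbol: X' message, or None."""
    match = re.search(r"undefined symbol:\s*(\S+)", message)
    return match.group(1) if match else None


def nested_name_parts(mangled):
    """Decode the components of a simple Itanium-mangled nested name.

    Only the plain `_ZN<len><name>...E` form is handled; anything else
    (for example `_ZNK...` or templated names) yields an empty or partial list.
    """
    if not mangled.startswith("_ZN"):
        return []
    parts = []
    pos = 3  # skip the "_ZN" prefix
    while pos < len(mangled) and mangled[pos].isdigit():
        end = pos
        while end < len(mangled) and mangled[end].isdigit():
            end += 1
        length = int(mangled[pos:end])  # each component is length-prefixed
        parts.append(mangled[end:end + length])
        pos = end + length
    return parts


def looks_like_pytorch_symbol(mangled):
    """True if the outermost namespace is one of the assumed PyTorch namespaces."""
    parts = nested_name_parts(mangled)
    return bool(parts) and parts[0] in PYTORCH_NAMESPACES


if __name__ == "__main__":
    error = (
        "ImportError: .../megatron/fused_kernels/build/"
        "scaled_upper_triang_masked_softmax_cuda.so: undefined symbol: "
        "_ZN3c106detail19maybe_wrap_dim_slowEllb"
    )
    symbol = extract_undefined_symbol(error)
    print(nested_name_parts(symbol))  # ['c10', 'detail', 'maybe_wrap_dim_slow']
    print(looks_like_pytorch_symbol(symbol))  # True
```

For the symbol in this report, the path decodes to `c10::detail::maybe_wrap_dim_slow`. That is consistent with the extension having been built against a different PyTorch version than the one currently installed.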