NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License

Error in FusedLayerNorm #156

Closed Hyperparticle closed 5 years ago

Hyperparticle commented 5 years ago

After installing apex with the cuda extensions and running pytorch-pretrained-BERT, I get the following error in FusedLayerNormAffineFunction, apex/normalization/fused_layer_norm.py (line 21).

RuntimeError: a Tensor with 2482176 elements cannot be converted to Scalar (item at /pytorch/aten/src/ATen/native/Scalar.cpp:9)

Here are the shapes of my tensors:

input_ - [32, 101, 768]
bias_ - [768]
weight_ - [768]
self.normalized_shape - [768]

I'm not sure if it's a problem with pytorch-pretrained-BERT calling it incorrectly or a bug in apex. Any idea? I've also created an issue here.

I'm running Ubuntu with CUDA 9, PyTorch 0.4.1.

Full stacktrace below.

File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/pytorch_pretrained_bert/modeling.py", line 710, in forward
    embedding_output = self.embeddings(input_ids, token_type_ids)
  File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/pytorch_pretrained_bert/modeling.py", line 261, in forward
    embeddings = self.LayerNorm(embeddings)
  File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/normalization/fused_layer_norm.py", line 149, in forward
    input, self.weight, self.bias)
  File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/normalization/fused_layer_norm.py", line 21, in forward
    input_, self.normalized_shape, weight_, bias_, self.eps)

RuntimeError: a Tensor with 2482176 elements cannot be converted to Scalar (item at /pytorch/aten/src/ATen/native/Scalar.cpp:9)

frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f1aa5da3021 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f1aa5da28ea in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: at::native::item(at::Tensor const&) + 0x12c3 (0x7f1aa690d5b3 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #3: at::TypeDefault::item(at::Tensor const&) const + 0x55 (0x7f1aa6b1c905 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #4: torch::autograd::VariableType::eye_out(at::Tensor&, long, long) const + 0x184 (0x7f1aa4faeec4 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #5: <unknown function> + 0x89ca (0x7f1a82e739ca in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #6: layer_norm_affine(at::Tensor, c10::ArrayRef<long>, at::Tensor, at::Tensor, double) + 0x185 (0x7f1a82e762a5 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #7: <unknown function> + 0x18d44 (0x7f1a82e83d44 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #8: <unknown function> + 0x16495 (0x7f1a82e81495 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #9: _PyCFunction_FastCallDict + 0x154 (0x55a8f9925744 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #10: <unknown function> + 0x198610 (0x55a8f99ac610 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #11: _PyEval_EvalFrameDefault + 0x30a (0x55a8f99d138a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #12: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
frame #13: _PyFunction_FastCallDict + 0x11b (0x55a8f99a6bab in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #14: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #15: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #16: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #17: THPFunction_do_forward(THPFunction*, _object*) + 0x15c (0x7f1ae02e21ec in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #18: PyCFunction_Call + 0x5f (0x55a8f992863f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #19: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #20: <unknown function> + 0x16ba91 (0x55a8f997fa91 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #21: _PyObject_FastCallDict + 0x8b (0x55a8f992592b in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #22: <unknown function> + 0x19857e (0x55a8f99ac57e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #23: _PyEval_EvalFrameDefault + 0x30a (0x55a8f99d138a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #24: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
frame #25: _PyFunction_FastCallDict + 0x11b (0x55a8f99a6bab in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #26: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #27: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #28: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #29: _PyEval_EvalFrameDefault + 0x19ec (0x55a8f99d2a6c in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #30: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
frame #31: <unknown function> + 0x1918e4 (0x55a8f99a58e4 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #32: _PyFunction_FastCallDict + 0x1bc (0x55a8f99a6c4c in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #33: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #34: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #35: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #36: <unknown function> + 0x16ba91 (0x55a8f997fa91 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #37: _PyObject_FastCallDict + 0x8b (0x55a8f992592b in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #38: <unknown function> + 0x19857e (0x55a8f99ac57e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #39: _PyEval_EvalFrameDefault + 0x30a (0x55a8f99d138a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #40: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
frame #41: <unknown function> + 0x1918e4 (0x55a8f99a58e4 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #42: _PyFunction_FastCallDict + 0x3da (0x55a8f99a6e6a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #43: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #44: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #45: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #46: _PyEval_EvalFrameDefault + 0x19ec (0x55a8f99d2a6c in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #47: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
frame #48: <unknown function> + 0x1918e4 (0x55a8f99a58e4 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #49: _PyFunction_FastCallDict + 0x1bc (0x55a8f99a6c4c in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #50: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #51: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #52: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #53: <unknown function> + 0x16ba91 (0x55a8f997fa91 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #54: _PyObject_FastCallDict + 0x8b (0x55a8f992592b in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #55: <unknown function> + 0x19857e (0x55a8f99ac57e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #56: _PyEval_EvalFrameDefault + 0x30a (0x55a8f99d138a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #57: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
frame #58: <unknown function> + 0x1918e4 (0x55a8f99a58e4 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #59: _PyFunction_FastCallDict + 0x3da (0x55a8f99a6e6a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #60: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #61: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #62: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #63: _PyEval_EvalFrameDefault + 0x19ec (0x55a8f99d2a6c in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)

Hyperparticle commented 5 years ago

Upgraded to CUDA 10.0 and PyTorch 1.0.1, now I get a segmentation fault with Apex enabled.

thomwolf commented 5 years ago

I also have this error (not on pytorch-bert). Same setup (CUDA 10 and latest PyTorch 1.0.1).

geniki commented 5 years ago

Me too - PyTorch 1.0.1, CUDA 10. It's not specific to pytorch-pretrained-BERT, the script below is enough for me:

import torch
import apex

input = torch.rand(3, 10).cuda()
fln = apex.normalization.FusedLayerNorm(10).cuda()
fln(input)

thomwolf commented 5 years ago

I got this example to fail on a V100 too. I've now also tested on a K80, where this example works fine with CUDA 10 and PyTorch 1.0.1.post2 🤔

Hyperparticle commented 5 years ago

@geniki @thomwolf Strange, I don't get any errors with the script above, but I still get the runtime error when running pytorch-pretrained-BERT (using Titan RTX).

mrdbourke commented 5 years ago

@geniki

Me too - PyTorch 1.0.1, CUDA 10. It's not specific to pytorch-pretrained-BERT, the script below is enough for me:

import torch
import apex

input = torch.rand(3, 10).cuda()
fln = apex.normalization.FusedLayerNorm(10).cuda()
fln(input)

When I run this^ I get:

ModuleNotFoundError: No module named 'fused_layer_norm_cuda'

Also getting it on a pytorch-pretrained-BERT experiment.

Not sure if these issues (mine and the one originally posted) are related though...

thorjohnsen commented 5 years ago

@mrdbourke I think you may have compiled apex without cuda support. You need to compile it with python setup.py install --cpp_ext --cuda_ext.
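A quick way to tell which situation you're in is to check whether the compiled extension module is importable at all. This is just a diagnostic sketch using the standard library, not part of apex; fused_layer_norm_cuda is the module apex only builds when the CUDA extensions are requested:

```python
# Diagnostic sketch (not part of apex): check whether a compiled extension
# module can be found before relying on apex.normalization.FusedLayerNorm.
import importlib.util

def has_module(name: str) -> bool:
    """Return True if `name` is importable on the current Python path."""
    return importlib.util.find_spec(name) is not None

# "fused_layer_norm_cuda" only exists when apex was built with --cuda_ext.
if not has_module("fused_layer_norm_cuda"):
    print("apex appears to be installed without the CUDA extensions; "
          "reinstall with --cpp_ext --cuda_ext")
```

If the message prints, the ModuleNotFoundError above is expected and a rebuild with the extensions should fix it.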

mrdbourke commented 5 years ago

@mrdbourke I think you may have compiled apex without cuda support. You need to compile it with python setup.py install --cpp_ext --cuda_ext.

Thank you, just realised I didn't use the extension... my bad.

This fixed it.

thomwolf commented 5 years ago

@mcarilli any hint on a possible source of error from you guys?

mcarilli commented 5 years ago

Sorry for the delayed response, my bandwidth right now is completely consumed cleaning up the mixed precision API (https://github.com/NVIDIA/apex/compare/api_refactor?expand=1).** I didn't write FusedLayerNorm (it came in from our MLPerf efforts) and I haven't had time to debug it. @thorjohnsen is currently using it in our own implementation of BERT.

@geniki Thank you for the minimal repro. @Hyperparticle @thomwolf When you say "I get a segmentation fault with Apex enabled" in https://github.com/NVIDIA/apex/issues/156#issuecomment-464115433, do you mean the segmentation fault occurs specifically when you try to use FusedLayerNorm, or at some other point?

**Unrelated, but useful: I'll be presenting a preview of the new API in a webinar tomorrow. It's working, but I don't have documentation or examples yet. I will add it to master by next week.

Hyperparticle commented 5 years ago

@mcarilli I can confirm that I do get the segmentation fault when calling the FusedLayerNorm code, but I haven't investigated exactly where. I don't get one when I use regular LayerNorm.

mcarilli commented 5 years ago

@Hyperparticle @thomwolf @geniki While I wait for the results of Thor's runs, one thing that occurs to me is that your segfault may be because when you upgraded PyTorch, the existing (installed) Apex binaries were no longer compatible. Try a full pip uninstall apex, then cd apex_repo_dir; rm -rf build; pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" . and see if the segfault persists.

Hyperparticle commented 5 years ago

@mcarilli Thanks, that fixed the segfault. But now I still get the same FusedLayerNorm error.

jjsjann123 commented 5 years ago

@geniki The mini repro runs fine on my setup (CUDA 10 with a V100).

@Hyperparticle Can you provide some more information on how to reproduce this issue? Which pretrained model are you using?

A script (if possible) with the repro would be of great help.

geniki commented 5 years ago

Thanks @mcarilli. This fixed it for me - at least the snippet I posted above. @Hyperparticle does the snippet above run for you?

Hyperparticle commented 5 years ago

@geniki @jjsjann123 The snippet works, but I'm still seeing an error for my use-case. I'm running the tutorial code from this section in pytorch-pretrained-BERT with apex enabled. I'll try to debug it and get a minimal code snippet extracted with the tensor operation.

jjsjann123 commented 5 years ago

Thanks a lot. We are having a hard time reproducing the bug. Having a repro script would make it much faster for us to debug the problem. Looking forward to your update.

Hyperparticle commented 5 years ago

@jjsjann123 This is basically what the code is doing:

import torch
import apex
import importlib

fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")

input_ = torch.rand([32, 63, 768]).cuda()
weight_ = torch.rand(768).cuda()
bias_ = torch.rand(768).cuda()
normalized_shape = weight_.size()
eps = 1e-12

output, mean, invvar = fused_layer_norm_cuda.forward_affine(input_, normalized_shape, weight_, bias_, eps)

My GPU is now unavailable, so I can't verify if this causes the problem. If not, then it could either be the values in the tensors that are the problem (which I will have to save and upload somewhere), or some other extraneous property of the tensors.

jjsjann123 commented 5 years ago

root@d0c3981dfbe3:/workspace# cat repro.py 
import torch
import apex
import importlib

fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")

input_ = torch.rand([32, 63, 768]).cuda()
weight_ = torch.rand(768).cuda()
bias_ = torch.rand(768).cuda()
normalized_shape = weight_.size()
eps = 1e-12

output, mean, invvar = fused_layer_norm_cuda.forward_affine(input_, normalized_shape, weight_, bias_, eps)
torch.cuda.synchronize()
root@d0c3981dfbe3:/workspace# python repro.py 
root@d0c3981dfbe3:/workspace# 

This is working fine for me as well :(

thomwolf commented 5 years ago

Seems like recompiling apex cleanly as @mcarilli indicated fixed the problem for me also! Both @geniki's and @Hyperparticle's examples work on my machine (as does my current project). Thanks a lot!

Hyperparticle commented 5 years ago

@thomwolf Well that sounds like a relief. As for me, I'll have to see if the old code is still lingering somewhere on my system. I'll have to test it in a couple days. @jjsjann123 If it works for others, then you can close this issue.

jjsjann123 commented 5 years ago

I'll close the issue. Feel free to open a new one and ping me on it if things don't work out for you, @Hyperparticle.

mcarilli commented 5 years ago

Whew, this is a useful gotcha to know about. Good old emergency repair procedure number one: turn it off and on again. Glad people seem to be happy, especially since, as I said, I don't have the bandwidth to do a deep-dive debug right this second.

Note to self: make the setup.py smarter to avoid such cases in the future.

Strideradu commented 5 years ago

@mrdbourke I think you may have compiled apex without cuda support. You need to compile it with python setup.py install --cpp_ext --cuda_ext.

I cannot use pip to install apex, but your method works for me.

wyx518 commented 5 years ago

pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .

I did this but still get the segmentation fault.

ptrblck commented 5 years ago

@wyx518 Do you get the seg fault while running some python script using apex/amp or during the install? Either way, could you post the complete error message with the stack trace so that we can have a look?
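For anyone trying to produce a useful trace from a hard crash like this, one option (a sketch using only the standard library, not something apex provides) is to enable the faulthandler module before the failing code runs, so a segfault dumps the Python-level stack instead of dying silently:

```python
# Enable the stdlib fault handler so a segmentation fault prints the
# Python traceback to stderr before the process dies.
import faulthandler

faulthandler.enable()

# ... run the failing apex/amp code after this point ...
```

The same effect is available without code changes by running python -X faulthandler your_script.py.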

wyx518 commented 5 years ago

@ptrblck First, I ran my own demo using pytorch-pretrained-BERT and got run.sh: line 3: 21713 Segmentation fault (core dumped). Then I ran the code @jjsjann123 offered and also hit a segmentation fault ([screenshot]).

wyx518 commented 5 years ago

I solved the problem: it was the GCC version. It should be 4.9+, but Ubuntu 14.04 ships 4.8.
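Since the GCC floor is easy to miss, here is an illustrative pre-build check (a sketch, not part of apex; the parsing is heuristic and the 4.9 minimum is taken from the comment above) that reads gcc --version and compares it against that floor:

```python
# Illustrative sketch: warn if the local gcc is older than 4.9 before
# attempting to build apex's CUDA extensions. Parsing `gcc --version`
# output is heuristic; it grabs the first "major.minor" on the first line.
import re
import shutil
import subprocess

def parse_gcc_version(version_output: str) -> tuple:
    """Extract (major, minor) from the first line of `gcc --version` output."""
    match = re.search(r"(\d+)\.(\d+)", version_output.splitlines()[0])
    if match is None:
        raise ValueError("could not parse gcc version")
    return int(match.group(1)), int(match.group(2))

MINIMUM = (4, 9)  # floor reported in this thread

gcc = shutil.which("gcc")
if gcc is not None:
    out = subprocess.run([gcc, "--version"],
                         capture_output=True, text=True).stdout
    if parse_gcc_version(out) < MINIMUM:
        print("gcc is older than 4.9; building apex's CUDA "
              "extensions may segfault at runtime")
```

On the Ubuntu 14.04 setup described above this would flag 4.8 as too old.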

lbys commented 5 years ago

@mcarilli Could you please tell me how to find "apex_repo_dir" so that I can cd into it? I've been searching for it but cannot figure it out. Thanks.

widgetxp commented 4 years ago

Upgraded to CUDA 10.0 and PyTorch 1.0.1, now I get a segmentation fault with Apex enabled.

I also get a segmentation fault with Apex enabled, on CUDA 9.0 and PyTorch 1.1.0.

ethanjperez commented 4 years ago

Running fp16 models via fairseq and getting a segmentation fault with PyTorch 1.4.0, gcc/6.3.0, cuda/10.1.105.

DanyalAndriano commented 4 years ago

Is there a way to install apex on a Windows machine with "--cpp_ext" and "--cuda_ext"? At the moment I can't, and as far as I can tell that's a general issue with Windows.