IntelLabs / FP8-Emulation-Toolkit

PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware.
BSD 3-Clause "New" or "Revised" License

Illegal instruction (core dumped) #3

Closed julianfaraone closed 1 year ago

julianfaraone commented 1 year ago

Hi, I have been able to run this code with CNNs, but I am having some issues with transformer models. For example, running it with this Conformer model https://colab.research.google.com/github/burchim/EfficientConformer/blob/master/EfficientConformer.ipynb#scrollTo=0HTj66OxQ4in gives me the error "Illegal instruction (core dumped)".

When I remove the multi_head_attention layers as well as the "linear", "fc" and "subsampling" layers, the error goes away. However, even with these layers removed, I still get the error when I use the "e5m2" configuration rather than "e4m3" or "e3m4".

Any ideas why I am getting this error with these layers and this FP8 configuration?

nkmellem commented 1 year ago

Which platform are you running this on? If you skipped "multi_head_attention", "linear", "fc" and "subsampling", it would not invoke this library at all.

Can you include the stacktrace?

julianfaraone commented 1 year ago

Thanks for your response.

I'm using a "Tesla V100 SXM2 16GB" GPU and an "Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz" CPU. I have run the code on CNNs from torchvision on the same system and haven't had issues there, only when I then moved to this Conformer model.

The error occurs at line 61 of `FP8_Emulation_Toolkit/mpemu/pytquant/cuda/fpemu.py`, i.e. at this call: `outputs = fpemu_cuda.forward(input.contiguous(), mode, size, inplace, scale, blocknorm, blocksize)`, which then calls the CUDA code. Do you need a further trace from inside the CUDA code?

To reproduce the error, all I have done is instantiate the emulator in main.py of the repository I linked, after the model is created. Like so:

```python
list_exempt_layers = []
list_layers_output_fused = []
model, emulator = mpt_emu.quantize_model(model, dtype='e4m3'.lower(),
                                         hw_patch='none',
                                         list_exempt_layers=list_exempt_layers,
                                         list_layers_output_fused=list_layers_output_fused,
                                         device=device, verbose=True)
```
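As an aside, one way to isolate the failure would be to exempt the suspect modules while keeping the rest of the model emulated. The layer names below are guesses based on the modules mentioned above, not verified against this Conformer model; they would need to match the names reported by the model's `named_modules()`:

```python
# Hypothetical: exempt the layers that triggered the crash so the rest
# of the model is still quantized. Names must match named_modules() output.
list_exempt_layers = ["multi_head_attention", "linear", "fc", "subsampling"]
```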

nkmellem commented 1 year ago

A further trace down to the CUDA level would be useful. Also, which Python module/layer is the origin of this error?

julianfaraone commented 1 year ago

Still working on getting the trace; I'm not entirely familiar with the tooling... The origin of the error is `encoder.subsampling_module.layers.0.0`.

nkmellem commented 1 year ago

Are you sure you are running this on CUDA? (Check the device input to the script.) The CPU you listed is old and not supported by this code; if the code is run on this CPU, it will produce an "Illegal instruction" error.
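For what it's worth, a quick way to check whether the host CPU advertises AVX-512 support, which is the usual culprit when older Xeons hit "Illegal instruction" (a sketch; the exact instruction set the CPU kernels require is an assumption, so check the project's build documentation):

```python
# Sketch: look for AVX-512 feature flags in /proc/cpuinfo (Linux only).
# The E5-2686 v4 (Broadwell) predates AVX-512, so this returns False there.
def cpu_supports_avx512():
    try:
        with open("/proc/cpuinfo") as f:
            return "avx512" in f.read().lower()
    except OSError:
        # Not Linux, or /proc unavailable -- cannot tell from here.
        return False
```

If this returns False on the machine that crashes, that would point at the code running on the CPU rather than the V100.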

On GPU, it is often an insufficient-memory issue; if you are running calibration, the code will allocate additional tensors to maintain internal state. Try reducing the batch size to rule out this problem.
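The batch-size reduction above can be automated with a simple bisection loop (a hypothetical helper, not part of the toolkit); `run_step` stands in for one calibration forward pass and is assumed to raise `RuntimeError` on a CUDA out-of-memory failure, as PyTorch does:

```python
def find_workable_batch_size(run_step, start=64, min_bs=1):
    """Halve the batch size until run_step(bs) succeeds; return that size."""
    bs = start
    while bs >= min_bs:
        try:
            run_step(bs)
            return bs
        except RuntimeError:
            bs //= 2  # assume the failure was OOM; halve and retry
    raise RuntimeError("no workable batch size found down to %d" % min_bs)
```

This only rules memory in or out as the cause; it would not help if the crash is really an unsupported CPU instruction.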

nkmellem commented 1 year ago

No activity on this issue; closing.