Closed julianfaraone closed 1 year ago
Which platform are you running this on? If you skipped "multi_head_attention", "linear", "fc" and "subsampling", it would not invoke this library at all.
Can you include the stacktrace?
Thanks for your response.
I'm using a "Tesla V100 SXM2 16GB" GPU and an "Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz" CPU. I have run the code on CNNs from torchvision on the same system and haven't had issues there, only when I then moved to this Conformer model.
The error occurs at line 61 of "FP8_Emulation_Toolkit/mpemu/pytquant/cuda/fpemu.py", i.e. inside this call:
outputs = fpemu_cuda.forward(input.contiguous(), mode, size, inplace, scale, blocknorm, blocksize)
which then calls the CUDA code. Do you need a further trace from inside the CUDA code?
To reproduce the error, all I have done is instantiate the emulator in main.py of the repository I sent through, after the model is created. Like so:
list_exempt_layers = []
list_layers_output_fused = []
model, emulator = mpt_emu.quantize_model(model, dtype='e4m3'.lower(), hw_patch='none',
                                         list_exempt_layers=list_exempt_layers,
                                         list_layers_output_fused=list_layers_output_fused,
                                         device=device, verbose=True)
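As a workaround sketch for the crash described below, the problematic layers could be passed to quantize_model via list_exempt_layers so they are skipped. The helper and the name patterns here are my assumptions, not part of the toolkit; the module names mirror the "encoder.subsampling_module.layers.0.0" path mentioned in this thread:

```python
def collect_exempt_layers(layer_names,
                          patterns=("multi_head_attention", "linear", "fc", "subsampling")):
    """Hypothetical helper: return layer names matching any pattern,
    suitable for passing as list_exempt_layers."""
    return [name for name in layer_names if any(p in name for p in patterns)]

# Made-up module names loosely mirroring the Conformer structure:
names = [
    "encoder.subsampling_module.layers.0.0",
    "encoder.blocks.0.multi_head_attention",
    "encoder.blocks.0.conv_module",
]
print(collect_exempt_layers(names))
# With a real model one would use: names = [n for n, _ in model.named_modules()]
```

The first two names match ("subsampling" and "multi_head_attention"), the third does not, so only the convolution module would remain quantized in this sketch.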
A further trace down to the CUDA level would be useful. Also, which Python module/layer is the origin of this error?
Still working on getting the trace; I'm not entirely familiar with the tooling... The origin of the error is in "encoder.subsampling_module.layers.0.0"
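One way to capture a Python-side trace when the process dies with "Illegal instruction" (SIGILL) is the standard-library faulthandler module; this is a general debugging sketch, not something the toolkit requires:

```python
import faulthandler

# Register handlers for fatal signals (SIGILL, SIGSEGV, SIGFPE, SIGABRT, SIGBUS)
# so the Python traceback of all threads is dumped before the process dies.
faulthandler.enable()

# ...then run the usual entry point, e.g. the repository's main.py.
# Equivalently, launch with:  python -X faulthandler main.py
```

For errors originating in CUDA kernels, additionally setting the environment variable CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the failing call shows up at the right place in the traceback instead of at some later synchronization point.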
Are you sure you are running this on CUDA? (Check the device input to the script.) The CPU you listed is old and not supported by this code; if the code is run on that CPU, it will produce an "illegal instruction" error.
On GPU, it is often an insufficient-memory issue: if you are running calibration, the code will allocate additional tensors to maintain internal state. Try reducing the batch size to rule out this problem.
No activity on this issue; closing.
Hi, I have been able to run this code with CNNs, but I am having some issues with transformer models. For example, running it with this Conformer model https://colab.research.google.com/github/burchim/EfficientConformer/blob/master/EfficientConformer.ipynb#scrollTo=0HTj66OxQ4in gives me the error "Illegal instruction (core dumped)".
When I remove the "multi_head_attention" layers as well as the "linear", "fc" and "subsampling" layers, the error goes away. However, even with those layers removed, I also get this error when I use the "e5m2" configuration rather than "e4m3" or "e3m4".
Any ideas why I am getting this error with these layers and this FP8 configuration?