Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0

Exporting the quantized model to onnx gives an error, TypeError: _ConvTransposeNd._output_padding() missing 1 required positional argument: 'num_spatial_dims' #1045

Closed: chandan-labelfuse closed 1 year ago

chandan-labelfuse commented 1 year ago

🐛 Describe the bug

I quantized the YOLO-NAS model following the Post-Training Quantization documentation at https://docs.deci.ai/super-gradients/documentation/source/ptq_qat.html#post-training-quantization:

from super_gradients.training import models
from super_gradients.training.utils.quantization.selective_quantization_utils import SelectiveQuantizer

model = models.get("yolo_nas_m", pretrained_weights="coco").cuda()
model = model.eval()

q_util = SelectiveQuantizer(
    default_quant_modules_calibrator_weights="max",
    default_quant_modules_calibrator_inputs="histogram",
    default_per_channel_quant_weights=True,
    default_learn_amax=False,
    verbose=True,
)

q_util.quantize_module(model)
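
Note: the PTQ guide linked above also calibrates the quantized model on a few batches before export; that step is omitted here for brevity. A rough sketch based on the same guide, where calib_loader stands in for a DataLoader of representative images:

from super_gradients.training.utils.quantization.calibrator import QuantizationCalibrator

# Calibrate activation ranges on a few batches; calib_loader is a placeholder
# for your own DataLoader, and num_calib_batches is illustrative.
calibrator = QuantizationCalibrator(verbose=True)
calibrator.calibrate_model(
    model,
    method="percentile",
    calib_data_loader=calib_loader,
    num_calib_batches=16,
)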

After quantization, the quantized model is exported to ONNX with:

import torch

from super_gradients.training.utils.quantization.export import export_quantized_module_to_onnx

onnx_filename = "yolo_nas_m_int8.onnx"  # output path (placeholder)

dummy_input = torch.randn([1, 3, 640, 640], device="cpu")
export_quantized_module_to_onnx(
    model=model.cpu(),
    onnx_filename=onnx_filename,
    input_shape=[1, 3, 640, 640],
    input_size=[1, 3, 640, 640],
    train=False,
)

However, this fails with the following error:

TypeError: _ConvTransposeNd._output_padding() missing 1 required positional argument: 'num_spatial_dims'

This seems to be an issue with PyTorch's underlying tracing machinery; I am not sure how to correct or patch it.
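
For what it's worth, the signature mismatch is easy to reproduce without super-gradients: in torch >= 1.12, _ConvTransposeNd._output_padding() gained a required num_spatial_dims argument, while pytorch_quantization 2.1.2 still calls it with the old five-argument form. A minimal sketch of the mismatch (the direct _output_padding call is only there to demonstrate it):

import torch
import torch.nn as nn

deconv = nn.ConvTranspose2d(8, 8, kernel_size=2, stride=2)
x = torch.randn(1, 8, 16, 16)

# The torch <= 1.11 call form that pytorch_quantization's QuantConvTranspose2d uses;
# on torch >= 1.12 this raises the TypeError reported above.
deconv._output_padding(x, None, deconv.stride, deconv.padding, deconv.kernel_size)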

Versions

Collecting environment information...
PyTorch version: 1.13.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
Clang version: Could not collect
CMake version: version 3.26.3
Libc version: glibc-2.35

Python version: 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] (64-bit runtime)
Python platform: Linux-5.15.0-1031-aws-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Tesla V100-SXM2-16GB
Nvidia driver version: 530.30.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
CPU family: 6
Model: 79
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 1
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4600.02
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 128 KiB (4 instances)
L1i cache: 128 KiB (4 instances)
L2 cache: 1 MiB (4 instances)
L3 cache: 45 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown

Versions of relevant libraries:
[pip3] numpy==1.23.0
[pip3] pytorch-quantization==2.1.2
[pip3] torch==1.13.1
[pip3] torchmetrics==0.8.0
[pip3] torchvision==0.14.1
[pip3] triton==2.0.0
[pip3] tritonclient==2.33.0
[conda] Could not collect

spsancti commented 1 year ago

Hi! Can you please attach the full stack trace, so we can diagnose it better?

chandan-labelfuse commented 1 year ago

Hi,

Thank you for your reply. I have attached the full stack trace below. Let me know if you need anything else.

/usr/local/lib/python3.9/site-packages/pytorch_quantization/nn/modules/tensor_quantizer.py:284: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if amax.numel() == 1:
/usr/local/lib/python3.9/site-packages/pytorch_quantization/nn/modules/tensor_quantizer.py:286: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  inputs, amax.item() / bound, 0,
/usr/local/lib/python3.9/site-packages/pytorch_quantization/utils/reduce_amax.py:61: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not keepdims or output.numel() == 1:
/usr/local/lib/python3.9/site-packages/pytorch_quantization/nn/modules/tensor_quantizer.py:292: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  quant_dim = list(amax.shape).index(list(amax_sequeeze.shape)[0])
Traceback (most recent call last):
  File "/code/app/quantization/yolo_nas_quantization.py", line 29, in <module>
    export_quantized_module_to_onnx(
  File "/usr/local/lib/python3.9/site-packages/super_gradients/training/utils/quantization/export.py", line 53, in export_quantized_module_to_onnx
    torch.onnx.export(export_model, dummy_input, onnx_filename, verbose=False, opset_version=13, do_constant_folding=True, training=training_mode)
  File "/usr/local/lib/python3.9/site-packages/torch/onnx/utils.py", line 504, in export
    _export(
  File "/usr/local/lib/python3.9/site-packages/torch/onnx/utils.py", line 1529, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/usr/local/lib/python3.9/site-packages/torch/onnx/utils.py", line 1111, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/usr/local/lib/python3.9/site-packages/torch/onnx/utils.py", line 987, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/usr/local/lib/python3.9/site-packages/torch/onnx/utils.py", line 891, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "/usr/local/lib/python3.9/site-packages/torch/jit/_trace.py", line 1184, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/torch/jit/_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
File "/usr/local/lib/python3.9/site-packages/torch/jit/_trace.py", line 118, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/super_gradients/training/models/detection_models/customizable_detector.py", line 85, in forward
    x = self.neck(x)
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/super_gradients/training/models/detection_models/yolo_nas/panneck.py", line 59, in forward
    x_n1_inter, x = self.neck1([c5, c4, c3])
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/super_gradients/training/models/detection_models/yolo_nas/yolo_stages.py", line 269, in forward
    x = self.upsample(x_inter)
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/pytorch_quantization/nn/modules/quant_conv.py", line 342, in forward
    output_padding = self._output_padding(input, output_size, self.stride, self.padding, self.kernel_size)
TypeError: _output_padding() missing 1 required positional argument: 'num_spatial_dims'

haritsahm commented 1 year ago

It has something to do with the torch / pytorch-quantization internals, as mentioned in #issues/2964. I also had this problem, but I got it working with torch==1.11.0; I had to keep downgrading torch until it worked. Hopefully they can fix it in a new release soon.
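
For reference, the matching pins would be something like this (adjust the CUDA build for your system; torchvision 0.12.0 is the release paired with torch 1.11.0):

pip install torch==1.11.0 torchvision==0.12.0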

chandan-labelfuse commented 1 year ago

@haritsahm Downgrading the torch version worked. Closing the issue for now, thank you.

thangngoc89 commented 1 year ago

Please keep it open. I just came here to say I was hit by this bug while trying to run QAT on YOLO-NAS-S.

BloodAxe commented 1 year ago

This could be related to the ONNX opset version that PyTorch sets when exporting to ONNX, which would explain why downgrading torch helps.

BloodAxe commented 1 year ago

In short, this is not a bug on the SG side. For now you can downgrade torch to 1.11, but as a long-term solution we will be opening a PR against pytorch_quantization to support newer versions of PyTorch.

BloodAxe commented 1 year ago

We have merged a fix to master that patches pytorch_quantization on the fly, so in the 3.2 release you will be able to use it without issue.

BloodAxe commented 1 year ago

We just released https://github.com/Deci-AI/super-gradients/releases/tag/3.2.0 and you are more than welcome to try out the new export API we made for object detection models: https://github.com/Deci-AI/super-gradients/blob/master/documentation/source/models_export.md
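
For reference, the documented entry point looks roughly like this; the INT8 path below is a sketch based on that guide, not a verbatim copy of it:

from super_gradients.common.object_names import Models
from super_gradients.conversion import ExportQuantizationMode
from super_gradients.training import models

model = models.get(Models.YOLO_NAS_M, pretrained_weights="coco")

# One-call export; quantization_mode=INT8 replaces the manual
# SelectiveQuantizer + export_quantized_module_to_onnx flow above.
export_result = model.export(
    "yolo_nas_m_int8.onnx",
    quantization_mode=ExportQuantizationMode.INT8,
)
print(export_result)  # prints a summary of the exported model and how to run it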

Shalom-P commented 1 year ago

> Please keep it open. I just came here to say I was hit by this bug while trying to run QAT on YOLO-NAS-S.

Hi, I was facing the same issue when trying to run QAT on yolo_nas_s with a custom dataset in YOLO format, and I fixed it like this: uninstall all the libraries in your virtual environment, reinstall from requirements.txt, and then run:

pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install pytorch-quantization==2.1.2 --extra-index-url https://pypi.ngc.nvidia.com

This should solve the problem.

danieellee commented 4 months ago

Hello, I was facing the same issue and saw that downgrading PyTorch to 1.11.0 resolves it. However, since I need to stay on a newer version, that solution was not feasible for me. Instead, I was able to resolve the problem in two different ways:

  1. Directly modifying the 'pytorch_quantization' module: change line 341 of 'pytorch_quantization/nn/modules/quant_conv.py' from

    output_padding = self._output_padding(input, output_size, self.stride, self.padding, self.kernel_size)

    to

    output_padding = self._output_padding(input, output_size, self.stride, self.padding, self.kernel_size, input.dim() - 2)

    (input.dim() - 2 is the number of spatial dimensions: 2 for an NCHW input.)
  2. Patching within the ONNX conversion script: add the following to monkey-patch the '_output_padding' method before exporting:

    from pytorch_quantization.nn.modules.quant_conv import QuantConvTranspose2d

    original_output_padding = QuantConvTranspose2d._output_padding

    def patched_output_padding(self, input, output_size, stride, padding, kernel_size):
        # Forward the old five-argument call to the new torch >= 1.12 signature.
        num_spatial_dims = len(kernel_size)
        return original_output_padding(self, input, output_size, stride, padding, kernel_size, num_spatial_dims)

    QuantConvTranspose2d._output_padding = patched_output_padding

And then load the model as before:

    model = models.get("yolo_nas_m", pretrained_weights="coco").cuda()
    model = model.eval()

    q_util = SelectiveQuantizer(
        default_quant_modules_calibrator_weights="max",
        default_quant_modules_calibrator_inputs="histogram",
        default_per_channel_quant_weights=True,
        default_learn_amax=False,
        verbose=True,
    )
    q_util.quantize_module(model)
    ...


These solutions worked well with PyTorch 1.12.0 and 2.3.0; my pytorch_quantization version is 2.1.0.
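
With either fix in place the export completes; a quick way to sanity-check the resulting file (a sketch assuming onnxruntime is installed, using the hypothetical output path yolo_nas_m_int8.onnx from above):

import numpy as np
import onnxruntime as ort

# Load the exported Q/DQ model and run one dummy inference on CPU.
sess = ort.InferenceSession("yolo_nas_m_int8.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
outputs = sess.run(None, {input_name: np.random.rand(1, 3, 640, 640).astype(np.float32)})
print([o.shape for o in outputs])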