google-ai-edge / ai-edge-torch

Supporting PyTorch models with the Google AI Edge TFLite runtime.
Apache License 2.0

Failed to prepare model for QAT in Pt2e with quanconfig from ai_edge_torch #355

Open gurkirt opened 1 day ago

gurkirt commented 1 day ago

Description of the bug:

I have tried the following quantization configs.

example_inputs = (torch.randn(1, 1, self.args.img_height, self.args.img_width).to(self.args.device),)
# self.model.eval()

# capture_pre_autograd_graph comes from torch._export
self.model = capture_pre_autograd_graph(self.model, example_inputs)
if self.args.quantize_config_type == 'xnn':
    from torch.ao.quantization.quantizer.xnnpack_quantizer import XNNPACKQuantizer, get_symmetric_quantization_config
    self.quantizer = XNNPACKQuantizer()
    self.quantizer.set_global(get_symmetric_quantization_config(is_qat=is_qat, is_per_channel=True))
elif self.args.quantize_config_type == 'x86':
    from torch.ao.quantization.quantizer.x86_inductor_quantizer import X86InductorQuantizer, get_default_x86_inductor_quantization_config
    self.quantizer = X86InductorQuantizer()
    self.quantizer.set_global(get_default_x86_inductor_quantization_config(is_qat=is_qat, is_dynamic=False))
elif self.args.quantize_config_type == 'eat':
    from ai_edge_torch.quantize.pt2e_quantizer import get_symmetric_quantization_config
    from ai_edge_torch.quantize.pt2e_quantizer import PT2EQuantizer
    self.quantizer = PT2EQuantizer().set_global(
        get_symmetric_quantization_config(is_per_channel=True, is_dynamic=False, is_qat=is_qat))

with the prepare step being:

self.model = prepare_qat_pt2e(self.model, self.quantizer)

However, in the third case, which uses the quantization config from ai_edge_torch, prepare_qat_pt2e fails, while it works with the other configs.

This is a problem because I want to convert the QAT model to TFLite, and the TFLite conversion only works with quantize_config_type == 'eat', not the others.

Actual vs expected behavior:

In the case of prepare_qat_pt2e with the 'eat' config, I get the following error:

self.model = prepare_qat_pt2e(self.model, self.quantizer)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xovis/miniconda3/envs/pymmlib2.4/lib/python3.11/site-packages/torch/ao/quantization/quantize_pt2e.py", line 175, in prepare_qat_pt2e
  quantizer.annotate(model)
File "/home/xovis/miniconda3/envs/pymmlib2.4/lib/python3.11/site-packages/ai_edge_torch/quantize/pt2e_quantizer.py", line 356, in annotate
  model = self._annotate_for_static_quantization_config(model)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xovis/miniconda3/envs/pymmlib2.4/lib/python3.11/site-packages/ai_edge_torch/quantize/pt2e_quantizer.py", line 404, in _annotate_for_static_quantization_config
  self._annotate_all_static_patterns(
File "/home/xovis/miniconda3/envs/pymmlib2.4/lib/python3.11/site-packages/ai_edge_torch/quantize/pt2e_quantizer.py", line 371, in _annotate_all_static_patterns
  OP_TO_ANNOTATOR[op](model, quantization_config, filter_fn)
File "/home/xovis/miniconda3/envs/pymmlib2.4/lib/python3.11/site-packages/ai_edge_torch/quantize/pt2e_quantizer_utils.py", line 433, in _annotate_conv_bn_relu
  return _do_annotate_conv_bn(gm, quantization_config, filter_fn, has_relu=True)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xovis/miniconda3/envs/pymmlib2.4/lib/python3.11/site-packages/ai_edge_torch/quantize/pt2e_quantizer_utils.py", line 489, in _do_annotate_conv_bn
  pattern = _get_aten_graph_module_for_pattern(pattern, example_inputs, is_cuda)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xovis/miniconda3/envs/pymmlib2.4/lib/python3.11/site-packages/torch/ao/quantization/pt2e/utils.py", line 317, in _get_aten_graph_module_for_pattern
  aten_pattern = capture_pre_autograd_graph(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xovis/miniconda3/envs/pymmlib2.4/lib/python3.11/site-packages/torch/_export/__init__.py", line 147, in capture_pre_autograd_graph
  assert isinstance(f, torch.nn.Module), "Expected an nn.Module instance."
AssertionError: Expected an nn.Module instance.

In the case of the other configs, I get the following error when doing the conversion:

 File "/huge_fast_workdisk/datadisk/deep3share/source/gurkirt_codepad/pymllib/lib/core/engine.py", line 505, in to_tflite
  model = ai_edge_torch.convert(self.model, example_inputs, quant_config=QuantConfig(pt2e_quantizer=self.quantizer))
                                                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xovis/miniconda3/envs/pymmlib2.4/lib/python3.11/site-packages/ai_edge_torch/quantize/quant_config.py", line 72, in __init__
  if pt2e_quantizer.global_config.is_dynamic
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'QuantizationConfig' object has no attribute 'is_dynamic'

Any other information you'd like to share?

The exact same thing happened with the following two environments:

  1. pytorch and torch_xla == 2.4 and ai_edge 0.2
  2. pytorch and torch_xla == 2.5.1 and ai_edge 0.2
pkgoogle commented 1 day ago

Hi @gurkirt, what model are you using? i.e. How do you define self.model prior to your code snippet?

gurkirt commented 15 hours ago

I use a YOLO-style model; its class is defined as follows:

class YoloCNN(BaseCNN): ## BaseCNN is an nn.Module subclass
    def __init__(self, args):
        super(YoloCNN, self).__init__(args)
        self.transform_perms = None
        model, save = parse_model(args) ## function on YAML config file
        self.save = save
        logger.info(f"YOLO model SAVE STATE ARE {save}")
        self.model = model ## model is an instance of nn.Sequential()

    def forward(self, x):
        y, dt = [], []  # outputs
        # pdb.set_trace()
        for m in self.model:
            if m.f != -1:  # if not from previous layer
                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
            # if profile:
            #     self._profile_one_layer(m, x, dt)
            x = m(x)  # run
            y.append(x if m.i in self.save else None)  # save output
            # if visualize:
            #     feature_visualization(x, m.type, m.i, save_dir=visualize)
        return x

And the build YAML contains the following content:

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8-pose-p6 keypoints/pose estimation model. For Usage examples see https://docs.ultralytics.com/tasks/pose

# Parameters
nc: 5  # number of classes
ch: 1
# kpt_shape: [17, 3]  # number of keypoints, number of dims (2 for x,y or 3 for x,y,visible)
scales: # model compound scaling constants, i.e. 'model=yolov8n-p6.yaml' will call yolov8-p6.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]
  s: [0.33, 0.50, 1024]
  m: [0.67, 0.75, 768]
  l: [1.00, 1.00, 512]
  x: [1.00, 1.25, 512]

# YOLOv8.0x6 backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]  # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8
  - [-1, 6, C2f, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16
  - [-1, 6, C2f, [512, True]]
  - [-1, 1, Conv, [768, 3, 2]]  # 7-P5/32
  - [-1, 3, C2f, [768, True]]
  - [-1, 1, Conv, [1024, 3, 2]]  # 9-P6/64
  - [-1, 3, C2f, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]]  # 11

# YOLOv8.0x6 head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 8], 1, Concat, [1]]  # cat backbone P5
  - [-1, 3, C2, [768, False]]  # 14

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 6], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2, [512, False]]  # 17 (P4/16-small)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 14], 1, Concat, [1]]  # cat head P5
  - [-1, 3, C2, [768, False]]  # 20 (P5/32-normal)

  - [-1, 1, Conv, [768, 3, 2]]
  - [[-1, 11], 1, Concat, [1]]  # cat head P6
  - [-1, 3, C2, [1024, False]]  # 23 (P6/64-large)

  - [[17, 20, 23], 1, DetectClasswise, [nc, 8]]  # Pose(P4, P5, P6)

I don't think the model definition is the problem. QAT has an issue only when I use the quantization config from ai_edge_torch.quantize.pt2e_quantizer (PT2EQuantizer) rather than from torch.ao.quantization.quantizer.xnnpack_quantizer (XNNPACKQuantizer).

gurkirt commented 15 hours ago

Is there a way to use a quant config from the original torch.ao rather than ai_edge_torch in the convert function, i.e. model = ai_edge_torch.convert(self.model, example_inputs, quant_config=QuantConfig(pt2e_quantizer=self.quantizer))? How can I extend torch.ao.quantization.quantizer.xnnpack_quantizer.XNNPACKQuantizer or X86InductorQuantizer so that ai_edge_torch.convert doesn't throw AttributeError: 'QuantizationConfig' object has no attribute 'is_dynamic'?

Why do we need PT2EQuantizer in the first place, and how can I find a workaround?

gurkirt commented 15 hours ago

BTW, PT2EQuantizer works without any issue when I use PTQ, i.e. prepare_pt2e works fine. However, during QAT, prepare_qat_pt2e leads to AssertionError: Expected an nn.Module instance.
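
For reference, a minimal sketch of the two paths (assuming a generic nn.Module called model and matching example_inputs, mirroring my snippet above):

from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, prepare_qat_pt2e
from ai_edge_torch.quantize.pt2e_quantizer import PT2EQuantizer, get_symmetric_quantization_config

# PTQ path: prepare_pt2e works without issue.
ptq_quantizer = PT2EQuantizer().set_global(
    get_symmetric_quantization_config(is_per_channel=True, is_dynamic=False, is_qat=False))
ptq_model = prepare_pt2e(capture_pre_autograd_graph(model, example_inputs), ptq_quantizer)

# QAT path: prepare_qat_pt2e raises "AssertionError: Expected an nn.Module instance."
qat_quantizer = PT2EQuantizer().set_global(
    get_symmetric_quantization_config(is_per_channel=True, is_dynamic=False, is_qat=True))
qat_model = prepare_qat_pt2e(capture_pre_autograd_graph(model, example_inputs), qat_quantizer)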

gurkirt commented 10 hours ago

Hi, I solved the is_dynamic error by adding is_dynamic as a member variable of https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/quantizer/xnnpack_quantizer_utils.py#L51 and setting it to False. This allows me to do QAT without any issue and lets it convert. I think we can close the issue.
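
For reference, here is a minimal sketch of what that patch looks like, based on the torch 2.4/2.5 layout of QuantizationConfig in the linked file (a local modification of the installed torch package, not an upstream API):

from dataclasses import dataclass
from typing import Optional

from torch.ao.quantization.quantizer import QuantizationSpec

@dataclass(eq=True, frozen=True)
class QuantizationConfig:
    input_activation: Optional[QuantizationSpec]
    output_activation: Optional[QuantizationSpec]
    weight: Optional[QuantizationSpec]
    bias: Optional[QuantizationSpec]
    is_qat: bool = False
    is_dynamic: bool = False  # added so ai_edge_torch's QuantConfig can read global_config.is_dynamic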

And another issue was that I had the SiLU activation function for some convolution layers. That was the culprit. If I replace SiLU with ReLU, then everything works fine, both QAT and PTQ, with PT2EQuantizer as well as with XNNPACKQuantizer. Where can I find the list of allowed ops? Thank you.
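
In case it helps others, here is a small hypothetical helper (not part of ai_edge_torch) sketching one way to do the SiLU -> ReLU swap described above before export:

import torch.nn as nn

def replace_silu_with_relu(module: nn.Module) -> None:
    # Recursively replace every nn.SiLU submodule with nn.ReLU in place.
    for name, child in module.named_children():
        if isinstance(child, nn.SiLU):
            setattr(module, name, nn.ReLU(inplace=True))
        else:
            replace_silu_with_relu(child)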

pkgoogle commented 5 hours ago

Hi @gurkirt, I see that you reopened it, but I don't see an explanation past your last post. Was this intended?