WongKinYiu / yolov7

Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

Post-training Static Quantization #263

Open · dmartinez-quercus opened this issue 2 years ago

dmartinez-quercus commented 2 years ago

Hello there,

Would it be easy (or even possible) to implement a post-training static quantization process by following the official PyTorch documentation?

I'm trying to do it myself; however, I'm quite new to PyTorch. I'm not sure it would be as simple as adding a QuantStub() layer before the first layer and a DeQuantStub() layer after every output Conv2d in the "Detect" module. My impression is that this YOLOv7 implementation is more complex than that and would need further adjustments to make it work. Something like the sketch below is what I have in mind.
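
For context, this is roughly the eager-mode flow described in the PyTorch docs (StubbedModel here is just an illustrative sketch, not validated against this repo):

import torch
from torch import nn

class StubbedModel(nn.Module):
    # Illustrative wrapper: quantize the input, run YOLOv7, dequantize the output.
    # Caveat: YOLOv7's Detect head returns a tuple in eval mode, so DeQuantStub
    # would likely need to be applied to each output tensor separately.
    def __init__(self, model):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.model = model
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.model(x)
        return self.dequant(x)

# Standard eager-mode post-training static quantization steps:
# stubbed = StubbedModel(fp32_model).eval()
# stubbed.qconfig = torch.quantization.get_default_qconfig('fbgemm')
# torch.quantization.prepare(stubbed, inplace=True)
# ... run a few calibration batches through stubbed ...
# torch.quantization.convert(stubbed, inplace=True)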

Thanks.

pytholic commented 2 years ago

@dmartinez-quercus I added the Stubs by making a new model class. However, I get the following error when I try to load the model:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/home/pytholic/Desktop/Projects/yolo-pose/yolov7/temp.ipynb Cell 3 in <cell line: 1>()
----> 1 model = attempt_load(weights, map_location=device)  # load FP32 model
      2 print(model)

File ~/Desktop/Projects/yolo-pose/yolov7/models/experimental.py:243, in attempt_load(weights, map_location)
    241     attempt_download(w)
    242     ckpt = torch.load(w, map_location=map_location)  # load
--> 243     model.append(ckpt['ema' if ckpt.get('ema') else 'model'].float().fuse().eval())  # FP32 model
    245 # Compatibility updates
    246 for m in model.modules():

KeyError: 'model'

The reason is that the quantized checkpoint has a different key structure than the original model.

Original model

Model(
  (model): Sequential(
    (0): ReOrg()
    (1): Conv(
      (conv): Conv2d(12, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (act): SiLU(inplace=True)
    )
    (2): Conv(
      (conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (act): SiLU(inplace=True)
    )
    (3): Conv(
      (conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (4): Conv(
      (conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (5): Conv(
      (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (act): SiLU(inplace=True)
    )
    (6): Conv(
...
      )
    )
  )
)

Quantized model

StubbedModel(
  (quant): Quantize(scale=tensor([0.07225]), zero_point=tensor([69]), dtype=torch.quint8)
  (model): Model(
    (model): Sequential(
      (0): ReOrg()
      (1): Conv(
        (conv): QuantizedConv2d(12, 64, kernel_size=(3, 3), stride=(1, 1), scale=1.736299753189087, zero_point=62, padding=(1, 1))
        (act): SiLU(inplace=True)
      )
      (2): Conv(
        (conv): QuantizedConv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), scale=2.210493803024292, zero_point=67, padding=(1, 1))
        (act): SiLU(inplace=True)
      )
      (3): Conv(
        (conv): QuantizedConv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), scale=0.8890335559844971, zero_point=87)
        (act): SiLU(inplace=True)
      )
      (4): Conv(
        (conv): QuantizedConv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), scale=1.1230701208114624, zero_point=91)
        (act): SiLU(inplace=True)
      )
      (5): Conv(
        (conv): QuantizedConv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), scale=0.5115273594856262, zero_point=77, padding=(1, 1))
        (act): SiLU(inplace=True)
      )
...
    )
  )
  (dequant): DeQuantize()
)

The attempt_load function looks for the 'model' key in the checkpoint and then fails. I'm not sure how to resolve this cleanly; one possible direction is sketched below.
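
A possible workaround (an untested sketch; build_fp32_model is a placeholder for however the FP32 network gets constructed): save the quantized model as a plain state_dict and rebuild/convert the stubbed architecture before loading, bypassing attempt_load entirely.

# Untested sketch: persist the quantized model as a state_dict instead of a
# full checkpoint, so attempt_load (which expects ckpt['model']) is never called.
torch.save(quantized_model.state_dict(), 'yolov7_int8_state.pt')

# To load it back: rebuild the same architecture, run prepare/convert again so
# the module types match the quantized state_dict, then load the weights.
stubbed = StubbedModel(build_fp32_model()).eval()  # build_fp32_model: placeholder
stubbed.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(stubbed, inplace=True)
torch.quantization.convert(stubbed, inplace=True)
stubbed.load_state_dict(torch.load('yolov7_int8_state.pt'))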

RonaldYuren commented 12 months ago

Hello! May I ask whether the "Post-training Static Quantization" problem has been solved yet? I have the same question about it.

dmartinez-quercus commented 12 months ago

Hello! May I ask whether the "Post-training Static Quantization" problem has been solved yet? I have the same question about it.

Hi! Back then I couldn't find any "convenient" way to do it, so I looked for an alternative. Since our edge inference devices only run the TensorFlow Lite backend, I ended up replicating the model in TensorFlow/Keras and simply transferring the weights once the model was trained. The quantization process in TF2 was simpler for me; roughly, it looked like the sketch below.
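
This is approximately the full-integer conversion flow I used (keras_model and calibration_images are placeholders for the replicated network and a set of preprocessed calibration images):

import tensorflow as tf

def representative_dataset():
    # Placeholder: yield a few hundred preprocessed calibration images,
    # each shaped like the model input, e.g. (1, H, W, 3) float32.
    for image in calibration_images:
        yield [image]

converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model = converter.convert()
with open('yolov7_int8.tflite', 'wb') as f:
    f.write(tflite_model)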