alibaba / TinyNeuralNetwork

TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
MIT License
736 stars 118 forks

Not able to use converter.py to convert a PyTorch MobileNet model to TFLite (int8-quantized) using Colab #51

Closed nyadla-sys closed 2 years ago

nyadla-sys commented 2 years ago

Please see the below Colab that I am using to convert MobileNet v2 from PyTorch to TFLite (int8): https://colab.research.google.com/drive/1eW-I0RDzB3L6Zbz364t5lkI4fxgvpGbI#scrollTo=5YtQg5Ga2wmq

Getting the below errors:

Traceback (most recent call last):
  File "./examples/converter/convert.py", line 9, in <module>
    from examples.models.cifar10.mobilenet import DEFAULT_STATE_DICT, Mobilenet
ModuleNotFoundError: No module named 'examples.models'

peterjc123 commented 2 years ago

@nyadla-sys Thanks for trying out our project. We will have a look soon.

nyadla-sys commented 2 years ago

@peterjc123 I'd like to convert the below PyTorch MobileNet v2 model to TFLite (int8) using the TFLiteConverter that is implemented as part of TinyNeuralNetwork:

import torchvision.models as models
model = models.mobilenet_v2(pretrained=True)
model.eval()

and I'd also like to use the below x as input to the model:

# model input (or a tuple for multiple inputs)
x = torch.randn(1, 3, 224, 224, requires_grad=True)

peterjc123 commented 2 years ago

@nyadla-sys Yes, you are free to use your model. This is just the example code.

nyadla-sys commented 2 years ago

@peterjc123 Actually, I have also written a script that converts from PyTorch to TFLite (int8) in the below Colab. The TFLite (float32) model works as expected, but the TFLite (int8-quantized) results are not correct: https://github.com/nyadla-sys/pytorch_2_tflite/blob/main/pytorch_to_onnx_to_tflite(quantized)_with_imagedata.ipynb

nyadla-sys commented 2 years ago

@peterjc123 So I am thinking of using your repo to convert a PyTorch model to a TFLite (quantized) model. If you could provide a Colab notebook that does something like this with your repo, that would be appreciated.

peterjc123 commented 2 years ago

@nyadla-sys Could you please tell me how you converted the model to int8 quantized format through TinyNeuralNetwork?

nyadla-sys commented 2 years ago

@peterjc123 I just changed the sharing permissions on the Colab; please use the below link: https://colab.research.google.com/drive/1eW-I0RDzB3L6Zbz364t5lkI4fxgvpGbI?usp=sharing

peterjc123 commented 2 years ago

@nyadla-sys Thanks, I will take a look.

nyadla-sys commented 2 years ago

> @nyadla-sys Could you please tell me how you converted the model to int8 quantized format through TinyNeuralNetwork?

I have not really started on the MobileNet v2 model yet; I am only trying to run the example that is provided as part of TinyNeuralNetwork, which I believe converts a PyTorch MobileNet v1 model.

nyadla-sys commented 2 years ago

@peterjc123 It may be a good idea to add Colab notebooks that take a few PyTorch models and convert them to TFLite (quantized).

peterjc123 commented 2 years ago

@nyadla-sys Looks like an environment issue. In the Colab environment, the namespace examples refers to another package instead of the module in our project. It can be resolved by using sys.path.insert(0, xxx) instead of sys.path.append. Also, some __init__.py files are missing.
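
For reference, a minimal sketch of the workaround in a Colab cell (assuming the repo is cloned to /content/TinyNeuralNetwork, as in the commands later in this thread):

import sys

# prepend the repo root so that `examples` resolves to the package inside
# TinyNeuralNetwork rather than another package visible in the Colab environment
REPO_ROOT = '/content/TinyNeuralNetwork'  # adjust if you cloned the repo elsewhere
if REPO_ROOT not in sys.path:
    sys.path.insert(0, REPO_ROOT)

# this import should now succeed
from examples.models.cifar10.mobilenet import DEFAULT_STATE_DICT, Mobilenet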

nyadla-sys commented 2 years ago

@peterjc123 If possible, could you please create a Colab and share it with me?

peterjc123 commented 2 years ago

@nyadla-sys I've updated the repo, so you can run the scripts you shared after a kernel restart.

peterjc123 commented 2 years ago

@nyadla-sys BTW, https://github.com/alibaba/TinyNeuralNetwork/blob/main/examples/converter/convert.py is the example for converting a PyTorch model to a float32 TFLite model. If you want quantized models, please refer to https://github.com/alibaba/TinyNeuralNetwork/blob/main/examples/qat/qat.py.
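
For the torchvision MobileNet v2 model you mentioned, the float32 conversion roughly looks like the sketch below (illustrative only; the output path is arbitrary):

import torch
import torchvision.models as models

from tinynn.converter import TFLiteConverter

# load the pretrained torchvision MobileNet v2 and switch to inference mode
model = models.mobilenet_v2(pretrained=True)
model.eval()

# model input (or a tuple for multiple inputs)
dummy_input = torch.randn(1, 3, 224, 224)

# convert to a float32 TFLite model
converter = TFLiteConverter(model, dummy_input, tflite_path='out/mobilenet_v2.tflite')
converter.convert()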

nyadla-sys commented 2 years ago

@peterjc123 When I run the below command on Colab, the same error is observed, so maybe the fix needs to be added to qat.py too:

!python /content/TinyNeuralNetwork/examples/qat/qat.py

Traceback (most recent call last):
  File "/content/TinyNeuralNetwork/examples/qat/qat.py", line 7, in <module>
    from examples.models.cifar10.mobilenet import DEFAULT_STATE_DICT, Mobilenet
ModuleNotFoundError: No module named 'examples.models'

nyadla-sys commented 2 years ago

@peterjc123 I made the necessary changes and am now getting a different error while running qat.py: https://colab.research.google.com/drive/1eW-I0RDzB3L6Zbz364t5lkI4fxgvpGbI?usp=sharing

Traceback (most recent call last):
  File "/content/TinyNeuralNetwork/examples/qat/qat.py", line 107, in <module>
    main_worker(args)
  File "/content/TinyNeuralNetwork/examples/qat/qat.py", line 71, in main_worker
    context.train_loader, context.val_loader = get_dataloader(args.data_path, 224, args.batch_size, args.workers)
  File "/content/TinyNeuralNetwork/examples/qat/../../tinynn/util/cifar10.py", line 45, in get_dataloader
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
  File "/usr/local/lib/python3.7/dist-packages/torchvision/datasets/cifar.py", line 69, in __init__
    raise RuntimeError('Dataset not found or corrupted.' +
RuntimeError: Dataset not found or corrupted. You can use download=True to download it

peterjc123 commented 2 years ago

> @peterjc123 When I run the below command on Colab, the same error is observed, so maybe the fix needs to be added to qat.py too:
>
> !python /content/TinyNeuralNetwork/examples/qat/qat.py
>
> Traceback (most recent call last):
>   File "/content/TinyNeuralNetwork/examples/qat/qat.py", line 7, in <module>
>     from examples.models.cifar10.mobilenet import DEFAULT_STATE_DICT, Mobilenet
> ModuleNotFoundError: No module named 'examples.models'

@nyadla-sys Should be fixed.
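
As for the CIFAR-10 dataset error you hit afterwards, the data just isn't present in the Colab VM yet. A rough sketch of pre-downloading it with torchvision before running the script (the root is an assumption and should match qat.py's --data-path argument):

from torchvision.datasets import CIFAR10

# download CIFAR-10 once so that get_dataloader() finds it on disk;
# the root should match the --data-path argument passed to qat.py
DATA_ROOT = '/content/data/datasets/cifar10'  # assumed path, adjust to your setup
CIFAR10(DATA_ROOT, train=True, download=True)
CIFAR10(DATA_ROOT, train=False, download=True)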

nyadla-sys commented 2 years ago

It continues to throw some other errors.

peterjc123 commented 2 years ago

> https://colab.research.google.com/drive/1eW-I0RDzB3L6Zbz364t5lkI4fxgvpGbI?usp=sharing

The default parameters in this script are not designed for running in the Colab environment, since it has limited resources. You need to lower the batch size and the number of workers to proper values.

peterjc123 commented 2 years ago

@nyadla-sys Below is a working copy of qat.py for the Colab environment with a GPU. I've updated the number of workers to 2, the batch size to 128, and the number of epochs to 1. If you use a CPU, please lower the batch size to 96; you can also set context.max_iteration to speed up training.

import argparse
import os
import sys

CURRENT_PATH = os.path.abspath(os.path.dirname(__file__))

sys.path.insert(1, os.path.join(CURRENT_PATH, '../../'))

import torch
import torch.nn as nn
import torch.optim as optim

from examples.models.cifar10.mobilenet import DEFAULT_STATE_DICT, Mobilenet
from tinynn.converter import TFLiteConverter
from tinynn.graph.quantization.quantizer import QATQuantizer
from tinynn.graph.tracer import model_tracer
from tinynn.util.cifar10 import get_dataloader, train_one_epoch, validate
from tinynn.util.train_util import DLContext, get_device, train

def main_worker(args):
    with model_tracer():
        model = Mobilenet()
        model.load_state_dict(torch.load(DEFAULT_STATE_DICT))

        # Provide a viable input for the model
        dummy_input = torch.rand((1, 3, 224, 224))

        # TinyNeuralNetwork provides a QATQuantizer class that may rewrite the graph and perform model fusion for
        # quantization. The model returned by the `quantize` function is ready for QAT.
        # By default, the rewritten model (in the format of a single file) will be generated in the working directory.
        # You may also pass some custom configuration items through the argument `config` in the following line. For
        # example, if you have a QAT-ready model (e.g models in torchvision.models.quantization),
        # then you may use the following line.
        #   quantizer = QATQuantizer(model, dummy_input, work_dir='out', config={'rewrite_graph': False})
        # Alternatively, if you have modified the generated model description file and want the quantizer to load it
        # instead, then use the code below.
        #     quantizer = QATQuantizer(
        #         model, dummy_input, work_dir='out', config={'force_overwrite': False, 'is_input_quantized': None}
        #     )
        # The `is_input_quantized` in the previous line is a flag on the input tensors whether they are quantized or
        # not, which can be None (False for all inputs) or a list of booleans that corresponds to the inputs.
        # Also, we support multiple qschemes for quantization preparation. There are several common choices.
        #   a. Asymmetric uint8. (default) config={'asymmetric': True, 'per_tensor': True}
        #      This is the most common choice and also conforms to the legacy TFLite quantization spec.
        #   b. Asymmetric int8. config={'asymmetric': True, 'per_tensor': False}
        #      This conforms to the new TFLite quantization spec. In legacy TF versions, this is usually used in post
        #      quantization. Compared with (a), it has support for per-channel quantization in supported kernels
        #      (e.g. Conv), while (a) does not.
        #   c. Symmetric int8. config={'asymmetric': False, 'per_tensor': False}
        #      This is the same as (b) but with no offsets, which may be used on some low-end embedded chips.
        #   d. Symmetric uint8. config={'asymmetric': False, 'per_tensor': True}
        #      This is the same as (a) but with no offsets. It is rarely used and just serves as a placeholder here.

        quantizer = QATQuantizer(model, dummy_input, work_dir='out')
        qat_model = quantizer.quantize()

    print(qat_model)

    # Use DataParallel to speed up training when possible
    if torch.cuda.device_count() > 1:
        qat_model = nn.DataParallel(qat_model)

    # Move model to the appropriate device
    device = get_device()
    qat_model.to(device=device)

    context = DLContext()
    context.device = device
    context.train_loader, context.val_loader = get_dataloader(args.data_path, 224, args.batch_size, args.workers, download=True)
    context.max_epoch = 1
    context.criterion = nn.BCEWithLogitsLoss()
    context.optimizer = torch.optim.SGD(qat_model.parameters(), 0.01, momentum=0.9, weight_decay=5e-4)
    context.scheduler = optim.lr_scheduler.CosineAnnealingLR(context.optimizer, T_max=context.max_epoch + 1, eta_min=0)

    # Quantization-aware training
    train(qat_model, context, train_one_epoch, validate, qat=True)

    with torch.no_grad():
        qat_model.eval()
        qat_model.cpu()

        # The step below converts the model to an actual quantized model, which uses the quantized kernels.
        qat_model = torch.quantization.convert(qat_model)

        # When converting quantized models, please ensure the quantization backend is set.
        torch.backends.quantized.engine = quantizer.backend

        # The code section below is used to convert the model to the TFLite format
        # If you need a quantized model with a specific data type (e.g. int8)
        # you may specify `quantize_target_type='int8'` in the following line.
        # If you need a quantized model with strict symmetric quantization check (with pre-defined zero points),
        # you may specify `strict_symmetric_check=True` in the following line.
        converter = TFLiteConverter(qat_model, dummy_input, tflite_path='out/qat_model.tflite')
        converter.convert()

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--data-path', metavar='DIR', default="/content/data/datasets/cifar10", help='path to dataset')
    parser.add_argument('--config', type=str, default=os.path.join(CURRENT_PATH, 'config.yml'))
    parser.add_argument('--workers', type=int, default=2)
    parser.add_argument('--batch-size', type=int, default=128)

    args = parser.parse_args()
    main_worker(args)

peterjc123 commented 2 years ago

> @peterjc123 Actually, I have also written a script that converts from PyTorch to TFLite (int8) in the below Colab. The TFLite (float32) model works as expected, but the TFLite (int8-quantized) results are not correct: https://github.com/nyadla-sys/pytorch_2_tflite/blob/main/pytorch_to_onnx_to_tflite(quantized)_with_imagedata.ipynb

As can be seen from the script, you are passing in an int8 tensor. But since we have quantize and dequantize nodes around the input and output tensors, you should pass in the original float values instead. We support removing the quantize and dequantize nodes by passing fuse_quant_dequant=True while constructing the TFLiteConverter object, but we don't support setting custom input ranges. So, if you remove them, you'll need to quantize the input tensor manually.
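
For illustration, a rough sketch of quantizing the input manually with the TFLite interpreter, reading the scale and zero point from the converted model itself (the model path is just an example):

import numpy as np
import tensorflow as tf

# load the quantized TFLite model and read its input quantization params
interpreter = tf.lite.Interpreter(model_path='out/qat_model.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]

scale, zero_point = input_details['quantization']
dtype = input_details['dtype']  # e.g. np.uint8 or np.int8
info = np.iinfo(dtype)

# quantize a float input manually: q = round(x / scale) + zero_point
float_input = np.random.rand(*input_details['shape']).astype(np.float32)  # placeholder data
quantized_input = np.round(float_input / scale) + zero_point
quantized_input = np.clip(quantized_input, info.min, info.max).astype(dtype)

interpreter.set_tensor(input_details['index'], quantized_input)
interpreter.invoke()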

BTW, I really think we should provide a full code example like you did. We will take some time to figure one out when we have time.

peterjc123 commented 2 years ago

@nyadla-sys I figured out the example to get a post-quantized model using TinyNeuralNetwork: https://colab.research.google.com/drive/1P-lpfIcPVgfzfpCQqj3nRiNrnKR9ZoZU?usp=sharing

nyadla-sys commented 2 years ago

@peterjc123 Excellent work. Thank you for your time; providing a working Colab helps a lot.

nyadla-sys commented 2 years ago

@peterjc123 While exploring the generated TFLite model, I found that it has suboptimal quantization parameters (scale, zero point) for fused activations (ReLU6), as it doesn't fully exploit the quantization range of ReLU6, [0, 6.0]. So I was wondering if there are some optimization/quantization settings that can be used in the PyTorch-to-TFLite conversion process to generate an optimal scale/zero point for the TFLite model.

peterjc123 commented 2 years ago

@nyadla-sys It's hard to say that a larger range is the optimal choice, because it leads to lower bitwise precision. Usually, accuracy is the more important factor in quantization, so I would say wanting the whole range of [0.0, 6.0] may just be your preference. To support this particular case, we would need to make the following changes.

  1. Insert QuantStub nodes after ReLU6 nodes during QAT graph rewriting.
  2. Disable the observer of those nodes and hardcode their quant_min and quant_max to 0.0 and 6.0 (a rough sketch of this step follows below).
  3. Get rid of the additional Quantize nodes with the optimization passes while converting the model to TFLite.
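
For illustration only, here is what step 2 might look like on the QAT model from the script above (fake_quant_after_relu6 is a hypothetical attribute name; the real one depends on the rewritten model):

import torch

# `qat_model` is the model returned by quantizer.quantize() in the script above;
# `fake_quant_after_relu6` is a hypothetical name for the FakeQuantize module
# that observes the output of an isolated ReLU6
fake_quant = qat_model.fake_quant_after_relu6

# map the float range [0.0, 6.0] onto the full uint8 range [0, 255]
new_scale = 6.0 / 255.0
new_zero_point = 0

with torch.no_grad():
    fake_quant.scale.fill_(new_scale)
    fake_quant.zero_point.fill_(new_zero_point)

# freeze these values so the observer does not re-estimate them during training
fake_quant.apply(torch.quantization.disable_observer)
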
nyadla-sys commented 2 years ago

@peterjc123 Thanks for your prompt response.

nyadla-sys commented 2 years ago

@peterjc123 The idea was not to increase the range but to restrict the float range to [0.0, 6.0] at the output of Conv/DW/FC layers where ReLU6 is fused, since values beyond the [0.0, 6.0] range will be clipped anyway.

peterjc123 commented 2 years ago

@nyadla-sys Could you please elaborate a little bit? You may use Netron to visualize the TFLite models to give a clearer explanation.

peterjc123 commented 2 years ago

@nyadla-sys Okay, I think I understand your problem.

>>> import torch
>>> a = torch.quantize_per_tensor(torch.zeros(1,3,224,224), torch.tensor(0.5), torch.tensor(128), torch.quint8)
>>> torch.nn.ReLU6()(a)
tensor([[[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]],

         [[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]],

         [[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]]]], size=(1, 3, 224, 224),
       dtype=torch.quint8, quantization_scheme=torch.per_tensor_affine,
       scale=0.5, zero_point=128)

It doesn't generate a new set of quantization params after going through the ReLU6 nodes. However, the problem only exists when the activations are not fused. Consider the pattern conv-bn-relu: it will be replaced with a new module ConvBnReLU2d, so the observer is fused too. So it seems that we need to add a QuantStub node after every isolated activation node and clamping function that has a bounded value range (e.g. ReLU, ReLU6, torch.{clamp,hardtanh,minimum,maximum}).
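
A rough sketch of that idea (not our actual graph-rewriting code): a QuantStub placed after an isolated ReLU6 carries its own observer during QAT preparation, so the clamped output gets a fresh set of quantization params. The extra Quantize node it produces would still need to be removed during TFLite conversion, as noted above.

import torch
import torch.nn as nn

class Relu6WithRequant(nn.Module):
    # illustrative only: re-quantize the output of an isolated ReLU6
    def __init__(self):
        super().__init__()
        self.relu6 = nn.ReLU6()
        # during prepare_qat, this QuantStub gets its own observer, so the
        # [0, 6] output range is captured separately from the input range
        self.requant = torch.quantization.QuantStub()

    def forward(self, x):
        return self.requant(self.relu6(x))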

peterjc123 commented 2 years ago

With https://github.com/alibaba/TinyNeuralNetwork/commit/5044f77388cc394276a2f2f31327089f73f7df77, it should now try to fuse the activations (e.g. ReLU6) with nodes that support re-quantization. @nyadla-sys

nyadla-sys commented 2 years ago

thanks @peterjc123