Dipoorlet is an offline quantization tool that quantizes ONNX models on a given calibration dataset. To install from source:
git clone https://github.com/ModelTC/Dipoorlet.git
cd Dipoorlet
python setup.py install
The project uses ONNXRuntime as the inference runtime and PyTorch as the training tool, so users must make sure their CUDA and cuDNN versions are compatible with both runtimes.
For example, ONNXRuntime==1.10.0 and PyTorch==1.10.0-1.13.0 can run under CUDA==11.4 and CUDNN==8.2.4.
Please refer to the ONNXRuntime and PyTorch documentation for detailed version compatibility.
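As a quick sanity check, the following minimal sketch (assuming both packages are already installed) prints the CUDA/cuDNN build each runtime sees; CUDAExecutionProvider must be listed for ONNXRuntime GPU inference to work:

import torch
import onnxruntime as ort

print("PyTorch:", torch.__version__)
print("  built for CUDA:", torch.version.cuda)
print("  cuDNN:", torch.backends.cudnn.version())
print("  CUDA available:", torch.cuda.is_available())

print("ONNXRuntime:", ort.__version__)
# CUDAExecutionProvider must appear here for GPU inference.
print("  providers:", ort.get_available_providers())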
ONNXRuntime has a known bug when running in Docker with cpu-sets configured; please check the related issue for details.
The preprocessed calibration data must be prepared under a specific directory layout. For example, if the model has two input tensors named "input_0" and "input_1", the file structure is as follows (a sketch for generating these files is shown after the tree):
cali_data_dir
├── input_0
│   ├── 0.bin
│   ├── 1.bin
│   ├── ...
│   └── N-1.bin
└── input_1
    ├── 0.bin
    ├── 1.bin
    ├── ...
    └── N-1.bin
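The following is a minimal sketch, assuming float32 inputs and placeholder shapes and preprocessing (replace input_shapes and preprocess with your own), of how calibration samples can be dumped into this layout:

import os
import numpy as np

cali_data_dir = "cali_data_dir"
num_samples = 100  # N
# Assumed input shapes; use the real shapes of your model's inputs.
input_shapes = {"input_0": (1, 3, 224, 224), "input_1": (1, 3, 224, 224)}

def preprocess(index, name):
    # Placeholder: return one preprocessed (resized/normalized) sample for
    # the given input tensor. Replace with your real preprocessing pipeline.
    return np.random.rand(*input_shapes[name]).astype(np.float32)

for name in input_shapes:
    os.makedirs(os.path.join(cali_data_dir, name), exist_ok=True)
    for i in range(num_samples):
        # Each sample is written as a raw float32 binary dump, one file per input.
        preprocess(i, name).tofile(os.path.join(cali_data_dir, name, f"{i}.bin"))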
Launch with torch.distributed:
python -m torch.distributed.launch --use_env -m dipoorlet -M MODEL_PATH -I INPUT_PATH -N PIC_NUM -A [mse, hist, minmax] -D [trt, snpe, rv, atlas, ti, stpu] [--bc] [--adaround] [--brecq] [--drop]
Launch with Slurm or mpirun:
python -m dipoorlet -M MODEL_PATH -I INPUT_PATH -N PIC_NUM -A [mse, hist, minmax] -D [trt, snpe, rv, atlas, ti, stpu] [--bc] [--adaround] [--brecq] [--drop] [--slurm | --mpirun]
Example: quantize the ONNX model model.onnx with 100 calibration samples stored in workdir/data/, where "data" is the name of the model's input tensor. Use the "minmax" activation calibration algorithm, apply "QDrop" for label-free weight fine-tuning, and finally generate the TensorRT quantization configuration:
workdir
└── data
    ├── 0.bin
    ├── 1.bin
    ├── ...
    └── 99.bin
python -m torch.distributed.launch --use_env -m dipoorlet -M model.onnx -I workdir/ -N 100 -A minmax -D trt --drop
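Before launching, a quick check of one calibration file can catch layout mistakes. The following sketch assumes float32 data and a hypothetical input shape of 1x3x224x224:

import numpy as np

arr = np.fromfile("workdir/data/0.bin", dtype=np.float32)
# The element count should match the model's input shape, e.g. 1*3*224*224.
print(arr.size, arr.size == 1 * 3 * 224 * 224)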