NeuroDesk / cvpr-sam-on-laptop-2024


model quantization #7

Open iishiishii opened 4 months ago

iishiishii commented 4 months ago

LiteMedSAM encoder ops: {'Slice', 'Softmax', 'Pad', 'Erf', 'Cast', 'MatMul', 'Constant', 'Sub', 'Mul', 'Pow', 'Concat', 'Reshape', 'Div', 'Transpose', 'Split', 'Conv', 'LayerNormalization', 'ConstantOfShape', 'Add', 'Shape', 'Sqrt', 'ReduceMean'}

LiteMedSAM decoder ops: {'Cos', 'Slice', 'Softmax', 'Gather', 'Gemm', 'Cast', 'Erf', 'MatMul', 'Not', 'Expand', 'Relu', 'Where', 'Constant', 'Sub', 'Resize', 'Mul', 'Reciprocal', 'Pow', 'Concat', 'Reshape', 'Unsqueeze', 'OneHot', 'ArgMax', 'Floor', 'Div', 'Transpose', 'Range', 'Flatten', 'Tile', 'ConvTranspose', 'Conv', 'LayerNormalization', 'ReduceMax', 'ConstantOfShape', 'Add', 'Shape', 'Equal', 'Sqrt', 'ReduceMean', 'Sin'}
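For reference, the op sets above can be reproduced with a small script like the one below (the file names are placeholders for the exported encoder/decoder models, not the actual paths in this repo):

```python
import onnx

def list_ops(model_path):
    """Return the set of operator types used in the top-level ONNX graph."""
    model = onnx.load(model_path)
    return {node.op_type for node in model.graph.node}

# Hypothetical file names; adjust to the exported LiteMedSAM models.
print("encoder ops:", list_ops("litemedsam_encoder.onnx"))
print("decoder ops:", list_ops("litemedsam_decoder.onnx"))
```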

Based on the literature, MatMul, Conv, LayerNormalization, and Gemm are the most computationally intensive operations. It might be worth profiling them during inference, as sketched below.
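One way to check this on the exported models is ONNX Runtime's built-in profiler, which writes a per-node timing trace that can be aggregated by op type. A minimal sketch, assuming a CPU session and a dummy 256x256 input (the path and input shape are assumptions, not taken from this repo):

```python
import json
from collections import defaultdict

import numpy as np
import onnxruntime as ort

# Hypothetical path and input shape; adjust to the actual exported encoder.
model_path = "litemedsam_encoder.onnx"
dummy_image = np.random.rand(1, 3, 256, 256).astype(np.float32)

sess_options = ort.SessionOptions()
sess_options.enable_profiling = True  # emit a per-node timing trace (Chrome tracing JSON)

session = ort.InferenceSession(model_path, sess_options, providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
session.run(None, {input_name: dummy_image})

profile_file = session.end_profiling()  # path to the JSON trace

# Aggregate time per op type from the trace ("dur" is in microseconds).
per_op = defaultdict(float)
with open(profile_file) as f:
    for event in json.load(f):
        if event.get("cat") == "Node":
            per_op[event["args"].get("op_name", "unknown")] += event["dur"]

for op, dur in sorted(per_op.items(), key=lambda kv: -kv[1]):
    print(f"{op:>24s}  {dur / 1000:.2f} ms")
```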

To preserve accuracy during quantization, I set reduce_range=True, which avoids a large accuracy drop.
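For context, reduce_range is exposed by the ONNX Runtime quantization API. A minimal sketch of how it might be passed is below (dynamic quantization shown only for illustration, and the file names are hypothetical; the actual pipeline may use static quantization with calibration data instead):

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Sketch: dynamic INT8 quantization with reduce_range=True.
quantize_dynamic(
    model_input="litemedsam_encoder.onnx",        # hypothetical input path
    model_output="litemedsam_encoder_int8.onnx",  # hypothetical output path
    weight_type=QuantType.QUInt8,
    reduce_range=True,  # use a 7-bit weight range to reduce saturation-related accuracy loss
)
```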

@nanthan987