ModelTC / MQBench

Model Quantization Benchmark
Apache License 2.0
767 stars · 140 forks

advanced PTQ accuracy too low #250

Closed taobiaoli closed 1 year ago

taobiaoli commented 1 year ago

When I use the AdaRound and QDrop quantization methods for 4-bit quantization, I find that the accuracy is worse than with the plain MSE (naive_ptq) configuration. The data and the model are the same; the only difference is that in the code below QDrop goes through the advanced_ptq branch while MSE goes through naive_ptq. Is there a problem with my parameter configuration? Thanks.

# NOTE: only the standard imports are shown here; parse_config, seed_all,
# dla, get_quantize_model, load_eye_data, load_calibrate_data,
# ptq_reconstruction, evaluate, evaluate_eye and deploy are project-local
# helpers from my setup and the MQBench examples.
import argparse

import torch

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='ImageNet Solver')
    parser.add_argument('--config', required=True, type=str)
    args = parser.parse_args()
    config = parse_config(args.config)
    # seed first
    seed_all(config.process.seed)
    # load_model
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    checkpoint = torch.load(config.model.path, map_location="cpu")
    model = dla.dla34()
    model = model.to(device)
    model.load_state_dict(checkpoint['state_dict'])
    #model = load_model(config.model)
    if hasattr(config, 'quantize'):
        model = get_quantize_model(model, config)
    model.to(device)
    # load_data
    train_loader, val_loader = load_eye_data(config.data.path, device)
    # evaluate
    if not hasattr(config, 'quantize'):
        evaluate(val_loader, model)
    elif config.quantize.quantize_type == 'advanced_ptq':
        print('begin calibration now!')
        cali_data = load_calibrate_data(train_loader, cali_batchsize=config.quantize.cali_batchsize)
        from mqbench.utils.state import enable_quantization, enable_calibration_woquantization
        # calibrate activations and weights separately, so the per-channel MSE search for weights stays quick
        model.eval()
        with torch.no_grad():
            enable_calibration_woquantization(model, quantizer_type='act_fake_quant')
            for batch in cali_data:
                model(batch.to(device))
            enable_calibration_woquantization(model, quantizer_type='weight_fake_quant')
            model(cali_data[0].to(device))  # one batch is enough for the weight observers
        print('begin advanced PTQ now!')
        if hasattr(config.quantize, 'reconstruction'):
            model = ptq_reconstruction(
                model, cali_data, config.quantize.reconstruction)
        enable_quantization(model)
        evaluate_eye(config.data.path, model)
        if hasattr(config.quantize, 'deploy'):
            deploy(model, config)
    elif config.quantize.quantize_type == 'naive_ptq':
        print('begin calibration now!')
        cali_data = load_calibrate_data(train_loader, cali_batchsize=config.quantize.cali_batchsize)
        from mqbench.utils.state import enable_quantization, enable_calibration_woquantization
        # calibrate activations and weights separately, so the per-channel MSE search for weights stays quick
        model.eval()
        enable_calibration_woquantization(model, quantizer_type='act_fake_quant')
        for batch in cali_data:
            model(batch.to(device))  # forward passes let the observers compute activation scale/zero-point from the float32 min/max
        enable_calibration_woquantization(model, quantizer_type='weight_fake_quant')
        model(cali_data[0].to(device))  # a single forward pass lets the observers compute weight scale/zero-point
        print('begin quantization now!')
        enable_quantization(model)
        evaluate_eye(config.data.path, model)
        if hasattr(config.quantize, 'deploy'):
            deploy(model, config)
    else:
        print("The quantize_type must in 'naive_ptq' or 'advanced_ptq',")
        print("and 'advanced_ptq' need reconstruction configration.")

The corresponding YAML files are shown below. QDrop:

extra_prepare_dict:
    extra_qconfig_dict:
        w_observer: MSEObserver
        a_observer: EMAMSEObserver
        w_fakequantize: AdaRoundFakeQuantize
        a_fakequantize: QDropFakeQuantize
        w_qscheme:
            bit: 4
            symmetry: True
            per_channel: True
            pot_scale: True
            p: 2.4
        a_qscheme:
            bit: 4
            symmetry: True
            per_channel: True
            pot_scale: True
            p: 2.4
quantize:
    quantize_type: advanced_ptq # support naive_ptq or advanced_ptq
    cali_batchsize: 16
    reconstruction:
        pattern: block
        scale_lr: 4.0e-5
        warm_up: 0.2
        weight: 0.01
        max_count: 20000
        b_range: [20,2]
        keep_gpu: False
        round_mode: learned_hard_sigmoid
        prob: 0.5
model:                    # architecture details
    type: eye_tracking        # model name
    kwargs:
        num_classes: 1000
    path: D:\research\MQBench\application\model\model\model_best_0607.pth.tar
data:
    path: D:\research\MQBench\dataset\data
    batch_size: 64
    num_workers: 4
    pin_memory: True
    input_size: 224
    test_resize: 256
process:
    seed: 1005

MSE:

extra_prepare_dict:
    extra_qconfig_dict:
        w_observer: MSEObserver
        a_observer: EMAMSEObserver
        w_fakequantize: FixedFakeQuantize
        a_fakequantize: FixedFakeQuantize
        w_qscheme:
            bit: 4
            symmetry: True
            per_channel: True
            pot_scale: True
        a_qscheme:
            bit: 4
            symmetry: True
            per_channel: True
            pot_scale: True

quantize:
    quantize_type: naive_ptq # support naive_ptq or advanced_ptq
    cali_batchsize: 16
    deploy:
        output_path: D:\research\MQBench\application\model\eye_tracking
        model_name: 'eye_naive_mse_4_8'
        deploy_to_qlinear: False
model:                    # architecture details
    type: eye_tracking        # model name
    kwargs:
        num_classes: 1000
    path: D:\research\MQBench\application\model\model\model_best_0607.pth.tar
data:
    path: D:\research\MQBench\dataset\data
    batch_size: 64
    num_workers: 4
    pin_memory: True
    input_size: 224
    test_resize: 256
process:
    seed: 1005
taobiaoli commented 1 year ago

Some additional information: the loss does not seem to converge well, as shown in the attached screenshot of the loss curve.

talenz commented 1 year ago

Same problem. What's the 4-bit accuracy of your model with advanced PTQ?

taobiaoli commented 1 year ago

Same problem. What's the 4-bit accuracy of your model with advanced PTQ?

About 64% for AdaRound and QDrop, but MSE can reach 78%.

talenz commented 1 year ago

Which model?

taobiaoli commented 1 year ago

Our custom model.

wimh966 commented 1 year ago

First, there is something wrong with your configs, for both the advanced PTQ and the naive one. Regarding the activation config (a_qscheme): per-channel quantization cannot be used for activations, since it prohibits low-bit inference acceleration; please set per_channel to False. Also, pot_scale is not a usual setting; it restricts the scales to powers of two. Please refer to the config files in application/imagenet_example/PTQ/configs/qdrop. Thank you.
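
For reference, applying these suggestions to the QDrop config above would give an extra_qconfig_dict roughly like the sketch below. Only per_channel and pot_scale are changed relative to the original; treat it as an illustration rather than a verified config, and see the referenced qdrop configs for the authoritative values:

extra_qconfig_dict:
    w_observer: MSEObserver
    a_observer: EMAMSEObserver
    w_fakequantize: AdaRoundFakeQuantize
    a_fakequantize: QDropFakeQuantize
    w_qscheme:
        bit: 4
        symmetry: True
        per_channel: True       # per-channel stays on for weights
        pot_scale: False        # ordinary float scales instead of powers of two
        p: 2.4
    a_qscheme:
        bit: 4
        symmetry: True
        per_channel: False      # per-tensor for activations, as suggested above
        pot_scale: False
        p: 2.4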

taobiaoli commented 1 year ago

As you suggested, I set per_channel to False in a_qscheme, but the result got even worse: only 44%.
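
As a general debugging aid (not from this thread): when accuracy drops further after a config change, it can help to dump the calibrated quantization parameters right after the calibration pass and look for degenerate scales. A minimal sketch, assuming the fake-quantize modules expose a tensor scale buffer as torch's FakeQuantizeBase-derived quantizers do (attribute names may vary across MQBench versions):

import torch

def dump_quant_params(model):
    """Print the calibrated scale range of every fake-quantize module."""
    for name, module in model.named_modules():
        scale = getattr(module, 'scale', None)
        if torch.is_tensor(scale) and scale.numel() > 0:
            print(f'{name}: scale min={scale.min().item():.3e}, '
                  f'max={scale.max().item():.3e}')

# e.g. call dump_quant_params(model) after calibration to spot zero or
# extreme scales before running ptq_reconstruction.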

github-actions[bot] commented 1 year ago

This issue has not received any updates in 120 days. Please reply to this issue if it is still unresolved!