alibaba / TinyNeuralNetwork

TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.

QAT problem on YOLOv9 #345

Closed hoangtv2000 closed 1 month ago

hoangtv2000 commented 1 month ago

Hi @zk1998, I'm sorry to bother you again. I implemented your QAT solution for my YOLOv9 (gelan-s) model, but I ran into a performance degradation problem: the QAT-ready model's performance dropped to 0.0015 mAP.

                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 85/85 00:19
                   all       2692      10774     0.0799     0.0117    0.00336    0.00115

I tried your solution script, which fixed the YOLOv8n model, but the problem still occurs with the YOLOv9 model. Here is the script I used to run QAT on the YOLOv9 (gelan-s) model.

    with model_tracer():
        # dummy_input = torch.rand(1, 3, imgsz, imgsz)
        # # First, trace the model in eval mode,
        # # then manually modify the traced model to accommodate the differences between evaluation and training modes.
        # quantizer = PostQuantizer(
        #     model, dummy_input, work_dir=qat_rewrite_dir
        # )
        # quantizer.quantize()
        # exit()

        print("[INFO] Loading rewritten model!")
        from qat_out_no_prune.detectionmodel_q import QDetectionModel
        rewrite_model = QDetectionModel()
        dummy_input = torch.rand(1, 3, imgsz, imgsz)
        rewrite_model.load_state_dict(torch.load(qat_rewrite_dir + '/detectionmodel_q.pth'))

        # When the weight distributions fluctuate greatly, CLE may significantly improve the quantization accuracy.
        # rewrite_model = cross_layer_equalize(rewrite_model, dummy_input, device, cle_iters=4, hba_flag=False)

        # Perform BatchNorm restore after CLE to make QAT more stable and faster.
        # context.max_iteration = 100
        # context.train_loader = calib_loader
        # rewrite_model = model_restore_bn(rewrite_model, device, calibrate_func, context)

        quantizer = QATQuantizer(rewrite_model, dummy_input, work_dir=qat_rewrite_dir, config={'force_rewrite': False, 'rewrite_graph': False})
        qat_model = quantizer.quantize()
        qat_model(next(iter(calib_loader))[0].float() / 255.0)  # I have to do this; otherwise I get a "min nan should not be greater than max nan" error.
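        # Note (my understanding, not from the TinyNN docs): this first forward
        # pass lets the observers record an initial min/max range; without any
        # recorded data the fake-quant modules cannot compute valid qparams,
        # which is presumably where the NaN comes from.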

    import types

    def ptq_fuse(self, verbose=None):
        return self

    # Add attrs so the model fits ultralytics' AutoBackend when validating.
    qat_model.fuse = types.MethodType(ptq_fuse, qat_model)
    qat_model.nc = nc  # attach number of classes to model
    qat_model.hyp = hyp  # attach hyperparameters to model
    qat_model.class_weights = labels_to_class_weights(dataset.labels, nc).to(device) * nc  # attach class weights
    qat_model.names = names
    qat_model = qat_model.to(device)

    # Use ultralytics' validator for calibration (or training, if doing QAT).
    # !!! Keep the image preprocessing identical to validation (keep this in mind when training).
    qat_model.apply(torch.quantization.disable_fake_quant)
    qat_model.apply(torch.quantization.enable_observer)
    qat_model.eval()

    bar = tqdm(enumerate(calib_loader), total=len(calib_loader), bar_format=TQDM_BAR_FORMAT)
    # Run calibration inference to simulate the QAT init state (fill the observers).
    for batch_i, (imgs, targets, paths, _) in bar:
        imgs = imgs.to(device, non_blocking=True).float() / 255.0
        qat_model(imgs)
        if batch_i == 64:
            break

    # Disable observer and enable fake quantization to validate model with quantization error
    qat_model = deepcopy(qat_model)
    qat_model = qat_model.to(device)
    dummy_input = dummy_input.to(device)
    qat_model.apply(torch.quantization.disable_observer)
    qat_model.apply(torch.quantization.enable_fake_quant)
    qat_model(dummy_input)

    # Disable fake quantization on the postprocess (DFL block); quantizing it hurts the quantized model's mAP a lot.
    unq_flag = False
    print('[INFO] Unquantized modules:')
    for name, module in qat_model.named_modules():
        if 'fake_dequant_inner_0_0_0' in name:
            unq_flag = True
        if unq_flag and name.split('_')[-1].isdigit() and int(name.split('_')[-1]) >= 157:
            module.apply(torch.quantization.disable_fake_quant)
            print(name)

    results, maps, _ = validate.run(data_dict,
                                    batch_size=batch_size // WORLD_SIZE * 2,
                                    imgsz=imgsz,
                                    half=amp,
                                    model=qat_model,
                                    single_cls=False,
                                    dataloader=val_loader,
                                    save_dir=save_dir,
                                    plots=False,
                                    callbacks=callbacks,
                                    compute_loss=None)

    print("[INFO] Evaluation metric after inserting fake-quant module:")
    print(results)

    dummy_input_real = next(iter(calib_loader))[0].float() / 255.0
    # get_weight_dis(qat_model, save_path='/data/hoangtv23/workspace_AIOT/model_compression_flow/PruneQuantExperiments/yolov9/qat_out_no_prune/weight_dis')
    graph_error_analysis(qat_model, dummy_input_real, metric='cosine')
    layer_error_analysis(qat_model, dummy_input_real, metric='cosine')

    exit()

When I investigated the scale and zero-point of the weight and activation tensors, I found that the scale increases exponentially through the network, and this may be the cause of my problem. But nothing looks strange in the Python code of the model.
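For reference, the per-layer values can also be dumped directly from the fake-quant modules. A minimal sketch of such a helper (my own code, assuming the standard torch.quantization.FakeQuantize submodules; the reports below come from graph_error_analysis/layer_error_analysis):

    from torch.quantization import FakeQuantize

    def dump_qparams(model):
        # Print scale/zero_point of every per-tensor fake-quant module.
        # Weight fake-quants may be per-channel, so skip multi-element scales.
        for name, module in model.named_modules():
            if isinstance(module, FakeQuantize) and module.scale.numel() == 1:
                print(f'{name:50s} scale: {float(module.scale):.4f}, zero_point: {int(module.zero_point)}')

    dump_qparams(qat_model)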

Activations (cosine sorted):
fake_quant_0                                       cosine: 1.0000, scale: 0.0039, zero_point: 0
fake_quant_1                                       cosine: 1.0000, scale: 0.1549, zero_point: 0
fake_quant_2                                       cosine: 1.0000, scale: 0.1255, zero_point: 0
model_0_conv                                       cosine: 0.9994, scale: 0.7022, zero_point: 129
model_1_conv                                       cosine: 0.9987, scale: 1.0886, zero_point: 130
model_2_cv1_conv                                   cosine: 0.9982, scale: 1.7500, zero_point: 127
model_2_cv2_conv                                   cosine: 0.9970, scale: 1.6294, zero_point: 125
model_2_cv3_conv                                   cosine: 0.9972, scale: 4.3123, zero_point: 130
float_functional_simple_0                          cosine: 0.9964, scale: 4.4792, zero_point: 130
model_2_cv4_conv                                   cosine: 0.9955, scale: 2.7414, zero_point: 130
model_3_cv1_conv                                   cosine: 0.9919, scale: 2.1514, zero_point: 128
model_4_cv1_conv                                   cosine: 0.9905, scale: 3.5070, zero_point: 129
model_4_cv2_0_cv1_conv                             cosine: 0.9860, scale: 1.9446, zero_point: 125
model_4_cv2_0_m_0_cv1_conv1_conv                   cosine: 0.9870, scale: 4.0939, zero_point: 124
model_4_cv2_0_m_0_cv1_conv2_conv                   cosine: 0.9913, scale: 4.0563, zero_point: 121
float_functional_simple_1                          cosine: 0.9867, scale: 5.0703, zero_point: 118
model_4_cv2_0_m_0_cv2_conv                         cosine: 0.9879, scale: 4.3321, zero_point: 120
float_functional_simple_3                          cosine: 0.9868, scale: 4.9689, zero_point: 118
model_4_cv2_0_m_1_cv1_conv1_conv                   cosine: 0.9847, scale: 2.8518, zero_point: 131
model_4_cv2_0_m_1_cv1_conv2_conv                   cosine: 0.9871, scale: 2.0519, zero_point: 118
float_functional_simple_4                          cosine: 0.9858, scale: 4.1911, zero_point: 124
model_4_cv2_0_m_1_cv2_conv                         cosine: 0.9826, scale: 5.7170, zero_point: 126
float_functional_simple_6                          cosine: 0.9852, scale: 6.2659, zero_point: 128
model_4_cv2_0_m_2_cv1_conv1_conv                   cosine: 0.9786, scale: 5.9937, zero_point: 122
model_4_cv2_0_m_2_cv1_conv2_conv                   cosine: 0.9850, scale: 2.1085, zero_point: 126
float_functional_simple_7                          cosine: 0.9795, scale: 6.5579, zero_point: 122
model_4_cv2_0_m_2_cv2_conv                         cosine: 0.9751, scale: 11.9109, zero_point: 124
float_functional_simple_9                          cosine: 0.9772, scale: 13.6928, zero_point: 124
model_4_cv2_0_cv2_conv                             cosine: 0.9851, scale: 1.9301, zero_point: 130
float_functional_simple_10                         cosine: 0.9772, scale: 13.6928, zero_point: 124
model_4_cv2_0_cv3_conv                             cosine: 0.9756, scale: 8.8527, zero_point: 125
model_4_cv2_1_conv                                 cosine: 0.9785, scale: 8.3687, zero_point: 130
model_4_cv3_0_cv1_conv                             cosine: 0.9753, scale: 8.7046, zero_point: 130
model_4_cv3_0_m_0_cv1_conv1_conv                   cosine: 0.9649, scale: 16.3771, zero_point: 128
model_4_cv3_0_m_0_cv1_conv2_conv                   cosine: 0.9746, scale: 9.0218, zero_point: 130
float_functional_simple_11                         cosine: 0.9639, scale: 19.5431, zero_point: 127
model_4_cv3_0_m_0_cv2_conv                         cosine: 0.9597, scale: 25.7564, zero_point: 124
float_functional_simple_13                         cosine: 0.9613, scale: 27.0965, zero_point: 125
model_4_cv3_0_m_1_cv1_conv1_conv                   cosine: 0.9636, scale: 41.2491, zero_point: 129
model_4_cv3_0_m_1_cv1_conv2_conv                   cosine: 0.9611, scale: 14.4820, zero_point: 129
float_functional_simple_14                         cosine: 0.9612, scale: 43.0464, zero_point: 129
model_4_cv3_0_m_1_cv2_conv                         cosine: 0.9560, scale: 64.8167, zero_point: 126
float_functional_simple_16                         cosine: 0.9566, scale: 71.3403, zero_point: 125
model_4_cv3_0_m_2_cv1_conv1_conv                   cosine: 0.9602, scale: 83.9916, zero_point: 137
model_4_cv3_0_m_2_cv1_conv2_conv                   cosine: 0.9611, scale: 49.6557, zero_point: 123
float_functional_simple_17                         cosine: 0.9609, scale: 95.2417, zero_point: 130
model_4_cv3_0_m_2_cv2_conv                         cosine: 0.9699, scale: 231.7959, zero_point: 116
float_functional_simple_19                         cosine: 0.9690, scale: 252.6267, zero_point: 123
model_4_cv3_0_cv2_conv                             cosine: 0.9720, scale: 31.9787, zero_point: 127
float_functional_simple_20                         cosine: 0.9689, scale: 252.6267, zero_point: 123
model_4_cv3_0_cv3_conv                             cosine: 0.9677, scale: 164.9429, zero_point: 131
model_4_cv3_1_conv                                 cosine: 0.9701, scale: 373.9103, zero_point: 132
float_functional_simple_21                         cosine: 0.9696, scale: 373.9414, zero_point: 132
model_4_cv4_conv                                   cosine: 0.9689, scale: 368.2737, zero_point: 132
model_5_cv1_conv                                   cosine: 0.9613, scale: 1116.6826, zero_point: 127
model_6_cv1_conv                                   cosine: 0.9562, scale: 2254.8174, zero_point: 130
model_6_cv2_0_cv1_conv                             cosine: 0.9662, scale: 3353.3918, zero_point: 122
model_6_cv2_0_m_0_cv1_conv1_conv                   cosine: 0.9683, scale: 6312.2114, zero_point: 129
model_6_cv2_0_m_0_cv1_conv2_conv                   cosine: 0.9651, scale: 3265.7844, zero_point: 136
float_functional_simple_22                         cosine: 0.9676, scale: 8566.3838, zero_point: 132
model_6_cv2_0_m_0_cv2_conv                         cosine: 0.9701, scale: 8017.3574, zero_point: 132
float_functional_simple_24                         cosine: 0.9697, scale: 8866.6191, zero_point: 126
model_6_cv2_0_m_1_cv1_conv1_conv                   cosine: 0.9749, scale: 17872.9688, zero_point: 135
model_6_cv2_0_m_1_cv1_conv2_conv                   cosine: 0.9698, scale: 8896.8984, zero_point: 127
float_functional_simple_25                         cosine: 0.9738, scale: 19798.8125, zero_point: 134
model_6_cv2_0_m_1_cv2_conv                         cosine: 0.9745, scale: 28775.4766, zero_point: 124
float_functional_simple_27                         cosine: 0.9746, scale: 32651.2930, zero_point: 123
model_6_cv2_0_m_2_cv1_conv1_conv                   cosine: 0.9725, scale: 34318.4648, zero_point: 122
model_6_cv2_0_m_2_cv1_conv2_conv                   cosine: 0.9725, scale: 19359.3066, zero_point: 134
float_functional_simple_28                         cosine: 0.9725, scale: 45125.1406, zero_point: 130
model_6_cv2_0_m_2_cv2_conv                         cosine: 0.9763, scale: 88597.7656, zero_point: 121
float_functional_simple_30                         cosine: 0.9765, scale: 108406.9766, zero_point: 128
model_6_cv2_0_cv2_conv                             cosine: 0.9538, scale: 5082.6133, zero_point: 139
float_functional_simple_31                         cosine: 0.9763, scale: 108406.9766, zero_point: 128
model_6_cv2_0_cv3_conv                             cosine: 0.9760, scale: 75409.7109, zero_point: 135
model_6_cv2_1_conv                                 cosine: 0.9760, scale: 145796.5000, zero_point: 127
model_6_cv3_0_cv1_conv                             cosine: 0.9769, scale: 467804.5000, zero_point: 140
model_6_cv3_0_m_0_cv1_conv1_conv                   cosine: 0.9776, scale: 883563.1250, zero_point: 130
model_6_cv3_0_m_0_cv1_conv2_conv                   cosine: 0.9806, scale: 236838.5469, zero_point: 143
float_functional_simple_32                         cosine: 0.9778, scale: 960792.6250, zero_point: 131
model_6_cv3_0_m_0_cv2_conv                         cosine: 0.9755, scale: 1600133.1250, zero_point: 125
float_functional_simple_34                         cosine: 0.9758, scale: 1725953.1250, zero_point: 131
model_6_cv3_0_m_1_cv1_conv1_conv                   cosine: 0.9744, scale: 3896023.0000, zero_point: 130
model_6_cv3_0_m_1_cv1_conv2_conv                   cosine: 0.9786, scale: 809567.9375, zero_point: 144
float_functional_simple_35                         cosine: 0.9741, scale: 3896691.2500, zero_point: 130
model_6_cv3_0_m_1_cv2_conv                         cosine: 0.9732, scale: 6870270.5000, zero_point: 129
float_functional_simple_37                         cosine: 0.9727, scale: 6975706.0000, zero_point: 122
model_6_cv3_0_m_2_cv1_conv1_conv                   cosine: 0.9706, scale: 8813739.0000, zero_point: 126
model_6_cv3_0_m_2_cv1_conv2_conv                   cosine: 0.9722, scale: 2160135.5000, zero_point: 151
float_functional_simple_38                         cosine: 0.9704, scale: 9763655.0000, zero_point: 128
model_6_cv3_0_m_2_cv2_conv                         cosine: 0.9716, scale: 25207800.0000, zero_point: 122
float_functional_simple_40                         cosine: 0.9719, scale: 27665160.0000, zero_point: 125
model_6_cv3_0_cv2_conv                             cosine: 0.9774, scale: 502617.4375, zero_point: 125
float_functional_simple_41                         cosine: 0.9717, scale: 27665160.0000, zero_point: 125
model_6_cv3_0_cv3_conv                             cosine: 0.9731, scale: 20631558.0000, zero_point: 148
model_6_cv3_1_conv                                 cosine: 0.9719, scale: 33618380.0000, zero_point: 96
float_functional_simple_42                         cosine: 0.9716, scale: 33618380.0000, zero_point: 96
model_6_cv4_conv                                   cosine: 0.9723, scale: 28748542.0000, zero_point: 139
model_7_cv1_conv                                   cosine: 0.9723, scale: 103265672.0000, zero_point: 114
model_8_cv1_conv                                   cosine: 0.9672, scale: 340222720.0000, zero_point: 152
model_8_cv2_0_cv1_conv                             cosine: 0.9679, scale: 791740480.0000, zero_point: 108
model_8_cv2_0_m_0_cv1_conv1_conv                   cosine: 0.9678, scale: 1985098624.0000, zero_point: 141
model_8_cv2_0_m_0_cv1_conv2_conv                   cosine: 0.9709, scale: 850919936.0000, zero_point: 131
float_functional_simple_43                         cosine: 0.9670, scale: 2112316672.0000, zero_point: 141
model_8_cv2_0_m_0_cv2_conv                         cosine: 0.9633, scale: 3827875328.0000, zero_point: 126
float_functional_simple_45                         cosine: 0.9631, scale: 3868252672.0000, zero_point: 128
model_8_cv2_0_m_1_cv1_conv1_conv                   cosine: 0.9633, scale: 5266277376.0000, zero_point: 131
model_8_cv2_0_m_1_cv1_conv2_conv                   cosine: 0.9540, scale: 2227788800.0000, zero_point: 115
float_functional_simple_46                         cosine: 0.9630, scale: 5369864192.0000, zero_point: 134
model_8_cv2_0_m_1_cv2_conv                         cosine: 0.9634, scale: 10187463680.0000, zero_point: 112
float_functional_simple_48                         cosine: 0.9619, scale: 11089908736.0000, zero_point: 106
model_8_cv2_0_m_2_cv1_conv1_conv                   cosine: 0.9578, scale: 8756760576.0000, zero_point: 125
model_8_cv2_0_m_2_cv1_conv2_conv                   cosine: 0.9543, scale: 3600450816.0000, zero_point: 122
float_functional_simple_49                         cosine: 0.9569, scale: 9267959808.0000, zero_point: 123
model_8_cv2_0_m_2_cv2_conv                         cosine: 0.9570, scale: 17609924608.0000, zero_point: 132
float_functional_simple_51                         cosine: 0.9584, scale: 19000872960.0000, zero_point: 135
model_8_cv2_0_cv2_conv                             cosine: 0.9658, scale: 1388501632.0000, zero_point: 120
float_functional_simple_52                         cosine: 0.9582, scale: 19000872960.0000, zero_point: 135
model_8_cv2_0_cv3_conv                             cosine: 0.9588, scale: 12136737792.0000, zero_point: 118
model_8_cv2_1_conv                                 cosine: 0.9591, scale: 29498615808.0000, zero_point: 119
model_8_cv3_0_cv1_conv                             cosine: 0.9490, scale: 59946033152.0000, zero_point: 100
model_8_cv3_0_m_0_cv1_conv1_conv                   cosine: 0.9387, scale: 123696021504.0000, zero_point: 139
model_8_cv3_0_m_0_cv1_conv2_conv                   cosine: 0.9666, scale: 86454132736.0000, zero_point: 101
float_functional_simple_53                         cosine: 0.9425, scale: 136337670144.0000, zero_point: 128
model_8_cv3_0_m_0_cv2_conv                         cosine: 0.9330, scale: 176887267328.0000, zero_point: 138
float_functional_simple_55                         cosine: 0.9326, scale: 180858929152.0000, zero_point: 135
model_8_cv3_0_m_1_cv1_conv1_conv                   cosine: 0.9326, scale: 476538175488.0000, zero_point: 132
model_8_cv3_0_m_1_cv1_conv2_conv                   cosine: 0.9392, scale: 182416998400.0000, zero_point: 117
float_functional_simple_56                         cosine: 0.9325, scale: 477301833728.0000, zero_point: 132
model_8_cv3_0_m_1_cv2_conv                         cosine: 0.9312, scale: 745177481216.0000, zero_point: 136
float_functional_simple_58                         cosine: 0.9316, scale: 751796289536.0000, zero_point: 139
model_8_cv3_0_m_2_cv1_conv1_conv                   cosine: 0.9398, scale: 876790808576.0000, zero_point: 129
model_8_cv3_0_m_2_cv1_conv2_conv                   cosine: 0.9191, scale: 178498486272.0000, zero_point: 120
float_functional_simple_59                         cosine: 0.9391, scale: 878096744448.0000, zero_point: 129
model_8_cv3_0_m_2_cv2_conv                         cosine: 0.9334, scale: 2294947512320.0000, zero_point: 104
float_functional_simple_61                         cosine: 0.9332, scale: 2228726005760.0000, zero_point: 109
model_8_cv3_0_cv2_conv                             cosine: 0.9548, scale: 128897351680.0000, zero_point: 101
float_functional_simple_62                         cosine: 0.9331, scale: 2228726005760.0000, zero_point: 109
model_8_cv3_0_cv3_conv                             cosine: 0.9319, scale: 1641014755328.0000, zero_point: 131
model_8_cv3_1_conv                                 cosine: 0.9280, scale: 3021996294144.0000, zero_point: 120
float_functional_simple_63                         cosine: 0.9279, scale: 3021996294144.0000, zero_point: 120
model_8_cv4_conv                                   cosine: 0.9228, scale: 3555324592128.0000, zero_point: 128
model_9_cv1_conv                                   cosine: 0.9256, scale: 7513966641152.0000, zero_point: 118
float_functional_simple_64                         cosine: 0.9704, scale: 7513966641152.0000, zero_point: 118
model_9_cv5_conv                                   cosine: 0.9605, scale: 15366189219840.0000, zero_point: 177
float_functional_simple_65                         cosine: 0.9599, scale: 15366189219840.0000, zero_point: 177
model_12_cv1_conv                                  cosine: 0.9569, scale: 16234717380608.0000, zero_point: 141
model_12_cv2_0_cv1_conv                            cosine: 0.9507, scale: 22964714078208.0000, zero_point: 134
model_12_cv2_0_m_0_cv1_conv1_conv                  cosine: 0.9494, scale: 67497839558656.0000, zero_point: 145
model_12_cv2_0_m_0_cv1_conv2_conv                  cosine: 0.9548, scale: 27334409191424.0000, zero_point: 129
float_functional_simple_66                         cosine: 0.9448, scale: 74325692514304.0000, zero_point: 144
model_12_cv2_0_m_0_cv2_conv                        cosine: 0.9354, scale: 97196594692096.0000, zero_point: 123
float_functional_simple_68                         cosine: 0.9324, scale: 105329350148096.0000, zero_point: 131
model_12_cv2_0_m_1_cv1_conv1_conv                  cosine: 0.9322, scale: 229926024249344.0000, zero_point: 137
model_12_cv2_0_m_1_cv1_conv2_conv                  cosine: 0.9314, scale: 78935106781184.0000, zero_point: 134
float_functional_simple_69                         cosine: 0.9321, scale: 265273605095424.0000, zero_point: 135
model_12_cv2_0_m_1_cv2_conv                        cosine: 0.9354, scale: 447237058461696.0000, zero_point: 122
float_functional_simple_71                         cosine: 0.9344, scale: 499950131609600.0000, zero_point: 120
model_12_cv2_0_m_2_cv1_conv1_conv                  cosine: 0.9388, scale: 649142565076992.0000, zero_point: 141
model_12_cv2_0_m_2_cv1_conv2_conv                  cosine: 0.9428, scale: 267821678329856.0000, zero_point: 139
float_functional_simple_72                         cosine: 0.9399, scale: 665993533718528.0000, zero_point: 142
model_12_cv2_0_m_2_cv2_conv                        cosine: 0.9440, scale: 1906159233531904.0000, zero_point: 119
float_functional_simple_74                         cosine: 0.9428, scale: 2060878585266176.0000, zero_point: 120
model_12_cv2_0_cv2_conv                            cosine: 0.9490, scale: 49131313692672.0000, zero_point: 126
float_functional_simple_75                         cosine: 0.9426, scale: 2060878585266176.0000, zero_point: 120
model_12_cv2_0_cv3_conv                            cosine: 0.9393, scale: 1193777301553152.0000, zero_point: 114
model_12_cv2_1_conv                                cosine: 0.9100, scale: 2681612491816960.0000, zero_point: 132
model_12_cv3_0_cv1_conv                            cosine: 0.6738, scale: 4518242781495296.0000, zero_point: 134
model_12_cv3_0_m_0_cv1_conv1_conv                  cosine: 0.0000, scale: 11420348104835072.0000, zero_point: 123
model_12_cv3_0_m_0_cv1_conv2_conv                  cosine: 0.9385, scale: 5521993281568768.0000, zero_point: 119
float_functional_simple_76                         cosine: 0.0000, scale: 12662761884483584.0000, zero_point: 121
model_12_cv3_0_m_0_cv2_conv                        cosine: 0.0000, scale: 17234315410669568.0000, zero_point: 139
float_functional_simple_78                         cosine: 0.0000, scale: 17678376374370304.0000, zero_point: 141
model_12_cv3_0_m_1_cv1_conv1_conv                  cosine: 0.0000, scale: 32197411313025024.0000, zero_point: 115
model_12_cv3_0_m_1_cv1_conv2_conv                  cosine: 0.1404, scale: 17622455900176384.0000, zero_point: 134
float_functional_simple_79                         cosine: 0.0000, scale: 37361104464117760.0000, zero_point: 110
model_12_cv3_0_m_1_cv2_conv                        cosine: 0.0000, scale: 95018918699073536.0000, zero_point: 109
float_functional_simple_81                         cosine: 0.0000, scale: 85406017386446848.0000, zero_point: 110
model_12_cv3_0_m_2_cv1_conv1_conv                  cosine: 0.0000, scale: 98502102816391168.0000, zero_point: 126
model_12_cv3_0_m_2_cv1_conv2_conv                  cosine: 0.0000, scale: 43889368854691840.0000, zero_point: 112
float_functional_simple_82                         cosine: 0.0000, scale: 123449729592852480.0000, zero_point: 111
model_12_cv3_0_m_2_cv2_conv                        cosine: 0.0000, scale: 300629259540496384.0000, zero_point: 145
float_functional_simple_84                         cosine: 0.0000, scale: 265760619087527936.0000, zero_point: 133
model_12_cv3_0_cv2_conv                            cosine: 0.1414, scale: 6869318079873024.0000, zero_point: 129
float_functional_simple_85                         cosine: 0.0000, scale: 265760619087527936.0000, zero_point: 133
model_12_cv3_0_cv3_conv                            cosine: 0.0000, scale: 111920147584778240.0000, zero_point: 123
model_12_cv3_1_conv                                cosine: 0.0000, scale: 238228813568278528.0000, zero_point: 147
float_functional_simple_86                         cosine: 0.0000, scale: 238228813568278528.0000, zero_point: 147
model_12_cv4_conv                                  cosine: 0.0000, scale: 234854652900802560.0000, zero_point: 128
float_functional_simple_87                         cosine: 0.0000, scale: 234854652900802560.0000, zero_point: 128
model_15_cv1_conv                                  cosine: 0.0000, scale: 395619779915808768.0000, zero_point: 144
model_15_cv2_0_cv1_conv                            cosine: 0.0000, scale: 168402523760099328.0000, zero_point: 129
model_15_cv2_0_m_0_cv1_conv1_conv                  cosine: 0.0000, scale: 488027203880091648.0000, zero_point: 114
model_15_cv2_0_m_0_cv1_conv2_conv                  cosine: 0.0000, scale: 209578924982665216.0000, zero_point: 140
float_functional_simple_88                         cosine: 0.0000, scale: 632415242638327808.0000, zero_point: 120
model_15_cv2_0_m_0_cv2_conv                        cosine: 0.0000, scale: 545815954488033280.0000, zero_point: 129
float_functional_simple_90                         cosine: 0.0000, scale: 564939416632557568.0000, zero_point: 136
model_15_cv2_0_m_1_cv1_conv1_conv                  cosine: 0.0000, scale: 795782948036018176.0000, zero_point: 118
model_15_cv2_0_m_1_cv1_conv2_conv                  cosine: 0.0000, scale: 479512104798257152.0000, zero_point: 116
float_functional_simple_91                         cosine: 0.0000, scale: 984001196769411072.0000, zero_point: 116
model_15_cv2_0_m_1_cv2_conv                        cosine: 0.0000, scale: 1920143538599755776.0000, zero_point: 133
float_functional_simple_93                         cosine: 0.0000, scale: 2143884671564382208.0000, zero_point: 140
model_15_cv2_0_m_2_cv1_conv1_conv                  cosine: 0.0000, scale: 1707738370517499904.0000, zero_point: 125
model_15_cv2_0_m_2_cv1_conv2_conv                  cosine: 0.0000, scale: 767321127478362112.0000, zero_point: 135
float_functional_simple_94                         cosine: 0.0000, scale: 2026154601459220480.0000, zero_point: 123
model_15_cv2_0_m_2_cv2_conv                        cosine: 0.0000, scale: 5288267200044466176.0000, zero_point: 130
float_functional_simple_96                         cosine: 0.0000, scale: 5733578755142582272.0000, zero_point: 129
model_15_cv2_0_cv2_conv                            cosine: 0.0000, scale: 995193056909066240.0000, zero_point: 108
float_functional_simple_97                         cosine: 0.0000, scale: 5733578755142582272.0000, zero_point: 129
model_15_cv2_0_cv3_conv                            cosine: 0.0000, scale: 2860377575513915392.0000, zero_point: 128
model_15_cv2_1_conv                                cosine: 0.0000, scale: 6149354129383751680.0000, zero_point: 138
model_15_cv3_0_cv1_conv                            cosine: 0.0000, scale: 6916664261745836032.0000, zero_point: 130
model_15_cv3_0_m_0_cv1_conv1_conv                  cosine: 0.0000, scale: 17339241195422875648.0000, zero_point: 138
model_15_cv3_0_m_0_cv1_conv2_conv                  cosine: 0.0000, scale: 7385824224050413568.0000, zero_point: 139
float_functional_simple_98                         cosine: 0.0000, scale: 21068617511071645696.0000, zero_point: 130
model_15_cv3_0_m_0_cv2_conv                        cosine: 0.0000, scale: 54672155761952423936.0000, zero_point: 114
float_functional_simple_100                        cosine: 0.0000, scale: 55267101503742017536.0000, zero_point: 115
model_15_cv3_0_m_1_cv1_conv1_conv                  cosine: 0.0000, scale: 70106198393137135616.0000, zero_point: 130
model_15_cv3_0_m_1_cv1_conv2_conv                  cosine: 0.0000, scale: 13230821440237862912.0000, zero_point: 134
float_functional_simple_101                        cosine: 0.0000, scale: 69890483007860506624.0000, zero_point: 130
model_15_cv3_0_m_1_cv2_conv                        cosine: 0.0000, scale: 174510597687490904064.0000, zero_point: 131
float_functional_simple_103                        cosine: 0.0000, scale: 189863509554685280256.0000, zero_point: 132
model_15_cv3_0_m_2_cv1_conv1_conv                  cosine: 0.0000, scale: 211542747445312094208.0000, zero_point: 146
model_15_cv3_0_m_2_cv1_conv2_conv                  cosine: 0.0000, scale: 46494168594461491200.0000, zero_point: 112
float_functional_simple_104                        cosine: 0.0000, scale: 212031141714277171200.0000, zero_point: 146
model_15_cv3_0_m_2_cv2_conv                        cosine: 0.0000, scale: 459744549566815404032.0000, zero_point: 132
float_functional_simple_106                        cosine: 0.0000, scale: 549126083996354084864.0000, zero_point: 140
model_15_cv3_0_cv2_conv                            cosine: 0.0000, scale: 11977203362224406528.0000, zero_point: 121
float_functional_simple_107                        cosine: 0.0000, scale: 549126083996354084864.0000, zero_point: 140
model_15_cv3_0_cv3_conv                            cosine: 0.0000, scale: 328729980963259416576.0000, zero_point: 130
model_15_cv3_1_conv                                cosine: 0.0000, scale: 1340926522851242016768.0000, zero_point: 136
float_functional_simple_108                        cosine: 0.0000, scale: 1340926522851242016768.0000, zero_point: 136
model_15_cv4_conv                                  cosine: 0.0000, scale: 1510444264625281171456.0000, zero_point: 116
model_16_cv1_conv                                  cosine: 0.0000, scale: 2727966528187037384704.0000, zero_point: 130
float_functional_simple_109                        cosine: 0.0000, scale: 2727966528187037384704.0000, zero_point: 130
model_18_cv1_conv                                  cosine: 0.0000, scale: 4825513551035218526208.0000, zero_point: 130
model_18_cv2_0_cv1_conv                            cosine: 0.0000, scale: 8198244675274194026496.0000, zero_point: 134
model_18_cv2_0_m_0_cv1_conv1_conv                  cosine: 0.0000, scale: 18361575375256441323520.0000, zero_point: 127
model_18_cv2_0_m_0_cv1_conv2_conv                  cosine: 0.0000, scale: 4737185577443507830784.0000, zero_point: 146
float_functional_simple_110                        cosine: 0.0000, scale: 19509499010176813891584.0000, zero_point: 130
model_18_cv2_0_m_0_cv2_conv                        cosine: 0.0000, scale: 26159701119336624160768.0000, zero_point: 122
float_functional_simple_112                        cosine: 0.0000, scale: 27926533058948293984256.0000, zero_point: 121
model_18_cv2_0_m_1_cv1_conv1_conv                  cosine: 0.0000, scale: 58713833665995619696640.0000, zero_point: 125
model_18_cv2_0_m_1_cv1_conv2_conv                  cosine: 0.0000, scale: 10198719985854124654592.0000, zero_point: 119
float_functional_simple_113                        cosine: 0.0000, scale: 64700603758648041144320.0000, zero_point: 118
model_18_cv2_0_m_1_cv2_conv                        cosine: 0.0000, scale: 140854437830451557040128.0000, zero_point: 121
float_functional_simple_115                        cosine: 0.0000, scale: 163513632822821356830720.0000, zero_point: 122
model_18_cv2_0_m_2_cv1_conv1_conv                  cosine: 0.0000, scale: 188537145561966629093376.0000, zero_point: 134
model_18_cv2_0_m_2_cv1_conv2_conv                  cosine: 0.0000, scale: 35542644698188391383040.0000, zero_point: 113
float_functional_simple_116                        cosine: 0.0000, scale: 197277119229217965342720.0000, zero_point: 132
model_18_cv2_0_m_2_cv2_conv                        cosine: 0.0000, scale: 830950160046875475968000.0000, zero_point: 133
float_functional_simple_118                        cosine: 0.0000, scale: 907493483428852501839872.0000, zero_point: 133
model_18_cv2_0_cv2_conv                            cosine: 0.0000, scale: 7523674693938364547072.0000, zero_point: 127
float_functional_simple_119                        cosine: 0.0000, scale: 907493483428852501839872.0000, zero_point: 133
model_18_cv2_0_cv3_conv                            cosine: 0.0000, scale: 805862804279412599554048.0000, zero_point: 129
model_18_cv2_1_conv                                cosine: 0.0000, scale: 2121104776038203902656512.0000, zero_point: 133
model_18_cv3_0_cv1_conv                            cosine: 0.0000, scale: 2817972182711936137297920.0000, zero_point: 133
model_18_cv3_0_m_0_cv1_conv1_conv                  cosine: 0.0000, scale: 7855385131239319900520448.0000, zero_point: 128
model_18_cv3_0_m_0_cv1_conv2_conv                  cosine: 0.0000, scale: 765288037767783833927680.0000, zero_point: 112
float_functional_simple_120                        cosine: 0.0000, scale: 7854206845461611702910976.0000, zero_point: 128
model_18_cv3_0_m_0_cv2_conv                        cosine: 0.0000, scale: 13591248591781449175138304.0000, zero_point: 111
float_functional_simple_122                        cosine: 0.0000, scale: 14230574694966570208198656.0000, zero_point: 111
model_18_cv3_0_m_1_cv1_conv1_conv                  cosine: 0.0000, scale: 21691932184644683355389952.0000, zero_point: 130
model_18_cv3_0_m_1_cv1_conv2_conv                  cosine: 0.0000, scale: 3506360860525601543421952.0000, zero_point: 113
float_functional_simple_123                        cosine: 0.0000, scale: 21704711166601745647271936.0000, zero_point: 130
model_18_cv3_0_m_1_cv2_conv                        cosine: 0.0000, scale: 67318634708763989044625408.0000, zero_point: 112
float_functional_simple_125                        cosine: 0.0000, scale: 75585561512140998310887424.0000, zero_point: 112
model_18_cv3_0_m_2_cv1_conv1_conv                  cosine: 0.0000, scale: 71533540485537921348141056.0000, zero_point: 133
model_18_cv3_0_m_2_cv1_conv2_conv                  cosine: 0.0000, scale: 16717391792758400930021376.0000, zero_point: 146
float_functional_simple_126                        cosine: 0.0000, scale: 71230501985581039261581312.0000, zero_point: 133
model_18_cv3_0_m_2_cv2_conv                        cosine: 0.0000, scale: 212010495674480132192468992.0000, zero_point: 113
float_functional_simple_128                        cosine: 0.0000, scale: 211675226100940461091848192.0000, zero_point: 116
model_18_cv3_0_cv2_conv                            cosine: 0.0000, scale: 6374307578776226069741568.0000, zero_point: 122
float_functional_simple_129                        cosine: 0.0000, scale: 211675226100940461091848192.0000, zero_point: 116
model_18_cv3_0_cv3_conv                            cosine: 0.0000, scale: 124201900178170300466200576.0000, zero_point: 127
model_18_cv3_1_conv                                cosine: 0.0000, scale: 470580462710538730406412288.0000, zero_point: 148
float_functional_simple_130                        cosine: 0.0000, scale: 470580462710538730406412288.0000, zero_point: 148
model_18_cv4_conv                                  cosine: 0.0000, scale: 834629207507973724113469440.0000, zero_point: 131
model_19_cv1_conv                                  cosine: 0.0000, scale: 3658705673782474374761152512.0000, zero_point: 105
float_functional_simple_131                        cosine: 0.0000, scale: 3658705673782474374761152512.0000, zero_point: 105
model_21_cv1_conv                                  cosine: 0.0000, scale: 6476222108929482434663153664.0000, zero_point: 122
model_21_cv2_0_cv1_conv                            cosine: 0.0000, scale: 6372631097169633179844214784.0000, zero_point: 144
model_21_cv2_0_m_0_cv1_conv1_conv                  cosine: 0.0000, scale: 16483064850378660677603557376.0000, zero_point: 116
model_21_cv2_0_m_0_cv1_conv2_conv                  cosine: 0.0000, scale: 9090745554775002539584651264.0000, zero_point: 144
float_functional_simple_132                        cosine: 0.0000, scale: 16809818373836239321463521280.0000, zero_point: 115
model_21_cv2_0_m_0_cv2_conv                        cosine: 0.0000, scale: 35104851701917120715328323584.0000, zero_point: 119
float_functional_simple_134                        cosine: 0.0000, scale: 34313959208099078274228944896.0000, zero_point: 119
model_21_cv2_0_m_1_cv1_conv1_conv                  cosine: 0.0000, scale: 42973676665108257533990010880.0000, zero_point: 137
model_21_cv2_0_m_1_cv1_conv2_conv                  cosine: 0.0000, scale: 20804690132920367081054011392.0000, zero_point: 149
float_functional_simple_135                        cosine: 0.0000, scale: 43112174229317858488817287168.0000, zero_point: 137
model_21_cv2_0_m_1_cv2_conv                        cosine: 0.0000, scale: 122621950346421644505034784768.0000, zero_point: 139
float_functional_simple_137                        cosine: 0.0000, scale: 121067224518732402689910505472.0000, zero_point: 138
model_21_cv2_0_m_2_cv1_conv1_conv                  cosine: 0.0000, scale: 157129151152182990096230776832.0000, zero_point: 117
model_21_cv2_0_m_2_cv1_conv2_conv                  cosine: 0.0000, scale: 22839742022681729119798951936.0000, zero_point: 140
float_functional_simple_138                        cosine: 0.0000, scale: 157688147117493235739466399744.0000, zero_point: 117
model_21_cv2_0_m_2_cv2_conv                        cosine: 0.0000, scale: 434921263917965844034208399360.0000, zero_point: 124
float_functional_simple_140                        cosine: 0.0000, scale: 458745640602975067094466428928.0000, zero_point: 117
model_21_cv2_0_cv2_conv                            cosine: 0.0000, scale: 13776840721281346725805555712.0000, zero_point: 125
float_functional_simple_141                        cosine: 0.0000, scale: 458745640602975067094466428928.0000, zero_point: 117
model_21_cv2_0_cv3_conv                            cosine: 0.0000, scale: 376607404746239002291596165120.0000, zero_point: 123
model_21_cv2_1_conv                                cosine: 0.0000, scale: 561087066967792185093409210368.0000, zero_point: 127
model_21_cv3_0_cv1_conv                            cosine: 0.0000, scale: 929801243922039821674701914112.0000, zero_point: 138
model_21_cv3_0_m_0_cv1_conv1_conv                  cosine: 0.0000, scale: 3010543842793895099805913317376.0000, zero_point: 115
model_21_cv3_0_m_0_cv1_conv2_conv                  cosine: 0.0000, scale: 1009695751236502997686479224832.0000, zero_point: 156
float_functional_simple_142                        cosine: 0.0000, scale: 3010639650165099559168008781824.0000, zero_point: 115
model_21_cv3_0_m_0_cv2_conv                        cosine: 0.0000, scale: 4663780124262390861945976651776.0000, zero_point: 125
float_functional_simple_144                        cosine: 0.0000, scale: 4723553345793051875888267788288.0000, zero_point: 124
model_21_cv3_0_m_1_cv1_conv1_conv                  cosine: 0.0000, scale: 8711323610160240263006401855488.0000, zero_point: 128
model_21_cv3_0_m_1_cv1_conv2_conv                  cosine: 0.0000, scale: 2791749123414035571537359142912.0000, zero_point: 120
float_functional_simple_145                        cosine: 0.0000, scale: 8716741411220843223652847583232.0000, zero_point: 128
model_21_cv3_0_m_1_cv2_conv                        cosine: 0.0000, scale: 13781262310798189437576300986368.0000, zero_point: 130
float_functional_simple_147                        cosine: 0.0000, scale: 14034192561852142538879152422912.0000, zero_point: 126
model_21_cv3_0_m_2_cv1_conv1_conv                  cosine: 0.0000, scale: 18769503013472401587148731449344.0000, zero_point: 113
model_21_cv3_0_m_2_cv1_conv2_conv                  cosine: 0.0000, scale: 7678576968089690227188506296320.0000, zero_point: 154
float_functional_simple_148                        cosine: 0.0000, scale: 19104402061873685390384185737216.0000, zero_point: 116
model_21_cv3_0_m_2_cv2_conv                        cosine: 0.0000, scale: 56875815615158663721114688028672.0000, zero_point: 120
float_functional_simple_150                        cosine: 0.0000, scale: 57759375975385490058257142644736.0000, zero_point: 116
model_21_cv3_0_cv2_conv                            cosine: 0.0000, scale: 1373366478565977716136706310144.0000, zero_point: 125
float_functional_simple_151                        cosine: 0.0000, scale: 57759375975385490058257142644736.0000, zero_point: 116
model_21_cv3_0_cv3_conv                            cosine: 0.0000, scale: 32239362958099337154129167384576.0000, zero_point: 146
model_21_cv3_1_conv                                cosine: 0.0000, scale: 157657176704730620717440130088960.0000, zero_point: 97
float_functional_simple_152                        cosine: 0.0000, scale: 157657176704730620717440130088960.0000, zero_point: 97
model_21_cv4_conv                                  cosine: 0.0000, scale: 313980569731196813359010430320640.0000, zero_point: 136
model_22_cv2_0_0_conv                              cosine: 0.0000, scale: 7438893868003161538560.0000, zero_point: 130
model_22_cv2_0_1_conv                              cosine: 0.0000, scale: 21539606880404935540736.0000, zero_point: 129
model_22_cv2_0_2                                   cosine: 0.0000, scale: 25610327266992020520960.0000, zero_point: 116
model_22_cv3_0_0_conv                              cosine: 0.0000, scale: 2717120171434468966400.0000, zero_point: 139
model_22_cv3_0_1_conv                              cosine: 0.0000, scale: 66322313955667137789952.0000, zero_point: 138
model_22_cv3_0_2                                   cosine: 0.0000, scale: 27785626685606939590656.0000, zero_point: 118
float_functional_simple_153                        cosine: 0.0000, scale: 27884413143433311420416.0000, zero_point: 118
model_22_cv2_1_0_conv                              cosine: 0.0000, scale: 2077073425187319807649775616.0000, zero_point: 128
model_22_cv2_1_1_conv                              cosine: 0.0000, scale: 4518071301010505954047819776.0000, zero_point: 112
model_22_cv2_1_2                                   cosine: 0.0000, scale: 4882846958204909546345857024.0000, zero_point: 135
model_22_cv3_1_0_conv                              cosine: 0.0000, scale: 1926441887873937889122320384.0000, zero_point: 142
model_22_cv3_1_1_conv                              cosine: 0.0000, scale: 28373186754900443073060274176.0000, zero_point: 143
model_22_cv3_1_2                                   cosine: 0.0000, scale: 12585601344736685089835974656.0000, zero_point: 130
float_functional_simple_154                        cosine: 0.0000, scale: 12585601344736685089835974656.0000, zero_point: 130
model_22_cv2_2_0_conv                              cosine: 0.0000, scale: 884143816841592027933268009549824.0000, zero_point: 98
model_22_cv2_2_1_conv                              cosine: 0.0000, scale: 1703424836869797092327990231040000.0000, zero_point: 127
model_22_cv2_2_2                                   cosine: 0.0000, scale: 2586962449948401560607893575696384.0000, zero_point: 96
model_22_cv3_2_0_conv                              cosine: 0.0000, scale: 644344579541063536629906138988544.0000, zero_point: 129
model_22_cv3_2_1_conv                              cosine: 0.0000, scale: 5617934068422202072308237513785344.0000, zero_point: 0
model_22_cv3_2_2                                   cosine: 0.0000, scale: 3358086906107133234282635234115584.0000, zero_point: 253
float_functional_simple_155                        cosine: 0.0000, scale: 4945869109979571796213320688599040.0000, zero_point: 172
float_functional_simple_156                        cosine: 0.0000, scale: 4945869109979571796213320688599040.0000, zero_point: 172
fake_quant_inner_0_0_0                             cosine: 0.2545, scale: 0.0039, zero_point: 0
model_22_dfl_conv                                  cosine: 0.7398, scale: 0.0587, zero_point: 0
float_functional_simple_158                        cosine: 0.9640, scale: 0.2002, zero_point: 58
float_functional_simple_159                        cosine: 0.9805, scale: 0.2008, zero_point: 0
float_functional_simple_160                        cosine: 0.9955, scale: 0.3932, zero_point: 27
float_functional_simple_163                        cosine: 0.7699, scale: 0.1114, zero_point: 1
float_functional_simple_164                        cosine: 0.9226, scale: 0.1966, zero_point: 27
float_functional_simple_165                        cosine: 0.9107, scale: 4.0818, zero_point: 42
rewritten_sigmoid_0                                cosine: 0.6795, scale: 0.0039, zero_point: 0
float_functional_simple_166                        cosine: 0.9107, scale: 4.0818, zero_point: 42

WARNING (tinynn.util.quantization_analysis_util) Quantization error report:

Weights (cosine sorted 30):
model_21_cv3_0_m_1_cv1_conv2_conv        cosine: 0.3166, scale: 0.0060, zero_point: 0
model_18_cv3_0_m_1_cv1_conv2_conv        cosine: 0.3250, scale: 0.0090, zero_point: 0
model_8_cv3_0_m_0_cv1_conv2_conv         cosine: 0.3908, scale: 0.0090, zero_point: 0
model_21_cv3_0_m_0_cv1_conv2_conv        cosine: 0.4174, scale: 0.0064, zero_point: 0
model_21_cv3_0_m_2_cv1_conv2_conv        cosine: 0.4506, scale: 0.0032, zero_point: 0
model_8_cv3_0_m_1_cv1_conv2_conv         cosine: 0.4590, scale: 0.0057, zero_point: 0
model_21_cv2_0_m_2_cv1_conv2_conv        cosine: 0.5192, scale: 0.0031, zero_point: 0
model_8_cv2_0_m_1_cv1_conv2_conv         cosine: 0.5233, scale: 0.0035, zero_point: 0
model_21_cv2_0_m_1_cv1_conv2_conv        cosine: 0.5587, scale: 0.0023, zero_point: 0
model_8_cv3_0_m_2_cv1_conv2_conv         cosine: 0.6255, scale: 0.0033, zero_point: 0
model_18_cv3_0_m_2_cv1_conv2_conv        cosine: 0.6618, scale: 0.0025, zero_point: 0
model_12_cv3_0_m_2_cv1_conv2_conv        cosine: 0.6693, scale: 0.0072, zero_point: 0
model_22_cv3_1_1_conv                    cosine: 0.7042, scale: 0.1054, zero_point: 0
model_21_cv2_0_m_0_cv1_conv2_conv        cosine: 0.7155, scale: 0.0080, zero_point: 0
model_6_cv3_0_m_1_cv1_conv2_conv         cosine: 0.7166, scale: 0.0043, zero_point: 0
model_8_cv2_0_m_2_cv1_conv2_conv         cosine: 0.7418, scale: 0.0020, zero_point: 0
model_12_cv3_0_m_1_cv1_conv2_conv        cosine: 0.7678, scale: 0.0078, zero_point: 0
model_12_cv3_0_m_0_cv1_conv2_conv        cosine: 0.7684, scale: 0.0065, zero_point: 0
model_18_cv3_0_m_0_cv1_conv2_conv        cosine: 0.7739, scale: 0.0050, zero_point: 0
model_22_cv3_2_1_conv                    cosine: 0.7816, scale: 0.0768, zero_point: 0
model_6_cv3_0_m_2_cv1_conv2_conv         cosine: 0.7965, scale: 0.0029, zero_point: 0
model_8_cv2_0_m_0_cv1_conv2_conv         cosine: 0.8355, scale: 0.0056, zero_point: 0
model_22_cv3_0_1_conv                    cosine: 0.8525, scale: 0.0991, zero_point: 0
model_6_cv2_0_m_0_cv1_conv2_conv         cosine: 0.8745, scale: 0.0060, zero_point: 0
model_18_cv2_0_m_0_cv1_conv2_conv        cosine: 0.8834, scale: 0.0038, zero_point: 0
model_18_cv2_0_m_1_cv1_conv2_conv        cosine: 0.9018, scale: 0.0021, zero_point: 0
model_6_cv3_0_m_0_cv1_conv2_conv         cosine: 0.9022, scale: 0.0034, zero_point: 0
model_6_cv2_0_m_1_cv1_conv2_conv         cosine: 0.9155, scale: 0.0044, zero_point: 0
model_15_cv3_0_m_2_cv1_conv2_conv        cosine: 0.9271, scale: 0.0020, zero_point: 0
model_18_cv2_0_m_2_cv1_conv2_conv        cosine: 0.9313, scale: 0.0015, zero_point: 0

Activations (cosine sorted 30):
float_functional_simple_156                        cosine: 0.0000, scale: 4945869109979571796213320688599040.0000, zero_point: 172
float_functional_simple_155                        cosine: 0.0000, scale: 4945869109979571796213320688599040.0000, zero_point: 172
model_22_cv3_2_2                                   cosine: 0.0000, scale: 3358086906107133234282635234115584.0000, zero_point: 253
model_22_cv3_2_1_conv                              cosine: 0.0000, scale: 5617934068422202072308237513785344.0000, zero_point: 0
model_22_cv3_2_0_conv                              cosine: 0.0000, scale: 644344579541063536629906138988544.0000, zero_point: 129
model_22_cv2_2_2                                   cosine: 0.0000, scale: 2586962449948401560607893575696384.0000, zero_point: 96
model_22_cv2_2_1_conv                              cosine: 0.0000, scale: 1703424836869797092327990231040000.0000, zero_point: 127
model_22_cv2_2_0_conv                              cosine: 0.0000, scale: 884143816841592027933268009549824.0000, zero_point: 98
float_functional_simple_154                        cosine: 0.0000, scale: 12585601344736685089835974656.0000, zero_point: 130
model_22_cv3_1_2                                   cosine: 0.0000, scale: 12585601344736685089835974656.0000, zero_point: 130
model_22_cv3_1_1_conv                              cosine: 0.0000, scale: 28373186754900443073060274176.0000, zero_point: 143
model_22_cv3_1_0_conv                              cosine: 0.0000, scale: 1926441887873937889122320384.0000, zero_point: 142
model_22_cv2_1_2                                   cosine: 0.0000, scale: 4882846958204909546345857024.0000, zero_point: 135
model_22_cv2_1_1_conv                              cosine: 0.0000, scale: 4518071301010505954047819776.0000, zero_point: 112
model_22_cv2_1_0_conv                              cosine: 0.0000, scale: 2077073425187319807649775616.0000, zero_point: 128
float_functional_simple_153                        cosine: 0.0000, scale: 27884413143433311420416.0000, zero_point: 118
model_22_cv3_0_2                                   cosine: 0.0000, scale: 27785626685606939590656.0000, zero_point: 118
model_22_cv3_0_1_conv                              cosine: 0.0000, scale: 66322313955667137789952.0000, zero_point: 138
model_22_cv3_0_0_conv                              cosine: 0.0000, scale: 2717120171434468966400.0000, zero_point: 139
model_22_cv2_0_2                                   cosine: 0.0000, scale: 25610327266992020520960.0000, zero_point: 116
model_22_cv2_0_1_conv                              cosine: 0.0000, scale: 21539606880404935540736.0000, zero_point: 129
model_22_cv2_0_0_conv                              cosine: 0.0000, scale: 7438893868003161538560.0000, zero_point: 130
model_21_cv4_conv                                  cosine: 0.0000, scale: 313980569731196813359010430320640.0000, zero_point: 136
float_functional_simple_152                        cosine: 0.0000, scale: 157657176704730620717440130088960.0000, zero_point: 97
model_21_cv3_1_conv                                cosine: 0.0000, scale: 157657176704730620717440130088960.0000, zero_point: 97
model_21_cv3_0_cv3_conv                            cosine: 0.0000, scale: 32239362958099337154129167384576.0000, zero_point: 146
float_functional_simple_151                        cosine: 0.0000, scale: 57759375975385490058257142644736.0000, zero_point: 116
model_21_cv3_0_cv2_conv                            cosine: 0.0000, scale: 1373366478565977716136706310144.0000, zero_point: 125
float_functional_simple_150                        cosine: 0.0000, scale: 57759375975385490058257142644736.0000, zero_point: 116
model_21_cv3_0_m_2_cv2_conv                        cosine: 0.0000, scale: 56875815615158663721114688028672.0000, zero_point: 120
hoangtv2000 commented 1 month ago

Here is my full code, which is modified from the YOLOv9 implementation:

import argparse
import math
import os
import random
import sys
import time
from copy import deepcopy
from datetime import datetime
from pathlib import Path

import numpy as np
import torch
import torch.distributed as dist
import torch.nn as nn
import yaml
from torch.optim import lr_scheduler
from tqdm import tqdm

FILE = Path(__file__).resolve()
ROOT = FILE.parents[0]  # root directory
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))  # add ROOT to PATH
ROOT = Path(os.path.relpath(ROOT, Path.cwd()))  # relative

import val as validate  # for end-of-epoch mAP
from models.experimental import attempt_load
from models.yolo import Model
from utils.autoanchor import check_anchors
from utils.autobatch import check_train_batch_size
from utils.callbacks import Callbacks
from utils.dataloaders import create_dataloader
from utils.downloads import attempt_download, is_url
from utils.general import (LOGGER, TQDM_BAR_FORMAT, check_amp, check_dataset, check_file, check_img_size,
                           check_suffix, check_yaml, colorstr, get_latest_run, increment_path, init_seeds,
                           intersect_dicts, labels_to_class_weights, labels_to_image_weights, methods,
                           one_cycle, one_flat_cycle, print_args, print_mutation, strip_optimizer, yaml_save)
from utils.loggers import Loggers
from utils.loggers.comet.comet_utils import check_comet_resume
from utils.loss_tal import ComputeLoss
from utils.metrics import fitness
from utils.plots import plot_evolve
from utils.torch_utils import (EarlyStopping, ModelEMA, de_parallel, select_device, smart_DDP,
                               smart_optimizer, smart_resume, torch_distributed_zero_first)

import os, sys
sys.path.append('/data/hoangtv23/workspace_AIOT/model_compression_flow/TinyNeuralNet')

from tinynn.util.quantization_analysis_util import graph_error_analysis, layer_error_analysis, get_weight_dis
from tinynn.util.train_util import AverageMeter, DLContext, train, get_device
from tinynn.graph.quantization.quantizer import QATQuantizer, PostQuantizer
from tinynn.graph.tracer import model_tracer
from tinynn.converter import TFLiteConverter
from tinynn.graph.quantization.algorithm.cross_layer_equalization import cross_layer_equalize
from tinynn.util.bn_restore import model_restore_bn

LOCAL_RANK = int(os.getenv('LOCAL_RANK', -1))  # https://pytorch.org/docs/stable/elastic/run.html
RANK = int(os.getenv('RANK', -1))
WORLD_SIZE = int(os.getenv('WORLD_SIZE', 1))
GIT_INFO = None

load_model_for_quant = True

def save_quantized_model(quantizer, model, save_to):
    with torch.no_grad():
        save = deepcopy(model)
        save.eval()
        save.to(torch.device('cpu'))
        save = quantizer.convert(save)
        torch.backends.quantized.engine = quantizer.backend
        dummy_input = torch.rand(1, 3, imgsz, imgsz)
        converter = TFLiteConverter(save, dummy_input, tflite_path=save_to)
        converter.convert()
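
# Hypothetical usage (illustrative; the file name is made up, not from my run):
# save_quantized_model(quantizer, qat_model, 'yolov9_gelan_s_int8.tflite')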

def calibrate_func(qat_model, context: DLContext):
    qat_model.to(device=context.device)
    qat_model.eval()

    bar = tqdm(enumerate(context.train_loader), total=len(context.train_loader), bar_format=TQDM_BAR_FORMAT)
    # Run calibration inference to simulate the QAT init state (fill the observers).
    with torch.no_grad():
        for batch_i, (imgs, targets, paths, _) in bar:
            imgs = imgs.to(context.device, non_blocking=True).float() / 255.0
            qat_model(imgs)
            if context.max_iteration is not None and batch_i >= context.max_iteration:
                break

def train(hyp, opt, device, callbacks):  # hyp is path/to/hyp.yaml or hyp dictionary
    save_dir, epochs, batch_size, weights, single_cls, evolve, data, cfg, resume, noval, nosave, workers, freeze = \
        Path(opt.save_dir), opt.epochs, opt.batch_size, opt.weights, opt.single_cls, opt.evolve, opt.data, opt.cfg, \
        opt.resume, opt.noval, opt.nosave, opt.workers, opt.freeze
    callbacks.run('on_pretrain_routine_start')

    # Directories
    w = save_dir / 'weights'  # weights dir
    (w.parent if evolve else w).mkdir(parents=True, exist_ok=True)  # make dir
    last, best = w / 'last.pt', w / 'best.pt'
    last_striped, best_striped = w / 'last_striped.pt', w / 'best_striped.pt'

    # Hyperparameters
    if isinstance(hyp, str):
        with open(hyp, errors='ignore') as f:
            hyp = yaml.safe_load(f)  # load hyps dict
    LOGGER.info(colorstr('hyperparameters: ') + ', '.join(f'{k}={v}' for k, v in hyp.items()))
    hyp['anchor_t'] = 5.0
    opt.hyp = hyp.copy()  # for saving hyps to checkpoints

    # Save run settings
    if not evolve:
        yaml_save(save_dir / 'hyp.yaml', hyp)
        yaml_save(save_dir / 'opt.yaml', vars(opt))

    # Loggers
    data_dict = None
    if RANK in {-1, 0}:
        loggers = Loggers(save_dir, weights, opt, hyp, LOGGER)  # loggers instance

        # Register actions
        for k in methods(loggers):
            callbacks.register_action(k, callback=getattr(loggers, k))

        # Process custom dataset artifact link
        data_dict = loggers.remote_dataset
        if resume:  # If resuming runs from remote artifact
            weights, epochs, hyp, batch_size = opt.weights, opt.epochs, opt.hyp, opt.batch_size

    # Config
    plots = not evolve and not opt.noplots  # create plots
    cuda = device.type != 'cpu'
    init_seeds(opt.seed + 1 + RANK, deterministic=True)
    with torch_distributed_zero_first(LOCAL_RANK):
        data_dict = data_dict or check_dataset(data)  # check if None
    train_path, val_path = data_dict['train'], data_dict['val']
    nc = 1 if single_cls else int(data_dict['nc'])  # number of classes
    names = {0: 'item'} if single_cls and len(data_dict['names']) != 1 else data_dict['names']  # class names
    #is_coco = isinstance(val_path, str) and val_path.endswith('coco/val2017.txt')  # COCO dataset
    is_coco = isinstance(val_path, str) and val_path.endswith('val2017.txt')  # COCO dataset

    # Model
    check_suffix(weights, '.pt')  # check weights
    pretrained = weights.endswith('.pt')

    if load_model_for_quant:
        model = torch.load(weights, map_location='cpu')["model"].float().to(device)
        csd, ckpt = None, None
    if pretrained:
        with torch_distributed_zero_first(LOCAL_RANK):
            weights = attempt_download(weights)  # download if not found locally
        ckpt = torch.load(weights, map_location='cpu')  # load checkpoint to CPU to avoid CUDA memory leak
        model = Model(cfg or ckpt['model'].yaml, ch=3, nc=1, anchors=hyp.get('anchors')).to(device)  # create
        exclude = ['anchor'] if (cfg or hyp.get('anchors')) and not resume else []  # exclude keys
        csd = ckpt['model'].float().state_dict()  # checkpoint state_dict as FP32
        csd = intersect_dicts(csd, model.state_dict(), exclude=exclude)  # intersect
        model.load_state_dict(csd, strict=False)  # load
        LOGGER.info(f'Transferred {len(csd)}/{len(model.state_dict())} items from {weights}')  # report
    else:
        model = Model(cfg, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device)  # create
    amp = False # check_amp(model)  # check AMP

    # !!! QAT !!!
    device = get_device()
    context = DLContext()
    context.device = device

    # Freeze
    freeze = [f'model.{x}.' for x in (freeze if len(freeze) > 1 else range(freeze[0]))]  # layers to freeze
    for k, v in model.named_parameters():
        # v.requires_grad = True  # train all layers TODO: uncomment this line as in master
        # v.register_hook(lambda x: torch.nan_to_num(x))  # NaN to 0 (commented for erratic training results)
        if any(x in k for x in freeze):
            LOGGER.info(f'freezing {k}')
            v.requires_grad = False

    # Image size
    gs = max(int(model.stride.max()), 32)  # grid size (max stride)
    imgsz = check_img_size(opt.imgsz, gs, floor=gs * 2)  # verify imgsz is gs-multiple

    # Batch size
    if RANK == -1 and batch_size == -1:  # single-GPU only, estimate best batch size
        batch_size = check_train_batch_size(model, imgsz, amp)
        loggers.on_params_update({"batch_size": batch_size})

    # Optimizer
    nbs = 64  # nominal batch size
    accumulate = max(round(nbs / batch_size), 1)  # accumulate loss before optimizing
    hyp['weight_decay'] *= batch_size * accumulate / nbs  # scale weight_decay

    optimizer = smart_optimizer(model, opt.optimizer, hyp['lr0'], hyp['momentum'], hyp['weight_decay'])

    # Scheduler
    if opt.cos_lr:
        lf = one_cycle(1, hyp['lrf'], epochs)  # cosine 1->hyp['lrf']
    elif opt.flat_cos_lr:
        lf = one_flat_cycle(1, hyp['lrf'], epochs)  # flat cosine 1->hyp['lrf']        
    elif opt.fixed_lr:
        lf = lambda x: 1.0
    else:
        lf = lambda x: (1 - x / epochs) * (1.0 - hyp['lrf']) + hyp['lrf']  # linear

    scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)
    # from utils.plots import plot_lr_scheduler; plot_lr_scheduler(optimizer, scheduler, epochs)

    # EMA
    ema = None # ModelEMA(model) if RANK in {-1, 0} else None

    # Resume
    best_fitness, start_epoch = 0.0, 0
    if pretrained:
        if resume:
            best_fitness, start_epoch, epochs = smart_resume(ckpt, optimizer, ema, weights, epochs, resume)
        del ckpt, csd

    # DP mode
    if cuda and RANK == -1 and torch.cuda.device_count() > 1:
        LOGGER.warning('WARNING ⚠️ DP not recommended, use torch.distributed.run for best DDP Multi-GPU results.')
        model = torch.nn.DataParallel(model)

    # SyncBatchNorm
    if opt.sync_bn and cuda and RANK != -1:
        model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model).to(device)
        LOGGER.info('Using SyncBatchNorm()')

    # Trainloader
    train_loader, dataset = create_dataloader(train_path,
                                              imgsz,
                                              batch_size // WORLD_SIZE,
                                              gs,
                                              single_cls=False,
                                              hyp=hyp,
                                              augment=True,
                                              cache=None if opt.cache == 'val' else opt.cache,
                                              rect=False,
                                              rank=LOCAL_RANK,
                                              workers=workers,
                                              image_weights=opt.image_weights,
                                              close_mosaic=opt.close_mosaic != 0,
                                              quad=opt.quad,
                                              prefix=colorstr('train: '),
                                              shuffle=True,
                                              min_items=opt.min_items)
    labels = np.concatenate(dataset.labels, 0)
    mlc = int(labels[:, 0].max())  # max label class
    assert mlc < nc, f'Label class {mlc} exceeds nc={nc} in {data}. Possible class labels are 0-{nc - 1}'

    # Process 0
    if RANK in {-1, 0}:
        val_loader = create_dataloader(val_path,
                                       imgsz,
                                       batch_size // WORLD_SIZE * 2,
                                       gs,
                                       single_cls=False,
                                       hyp=hyp,
                                       cache=None if noval else opt.cache,
                                       rect=False,
                                       rank=-1,
                                       workers=workers * 2,
                                       pad=0.5,
                                       prefix=colorstr('val: '))[0]

        if not resume:
            # if not opt.noautoanchor:
            #     check_anchors(dataset, model=model, thr=hyp['anchor_t'], imgsz=imgsz)  # run AutoAnchor
            model.half().float()  # pre-reduce anchor precision

        callbacks.run('on_pretrain_routine_end', labels, names)

    # !!! QAT !!!
    calib_loader = create_dataloader(train_path,
                                       imgsz,
                                       batch_size // WORLD_SIZE * 2,
                                       gs,
                                       single_cls=False,
                                       hyp=hyp,
                                       cache=None if noval else opt.cache,
                                       rect=False,
                                       rank=-1,
                                       workers=workers * 2,
                                       pad=0.5,
                                       prefix=colorstr('calib: '))[0]

    qat_rewrite_dir = '/data/hoangtv23/workspace_AIOT/model_compression_flow/PruneQuantExperiments/yolov9/qat_out_no_prune'
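    # NOTE: the QAT rewrite/quantize step that defines `quantizer` and `qat_model`
    # (model_tracer + QATQuantizer + observer calibration, as in the first snippet
    # of this issue) is omitted from this excerpt; both names are used below.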

    # results, maps, _ = validate.run(data_dict,
    #                                 batch_size=batch_size // WORLD_SIZE * 2,
    #                                 imgsz=imgsz,
    #                                 half=amp,
    #                                 model=model,
    #                                 single_cls=single_cls,
    #                                 dataloader=val_loader,
    #                                 save_dir=save_dir,
    #                                 plots=False,
    #                                 callbacks=callbacks,
    #                                 compute_loss=None)

    # print("[INFO] Evaluation metric of raw model:")
    # print(results)
    # del model

    # DDP mode
    if cuda and RANK != -1:
        model = smart_DDP(model)

    # Model attributes
    # nl = de_parallel(model).model[-1].nl  # number of detection layers (to scale hyps)
    #hyp['box'] *= 3 / nl  # scale to layers
    #hyp['cls'] *= nc / 80 * 3 / nl  # scale to classes and layers
    #hyp['obj'] *= (imgsz / 320) ** 2 * 3 / nl  # scale to image size and layers
    hyp['label_smoothing'] = opt.label_smoothing
    qat_model.nc = nc  # attach number of classes to model
    qat_model.hyp = hyp  # attach hyperparameters to model
    qat_model.class_weights = labels_to_class_weights(dataset.labels, nc).to(device) * nc  # attach class weights
    qat_model.names = names

    # Start training
    t0 = time.time()
    nb = len(train_loader)  # number of batches
    nw = max(round(hyp['warmup_epochs'] * nb), 100)  # number of warmup iterations, max(3 epochs, 100 iterations)
    # nw = min(nw, (epochs - start_epoch) / 2 * nb)  # limit warmup to < 1/2 of training
    last_opt_step = -1
    maps = np.zeros(nc)  # mAP per class
    results = (0, 0, 0, 0, 0, 0, 0)  # P, R, mAP@.5, mAP@.5-.95, val_loss(box, obj, cls)
    scheduler.last_epoch = start_epoch - 1  # do not move
    scaler = torch.cuda.amp.GradScaler(enabled=amp)
    stopper, stop = EarlyStopping(patience=opt.patience), False
    compute_loss = ComputeLoss(qat_model)  # init loss class
    callbacks.run('on_train_start')
    LOGGER.info(f'Image sizes {imgsz} train, {imgsz} val\n'
                f'Using {train_loader.num_workers * WORLD_SIZE} dataloader workers\n'
                f"Logging results to {colorstr('bold', save_dir)}\n"
                f'Starting training for {epochs} epochs...')
    for epoch in range(start_epoch, epochs):  # epoch ------------------------------------------------------------------
        callbacks.run('on_train_epoch_start')
        qat_model.train()

        # Update image weights (optional, single-GPU only)
        if opt.image_weights:
            cw = qat_model.class_weights.cpu().numpy() * (1 - maps) ** 2 / nc  # class weights
            iw = labels_to_image_weights(dataset.labels, nc=nc, class_weights=cw)  # image weights
            dataset.indices = random.choices(range(dataset.n), weights=iw, k=dataset.n)  # rand weighted idx
        if epoch == (epochs - opt.close_mosaic):
            LOGGER.info("Closing dataloader mosaic")
            dataset.mosaic = False

        # Update mosaic border (optional)
        # b = int(random.uniform(0.25 * imgsz, 0.75 * imgsz + gs) // gs * gs)
        # dataset.mosaic_border = [b - imgsz, -b]  # height, width borders

        mloss = torch.zeros(3, device=device)  # mean losses
        if RANK != -1:
            train_loader.sampler.set_epoch(epoch)
        pbar = enumerate(train_loader)
        LOGGER.info(('\n' + '%11s' * 7) % ('Epoch', 'GPU_mem', 'box_loss', 'cls_loss', 'dfl_loss', 'Instances', 'Size'))
        if RANK in {-1, 0}:
            pbar = tqdm(pbar, total=nb, bar_format=TQDM_BAR_FORMAT)  # progress bar
        optimizer.zero_grad()
        for i, (imgs, targets, paths, _) in pbar:  # batch -------------------------------------------------------------
            callbacks.run('on_train_batch_start')
            ni = i + nb * epoch  # number integrated batches (since train start)
            imgs = imgs.to(device, non_blocking=True).float() / 255  # uint8 to float32, 0-255 to 0.0-1.0

            # Warmup
            if ni <= nw:
                xi = [0, nw]  # x interp
                # compute_loss.gr = np.interp(ni, xi, [0.0, 1.0])  # iou loss ratio (obj_loss = 1.0 or iou)
                accumulate = max(1, np.interp(ni, xi, [1, nbs / batch_size]).round())
                for j, x in enumerate(optimizer.param_groups):
                    # bias lr falls from 0.1 to lr0, all other lrs rise from 0.0 to lr0
                    x['lr'] = np.interp(ni, xi, [hyp['warmup_bias_lr'] if j == 0 else 0.0, x['initial_lr'] * lf(epoch)])
                    if 'momentum' in x:
                        x['momentum'] = np.interp(ni, xi, [hyp['warmup_momentum'], hyp['momentum']])

            # Multi-scale
            if opt.multi_scale:
                sz = random.randrange(int(imgsz * 0.5), int(imgsz * 1.5 + gs)) // gs * gs  # size (ints required on Python >= 3.10)
                sf = sz / max(imgs.shape[2:])  # scale factor
                if sf != 1:
                    ns = [math.ceil(x * sf / gs) * gs for x in imgs.shape[2:]]  # new shape (stretched to gs-multiple)
                    imgs = nn.functional.interpolate(imgs, size=ns, mode='bilinear', align_corners=False)

            # Forward
            with torch.cuda.amp.autocast(amp):
                pred = qat_model(imgs)  # forward
                loss, loss_items = compute_loss(pred, targets.to(device))  # loss scaled by batch_size
                if RANK != -1:
                    loss *= WORLD_SIZE  # gradient averaged between devices in DDP mode
                if opt.quad:
                    loss *= 4.

            # Backward
            scaler.scale(loss).backward()

            # Optimize - https://pytorch.org/docs/master/notes/amp_examples.html
            if ni - last_opt_step >= accumulate:
                scaler.unscale_(optimizer)  # unscale gradients
                torch.nn.utils.clip_grad_norm_(qat_model.parameters(), max_norm=10.0)  # clip gradients
                scaler.step(optimizer)  # optimizer.step
                scaler.update()
                optimizer.zero_grad()
                if ema:
                    ema.update(qat_model)
                last_opt_step = ni

            # Log
            if RANK in {-1, 0}:
                mloss = (mloss * i + loss_items) / (i + 1)  # update mean losses
                mem = f'{torch.cuda.memory_reserved() / 1E9 if torch.cuda.is_available() else 0:.3g}G'  # (GB)
                pbar.set_description(('%11s' * 2 + '%11.4g' * 5) %
                                     (f'{epoch}/{epochs - 1}', mem, *mloss, targets.shape[0], imgs.shape[-1]))
                callbacks.run('on_train_batch_end', qat_model, ni, imgs, targets, paths, list(mloss))
                if callbacks.stop_training:
                    return
            # break
            # end batch ------------------------------------------------------------------------------------------------

        # Scheduler
        lr = [x['lr'] for x in optimizer.param_groups]  # for loggers
        scheduler.step()

        if RANK in {-1, 0}:
            # mAP
            callbacks.run('on_train_epoch_end', epoch=epoch)
            # ema.update_attr(qat_model, include=['yaml', 'nc', 'hyp', 'names', 'stride', 'class_weights'])
            final_epoch = (epoch + 1 == epochs) or stopper.possible_stop
            if not noval or final_epoch:  # Calculate mAP
                results, maps, _ = validate.run(data_dict,
                                                batch_size=batch_size // WORLD_SIZE * 2,
                                                imgsz=imgsz,
                                                half=amp,
                                                model=qat_model,
                                                single_cls=single_cls,
                                                dataloader=val_loader,
                                                save_dir=save_dir,
                                                plots=False,
                                                callbacks=callbacks,
                                                compute_loss=compute_loss)

            # Update best mAP
            fi = fitness(np.array(results).reshape(1, -1))  # weighted combination of [P, R, mAP@.5, mAP@.5-.95]
            stop = stopper(epoch=epoch, fitness=fi)  # early stop check
            if fi > best_fitness:
                best_fitness = fi
            log_vals = list(mloss) + list(results) + lr
            callbacks.run('on_fit_epoch_end', log_vals, epoch, best_fitness, fi)

            # Save model
            if (not nosave) or (final_epoch and not evolve):  # if save
                # ckpt = {
                #     'epoch': epoch,
                #     'best_fitness': best_fitness,
                #     'model': deepcopy(de_parallel(qat_model)),
                #     # 'ema': deepcopy(ema.ema).half(),
                #     # 'updates': ema.updates,
                #     'optimizer': optimizer.state_dict(),
                #     'opt': vars(opt),
                #     'git': GIT_INFO,  # {remote, branch, commit} if a git repo
                #     'date': datetime.now().isoformat()}

                # Save last, best and delete
                # torch.save(ckpt, last)
                if best_fitness == fi:
                    # Quantization Error Report!
                    save_quantized_model(quantizer, de_parallel(qat_model), Path(str(best).replace('best.pt', 'best.tflite')), imgsz)
                    print("[INFO] Quantization Error Report!")
                    dummy_input_real = next(iter(calib_loader))[0].float() / 255.0
                    layer_error_analysis(qat_model, dummy_input_real, metric='cosine')
                    print("-"*50)
                if opt.save_period > 0 and epoch % opt.save_period == 0:
                    # torch.save(ckpt, w / f'epoch{epoch}.pt')
                    pass
                # del ckpt
                callbacks.run('on_model_save', last, epoch, final_epoch, best_fitness, fi)

        # EarlyStopping
        if RANK != -1:  # if DDP training
            broadcast_list = [stop if RANK == 0 else None]
            dist.broadcast_object_list(broadcast_list, 0)  # broadcast 'stop' to all ranks
            if RANK != 0:
                stop = broadcast_list[0]
        if stop:
            break  # must break all DDP ranks

        # end epoch ----------------------------------------------------------------------------------------------------
    # end training -----------------------------------------------------------------------------------------------------
    if RANK in {-1, 0}:
        LOGGER.info(f'\n{epoch - start_epoch + 1} epochs completed in {(time.time() - t0) / 3600:.3f} hours.')
        for f in last, best:
            if f.exists():
                if f is last:
                    strip_optimizer(f, last_striped)  # strip optimizers
                else:
                    strip_optimizer(f, best_striped)  # strip optimizers
                if f is best:
                    LOGGER.info(f'\nValidating {f}...')
                    results, _, _ = validate.run(
                        data_dict,
                        batch_size=batch_size // WORLD_SIZE * 2,
                        imgsz=imgsz,
                        model=attempt_load(f, device).half(),
                        single_cls=single_cls,
                        dataloader=val_loader,
                        save_dir=save_dir,
                        save_json=is_coco,
                        verbose=True,
                        plots=plots,
                        callbacks=callbacks,
                        compute_loss=compute_loss)  # val best model with plots
                    if is_coco:
                        callbacks.run('on_fit_epoch_end', list(mloss) + list(results) + lr, epoch, best_fitness, fi)

        callbacks.run('on_train_end', last, best, epoch, results)

    torch.cuda.empty_cache()
    return results

def parse_opt(known=False):
    parser = argparse.ArgumentParser()
    # parser.add_argument('--weights', type=str, default=ROOT / 'yolo.pt', help='initial weights path')
    # parser.add_argument('--cfg', type=str, default='', help='model.yaml path')
    parser.add_argument('--weights', type=str, default='', help='initial weights path')
    parser.add_argument('--cfg', type=str, default='yolo.yaml', help='model.yaml path')
    parser.add_argument('--data', type=str, default=ROOT / 'data/coco128.yaml', help='dataset.yaml path')
    parser.add_argument('--hyp', type=str, default=ROOT / 'data/hyps/hyp.scratch-low.yaml', help='hyperparameters path')
    parser.add_argument('--epochs', type=int, default=100, help='total training epochs')
    parser.add_argument('--batch-size', type=int, default=128, help='total batch size for all GPUs, -1 for autobatch')
    parser.add_argument('--imgsz', '--img', '--img-size', type=int, default=320, help='train, val image size (pixels)')
    parser.add_argument('--rect', action='store_true', help='rectangular training')
    parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume most recent training')
    parser.add_argument('--nosave', action='store_true', help='only save final checkpoint')
    parser.add_argument('--noval', action='store_true', help='only validate final epoch')
    parser.add_argument('--noautoanchor', action='store_true', help='disable AutoAnchor')
    parser.add_argument('--noplots', action='store_true', help='save no plot files')
    parser.add_argument('--evolve', type=int, nargs='?', const=300, help='evolve hyperparameters for x generations')
    parser.add_argument('--bucket', type=str, default='', help='gsutil bucket')
    parser.add_argument('--cache', type=str, nargs='?', const='ram', help='image --cache ram/disk')
    parser.add_argument('--image-weights', action='store_true', help='use weighted image selection for training')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%%')
    parser.add_argument('--single-cls', action='store_true', help='train multi-class data as single-class')
    parser.add_argument('--optimizer', type=str, choices=['SGD', 'Adam', 'AdamW', 'LION'], default='SGD', help='optimizer')
    parser.add_argument('--sync-bn', action='store_true', help='use SyncBatchNorm, only available in DDP mode')
    parser.add_argument('--workers', type=int, default=8, help='max dataloader workers (per RANK in DDP mode)')
    parser.add_argument('--project', default=ROOT / 'runs/qat_train', help='save to project/name')
    parser.add_argument('--name', default='exp', help='save to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    parser.add_argument('--quad', action='store_true', help='quad dataloader')
    parser.add_argument('--cos-lr', action='store_true', help='cosine LR scheduler')
    parser.add_argument('--flat-cos-lr', action='store_true', help='flat cosine LR scheduler')
    parser.add_argument('--fixed-lr', action='store_true', help='fixed LR scheduler')
    parser.add_argument('--label-smoothing', type=float, default=0.0, help='Label smoothing epsilon')
    parser.add_argument('--patience', type=int, default=100, help='EarlyStopping patience (epochs without improvement)')
    parser.add_argument('--freeze', nargs='+', type=int, default=[0], help='Freeze layers: backbone=10, first3=0 1 2')
    parser.add_argument('--save-period', type=int, default=-1, help='Save checkpoint every x epochs (disabled if < 1)')
    parser.add_argument('--seed', type=int, default=0, help='Global training seed')
    parser.add_argument('--local_rank', type=int, default=-1, help='Automatic DDP Multi-GPU argument, do not modify')
    parser.add_argument('--min-items', type=int, default=0, help='Experimental')
    parser.add_argument('--close-mosaic', type=int, default=0, help='Experimental')

    # Logger arguments
    parser.add_argument('--entity', default=None, help='Entity')
    parser.add_argument('--upload_dataset', nargs='?', const=True, default=False, help='Upload data, "val" option')
    parser.add_argument('--bbox_interval', type=int, default=-1, help='Set bounding-box image logging interval')
    parser.add_argument('--artifact_alias', type=str, default='latest', help='Version of dataset artifact to use')

    return parser.parse_known_args()[0] if known else parser.parse_args()

def main(opt, callbacks=Callbacks()):
    # Checks
    if RANK in {-1, 0}:
        print_args(vars(opt))

    # Resume (from specified or most recent last.pt)
    if opt.resume and not check_comet_resume(opt) and not opt.evolve:
        last = Path(check_file(opt.resume) if isinstance(opt.resume, str) else get_latest_run())
        opt_yaml = last.parent.parent / 'opt.yaml'  # train options yaml
        opt_data = opt.data  # original dataset
        if opt_yaml.is_file():
            with open(opt_yaml, errors='ignore') as f:
                d = yaml.safe_load(f)
        else:
            d = torch.load(last, map_location='cpu')['opt']
        opt = argparse.Namespace(**d)  # replace
        opt.cfg, opt.weights, opt.resume = '', str(last), True  # reinstate
        if is_url(opt_data):
            opt.data = check_file(opt_data)  # avoid HUB resume auth timeout
    else:
        opt.data, opt.cfg, opt.hyp, opt.weights, opt.project = \
            check_file(opt.data), check_yaml(opt.cfg), check_yaml(opt.hyp), str(opt.weights), str(opt.project)  # checks
        assert len(opt.cfg) or len(opt.weights), 'either --cfg or --weights must be specified'
        if opt.evolve:
            if opt.project == str(ROOT / 'runs/train'):  # if default project name, rename to runs/evolve
                opt.project = str(ROOT / 'runs/evolve')
            opt.exist_ok, opt.resume = opt.resume, False  # pass resume to exist_ok and disable resume
        if opt.name == 'cfg':
            opt.name = Path(opt.cfg).stem  # use model.yaml as name
        opt.save_dir = str(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok))

    # DDP mode
    device = select_device(opt.device, batch_size=opt.batch_size)
    if LOCAL_RANK != -1:
        msg = 'is not compatible with YOLO Multi-GPU DDP training'
        assert not opt.image_weights, f'--image-weights {msg}'
        assert not opt.evolve, f'--evolve {msg}'
        assert opt.batch_size != -1, f'AutoBatch with --batch-size -1 {msg}, please pass a valid --batch-size'
        assert opt.batch_size % WORLD_SIZE == 0, f'--batch-size {opt.batch_size} must be multiple of WORLD_SIZE'
        assert torch.cuda.device_count() > LOCAL_RANK, 'insufficient CUDA devices for DDP command'
        torch.cuda.set_device(LOCAL_RANK)
        device = torch.device('cuda', LOCAL_RANK)
        dist.init_process_group(backend="nccl" if dist.is_nccl_available() else "gloo")

    # Train
    if not opt.evolve:
        train(opt.hyp, opt, device, callbacks)

    # Evolve hyperparameters (optional)
    else:
        # Hyperparameter evolution metadata (mutation scale 0-1, lower_limit, upper_limit)
        meta = {
            'lr0': (1, 1e-5, 1e-1),  # initial learning rate (SGD=1E-2, Adam=1E-3)
            'lrf': (1, 0.01, 1.0),  # final OneCycleLR learning rate (lr0 * lrf)
            'momentum': (0.3, 0.6, 0.98),  # SGD momentum/Adam beta1
            'weight_decay': (1, 0.0, 0.001),  # optimizer weight decay
            'warmup_epochs': (1, 0.0, 5.0),  # warmup epochs (fractions ok)
            'warmup_momentum': (1, 0.0, 0.95),  # warmup initial momentum
            'warmup_bias_lr': (1, 0.0, 0.2),  # warmup initial bias lr
            'box': (1, 0.02, 0.2),  # box loss gain
            'cls': (1, 0.2, 4.0),  # cls loss gain
            'cls_pw': (1, 0.5, 2.0),  # cls BCELoss positive_weight
            'obj': (1, 0.2, 4.0),  # obj loss gain (scale with pixels)
            'obj_pw': (1, 0.5, 2.0),  # obj BCELoss positive_weight
            'iou_t': (0, 0.1, 0.7),  # IoU training threshold
            'anchor_t': (1, 2.0, 8.0),  # anchor-multiple threshold
            'anchors': (2, 2.0, 10.0),  # anchors per output grid (0 to ignore)
            'fl_gamma': (0, 0.0, 2.0),  # focal loss gamma (efficientDet default gamma=1.5)
            'hsv_h': (1, 0.0, 0.1),  # image HSV-Hue augmentation (fraction)
            'hsv_s': (1, 0.0, 0.9),  # image HSV-Saturation augmentation (fraction)
            'hsv_v': (1, 0.0, 0.9),  # image HSV-Value augmentation (fraction)
            'degrees': (1, 0.0, 45.0),  # image rotation (+/- deg)
            'translate': (1, 0.0, 0.9),  # image translation (+/- fraction)
            'scale': (1, 0.0, 0.9),  # image scale (+/- gain)
            'shear': (1, 0.0, 10.0),  # image shear (+/- deg)
            'perspective': (0, 0.0, 0.001),  # image perspective (+/- fraction), range 0-0.001
            'flipud': (1, 0.0, 1.0),  # image flip up-down (probability)
            'fliplr': (0, 0.0, 1.0),  # image flip left-right (probability)
            'mosaic': (1, 0.0, 1.0),  # image mosaic (probability)
            'mixup': (1, 0.0, 1.0),  # image mixup (probability)
            'copy_paste': (1, 0.0, 1.0)}  # segment copy-paste (probability)

        with open(opt.hyp, errors='ignore') as f:
            hyp = yaml.safe_load(f)  # load hyps dict
            if 'anchors' not in hyp:  # anchors commented in hyp.yaml
                hyp['anchors'] = 3
        if opt.noautoanchor:
            del hyp['anchors'], meta['anchors']
        opt.noval, opt.nosave, save_dir = True, True, Path(opt.save_dir)  # only val/save final epoch
        # ei = [isinstance(x, (int, float)) for x in hyp.values()]  # evolvable indices
        evolve_yaml, evolve_csv = save_dir / 'hyp_evolve.yaml', save_dir / 'evolve.csv'
        if opt.bucket:
            os.system(f'gsutil cp gs://{opt.bucket}/evolve.csv {evolve_csv}')  # download evolve.csv if exists

        for _ in range(opt.evolve):  # generations to evolve
            if evolve_csv.exists():  # if evolve.csv exists: select best hyps and mutate
                # Select parent(s)
                parent = 'single'  # parent selection method: 'single' or 'weighted'
                x = np.loadtxt(evolve_csv, ndmin=2, delimiter=',', skiprows=1)
                n = min(5, len(x))  # number of previous results to consider
                x = x[np.argsort(-fitness(x))][:n]  # top n mutations
                w = fitness(x) - fitness(x).min() + 1E-6  # weights (sum > 0)
                if parent == 'single' or len(x) == 1:
                    # x = x[random.randint(0, n - 1)]  # random selection
                    x = x[random.choices(range(n), weights=w)[0]]  # weighted selection
                elif parent == 'weighted':
                    x = (x * w.reshape(n, 1)).sum(0) / w.sum()  # weighted combination

                # Mutate
                mp, s = 0.8, 0.2  # mutation probability, sigma
                npr = np.random
                npr.seed(int(time.time()))
                g = np.array([meta[k][0] for k in hyp.keys()])  # gains 0-1
                ng = len(meta)
                v = np.ones(ng)
                while all(v == 1):  # mutate until a change occurs (prevent duplicates)
                    v = (g * (npr.random(ng) < mp) * npr.randn(ng) * npr.random() * s + 1).clip(0.3, 3.0)
                for i, k in enumerate(hyp.keys()):  # plt.hist(v.ravel(), 300)
                    hyp[k] = float(x[i + 7] * v[i])  # mutate

            # Constrain to limits
            for k, v in meta.items():
                hyp[k] = max(hyp[k], v[1])  # lower limit
                hyp[k] = min(hyp[k], v[2])  # upper limit
                hyp[k] = round(hyp[k], 5)  # significant digits

            # Train mutation
            results = train(hyp.copy(), opt, device, callbacks)
            callbacks = Callbacks()
            # Write mutation results
            keys = ('metrics/precision', 'metrics/recall', 'metrics/mAP_0.5', 'metrics/mAP_0.5:0.95', 'val/box_loss',
                    'val/obj_loss', 'val/cls_loss')
            print_mutation(keys, results, hyp.copy(), save_dir, opt.bucket)

        # Plot results
        plot_evolve(evolve_csv)
        LOGGER.info(f'Hyperparameter evolution finished {opt.evolve} generations\n'
                    f"Results saved to {colorstr('bold', save_dir)}\n"
                    f'Usage example: $ python train.py --hyp {evolve_yaml}')

def run(**kwargs):
    # Usage: import train; train.run(data='coco128.yaml', imgsz=320, weights='yolo.pt')
    opt = parse_opt(True)
    for k, v in kwargs.items():
        setattr(opt, k, v)
    main(opt)
    return opt

if __name__ == "__main__":
    opt = parse_opt()
    main(opt)
zk1998 commented 1 month ago

Does the abnormal exponential growth of activation values occur before QAT?

hoangtv2000 commented 1 month ago

How can I tell whether exponential activation values occur before QAT? The rewritten model produced by PostQuantizer has -inf and inf values in HistogramObserver. Does that indicate exponential growth of the activations?

QDetectionModel(
  (fake_quant_0): QuantStub(
    (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
  )
  (fake_quant_1): QuantStub(
    (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
  )
  (fake_quant_2): QuantStub(
    (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
  )
  (model_0_conv): Conv2d(
    3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)
    (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
  )
  (model_0_bn): Identity()
  (model_0_act): Identity()
  (model_1_conv): Conv2d(
    32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)
    (activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
  )
...

Btw, the outputs of the rewritten model and the original model are the same.
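
To check the float activations directly before any quantization, here is a minimal sketch (an assumption, not code from the repo; it uses the `rewrite_model` and `calib_loader` defined in the script above) that hooks every leaf module and logs per-layer output ranges:

    # Sketch: record per-layer output ranges of the FLOAT rewritten model
    # to see whether activation magnitudes grow exponentially with depth.
    import torch

    act_ranges = {}

    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor):
                act_ranges[name] = (output.min().item(), output.max().item())
        return hook

    handles = [
        m.register_forward_hook(make_hook(name))
        for name, m in rewrite_model.named_modules()
        if len(list(m.children())) == 0  # leaf modules only
    ]

    rewrite_model.eval()
    with torch.no_grad():
        imgs = next(iter(calib_loader))[0].float() / 255.0
        rewrite_model(imgs)

    for h in handles:
        h.remove()

    for name, (lo, hi) in act_ranges.items():
        print(f'{name:50s} min={lo:.4g} max={hi:.4g}')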

zk1998 commented 1 month ago

@hoangtv2000 Referring to post_error_anaylsis.py, you can replace calibration with a single forward pass using the preprocessed image tensor.
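
A minimal sketch of that (assuming the `qat_model` and `calib_loader` from the training script above; the next comment applies the same idea):

    # Sketch: collect observer statistics from a single preprocessed image
    # instead of looping over the whole calibration set.
    qat_model.apply(torch.quantization.disable_fake_quant)
    qat_model.apply(torch.quantization.enable_observer)
    qat_model.eval()
    with torch.no_grad():
        imgs = next(iter(calib_loader))[0].float() / 255.0  # keep preprocessing identical to validation
        qat_model(imgs)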

hoangtv2000 commented 1 month ago

I ran the experiment below, and the exponential activation output symptoms remain the same; the graph/layer error-analysis report that follows shows the activation scales blowing up with depth.


    with model_tracer():
        dummy_input = torch.rand(1, 3, imgsz, imgsz)

        print("[INFO] Loading rewritten model!")
        from qat_out_no_prune.detectionmodel_q import QDetectionModel
        rewrite_model = QDetectionModel()
        dummy_input = torch.rand(1, 3, imgsz, imgsz)
        rewrite_model.load_state_dict(torch.load(qat_rewrite_dir + '/detectionmodel_q.pth'))

        quantizer = QATQuantizer(rewrite_model, dummy_input, work_dir=qat_rewrite_dir, config={'force_rewrite': False, "rewrite_graph": False,})
        qat_model = quantizer.quantize()
        qat_model(next(iter(calib_loader))[0].float() / 255.0) # I have to do this; otherwise I get the "min nan should not be greater than max nan" error.

    def ptq_fuse(self, verbose=None):
        return self
    import types

    qat_model.fuse = types.MethodType(ptq_fuse, qat_model)
    qat_model.nc = nc  # attach number of classes to model
    qat_model.hyp = hyp  # attach hyperparameters to model
    qat_model.class_weights = labels_to_class_weights(dataset.labels, nc).to(device) * nc  # attach class weights
    qat_model.names = names
    qat_model = qat_model.to(device)

    qat_model.apply(torch.quantization.disable_fake_quant)
    qat_model.apply(torch.quantization.enable_observer)
    qat_model.eval()

    bar = tqdm(enumerate(calib_loader), total=len(calib_loader), bar_format=TQDM_BAR_FORMAT) 
    # we do one inference to simulate qat init state
    for batch_i, (imgs, targets, paths, _) in (bar):
        imgs = imgs.to(device, non_blocking=True).float() / 255.0
        qat_model(imgs)
        # if batch_i == 64:
        break

    qat_model = deepcopy(qat_model)
    qat_model = qat_model.to(device)
    dummy_input = dummy_input.to(device)
    qat_model.apply(torch.quantization.disable_observer)
    qat_model.apply(torch.quantization.enable_fake_quant)
    qat_model(dummy_input)

    dummy_input_real = next(iter(calib_loader))[0].float() / 255.0
    # get_weight_dis(qat_model, save_path='/data/hoangtv23/workspace_AIOT/model_compression_flow/PruneQuantExperiments/yolov9/qat_out_no_prune/weight_dis')
    graph_error_analysis(qat_model, dummy_input_real, metric='cosine')
    layer_error_analysis(qat_model, dummy_input_real, metric='cosine')

    exit()
Activations (cosine sorted ):
fake_quant_0                                       cosine: 1.0000, scale: 0.0039, zero_point: 0
fake_quant_1                                       cosine: 1.0000, scale: 0.1549, zero_point: 0
fake_quant_2                                       cosine: 1.0000, scale: 0.1255, zero_point: 0
model_0_conv                                       cosine: 0.9995, scale: 0.6622, zero_point: 127
model_1_conv                                       cosine: 0.9987, scale: 0.6469, zero_point: 133
model_2_cv1_conv                                   cosine: 0.9979, scale: 0.7072, zero_point: 128
model_2_cv2_conv                                   cosine: 0.9967, scale: 0.3584, zero_point: 122
model_2_cv3_conv                                   cosine: 0.9582, scale: 0.4375, zero_point: 127
float_functional_simple_0                          cosine: 0.8661, scale: 0.7551, zero_point: 127
model_2_cv4_conv                                   cosine: 0.9043, scale: 0.2605, zero_point: 127
model_3_cv1_conv                                   cosine: 0.8694, scale: 0.2222, zero_point: 141
model_4_cv1_conv                                   cosine: 0.7958, scale: 0.2321, zero_point: 132
model_4_cv2_0_cv1_conv                             cosine: 0.7718, scale: 0.1506, zero_point: 101
model_4_cv2_0_m_0_cv1_conv1_conv                   cosine: 0.7584, scale: 0.2788, zero_point: 143
model_4_cv2_0_m_0_cv1_conv2_conv                   cosine: 0.7357, scale: 0.2457, zero_point: 135
float_functional_simple_1                          cosine: 0.7111, scale: 0.3302, zero_point: 132
model_4_cv2_0_m_0_cv2_conv                         cosine: 0.7384, scale: 0.1695, zero_point: 128
float_functional_simple_3                          cosine: 0.5934, scale: 0.2273, zero_point: 107
model_4_cv2_0_m_1_cv1_conv1_conv                   cosine: 0.5761, scale: 0.1158, zero_point: 137
model_4_cv2_0_m_1_cv1_conv2_conv                   cosine: 0.6113, scale: 0.0857, zero_point: 134
float_functional_simple_4                          cosine: 0.5635, scale: 0.1639, zero_point: 126
model_4_cv2_0_m_1_cv2_conv                         cosine: 0.5714, scale: 0.1840, zero_point: 124
float_functional_simple_6                          cosine: 0.5833, scale: 0.2691, zero_point: 108
model_4_cv2_0_m_2_cv1_conv1_conv                   cosine: 0.5403, scale: 0.2093, zero_point: 146
model_4_cv2_0_m_2_cv1_conv2_conv                   cosine: 0.5560, scale: 0.0794, zero_point: 134
float_functional_simple_7                          cosine: 0.5269, scale: 0.2184, zero_point: 152
model_4_cv2_0_m_2_cv2_conv                         cosine: 0.4786, scale: 0.2730, zero_point: 134
float_functional_simple_9                          cosine: 0.5036, scale: 0.4046, zero_point: 120
model_4_cv2_0_cv2_conv                             cosine: 0.7550, scale: 0.1399, zero_point: 114
float_functional_simple_10                         cosine: 0.4949, scale: 0.4046, zero_point: 120
model_4_cv2_0_cv3_conv                             cosine: 0.4867, scale: 0.2389, zero_point: 133
model_4_cv2_1_conv                                 cosine: 0.5396, scale: 0.2638, zero_point: 133
model_4_cv3_0_cv1_conv                             cosine: 0.4908, scale: 0.2428, zero_point: 120
model_4_cv3_0_m_0_cv1_conv1_conv                   cosine: 0.4015, scale: 0.4416, zero_point: 131
model_4_cv3_0_m_0_cv1_conv2_conv                   cosine: 0.4808, scale: 0.2269, zero_point: 119
float_functional_simple_11                         cosine: 0.3902, scale: 0.5101, zero_point: 133
model_4_cv3_0_m_0_cv2_conv                         cosine: 0.3727, scale: 0.5876, zero_point: 117
float_functional_simple_13                         cosine: 0.3813, scale: 0.6536, zero_point: 113
model_4_cv3_0_m_1_cv1_conv1_conv                   cosine: 0.4255, scale: 0.8962, zero_point: 134
model_4_cv3_0_m_1_cv1_conv2_conv                   cosine: 0.3666, scale: 0.3104, zero_point: 124
float_functional_simple_14                         cosine: 0.3900, scale: 0.9144, zero_point: 132
model_4_cv3_0_m_1_cv2_conv                         cosine: 0.3334, scale: 1.2969, zero_point: 132
float_functional_simple_16                         cosine: 0.3563, scale: 1.4614, zero_point: 132
model_4_cv3_0_m_2_cv1_conv1_conv                   cosine: 0.3160, scale: 1.7259, zero_point: 137
model_4_cv3_0_m_2_cv1_conv2_conv                   cosine: 0.3400, scale: 0.9053, zero_point: 132
float_functional_simple_17                         cosine: 0.3275, scale: 1.8023, zero_point: 132
model_4_cv3_0_m_2_cv2_conv                         cosine: 0.2782, scale: 4.8702, zero_point: 121
float_functional_simple_19                         cosine: 0.3001, scale: 5.3482, zero_point: 122
model_4_cv3_0_cv2_conv                             cosine: 0.4245, scale: 0.7931, zero_point: 145
float_functional_simple_20                         cosine: 0.3013, scale: 5.3482, zero_point: 122
model_4_cv3_0_cv3_conv                             cosine: 0.3039, scale: 3.3840, zero_point: 128
model_4_cv3_1_conv                                 cosine: 0.3053, scale: 7.1106, zero_point: 130
float_functional_simple_21                         cosine: 0.3068, scale: 7.1696, zero_point: 130
model_4_cv4_conv                                   cosine: 0.2646, scale: 6.6973, zero_point: 130
model_5_cv1_conv                                   cosine: 0.2434, scale: 21.4104, zero_point: 133
model_6_cv1_conv                                   cosine: 0.1832, scale: 43.1402, zero_point: 131
model_6_cv2_0_cv1_conv                             cosine: 0.2320, scale: 69.5078, zero_point: 121
model_6_cv2_0_m_0_cv1_conv1_conv                   cosine: 0.3178, scale: 122.0606, zero_point: 128
model_6_cv2_0_m_0_cv1_conv2_conv                   cosine: 0.2407, scale: 60.7877, zero_point: 126
float_functional_simple_22                         cosine: 0.3243, scale: 150.1270, zero_point: 123
model_6_cv2_0_m_0_cv2_conv                         cosine: 0.3435, scale: 150.2347, zero_point: 125
float_functional_simple_24                         cosine: 0.3652, scale: 165.0695, zero_point: 123
model_6_cv2_0_m_1_cv1_conv1_conv                   cosine: 0.4512, scale: 344.7465, zero_point: 142
model_6_cv2_0_m_1_cv1_conv2_conv                   cosine: 0.4200, scale: 170.4397, zero_point: 128
float_functional_simple_25                         cosine: 0.4605, scale: 371.5423, zero_point: 145
model_6_cv2_0_m_1_cv2_conv                         cosine: 0.5030, scale: 563.4467, zero_point: 127
float_functional_simple_27                         cosine: 0.5139, scale: 614.4595, zero_point: 115
model_6_cv2_0_m_2_cv1_conv1_conv                   cosine: 0.4922, scale: 679.3282, zero_point: 119
model_6_cv2_0_m_2_cv1_conv2_conv                   cosine: 0.5172, scale: 371.4168, zero_point: 131
float_functional_simple_28                         cosine: 0.4995, scale: 887.1314, zero_point: 129
model_6_cv2_0_m_2_cv2_conv                         cosine: 0.4912, scale: 1768.0994, zero_point: 120
float_functional_simple_30                         cosine: 0.5149, scale: 2113.8289, zero_point: 125
model_6_cv2_0_cv2_conv                             cosine: 0.2051, scale: 99.4882, zero_point: 136
float_functional_simple_31                         cosine: 0.5140, scale: 2113.8289, zero_point: 125
model_6_cv2_0_cv3_conv                             cosine: 0.5446, scale: 1462.1903, zero_point: 132
model_6_cv2_1_conv                                 cosine: 0.5259, scale: 2668.5996, zero_point: 121
model_6_cv3_0_cv1_conv                             cosine: 0.5408, scale: 8808.1006, zero_point: 127
model_6_cv3_0_m_0_cv1_conv1_conv                   cosine: 0.5837, scale: 16898.4004, zero_point: 132
model_6_cv3_0_m_0_cv1_conv2_conv                   cosine: 0.5453, scale: 5084.9458, zero_point: 133
float_functional_simple_32                         cosine: 0.5809, scale: 17652.0059, zero_point: 134
model_6_cv3_0_m_0_cv2_conv                         cosine: 0.5817, scale: 30606.0957, zero_point: 131
float_functional_simple_34                         cosine: 0.5842, scale: 32891.7305, zero_point: 139
model_6_cv3_0_m_1_cv1_conv1_conv                   cosine: 0.6149, scale: 74468.7266, zero_point: 121
model_6_cv3_0_m_1_cv1_conv2_conv                   cosine: 0.6228, scale: 16287.5479, zero_point: 148
float_functional_simple_35                         cosine: 0.6137, scale: 74473.5391, zero_point: 121
model_6_cv3_0_m_1_cv2_conv                         cosine: 0.6186, scale: 120463.5781, zero_point: 131
float_functional_simple_37                         cosine: 0.6178, scale: 131824.8438, zero_point: 121
model_6_cv3_0_m_2_cv1_conv1_conv                   cosine: 0.6388, scale: 146436.4688, zero_point: 123
model_6_cv3_0_m_2_cv1_conv2_conv                   cosine: 0.6244, scale: 41008.9648, zero_point: 156
float_functional_simple_38                         cosine: 0.6359, scale: 160922.1562, zero_point: 124
model_6_cv3_0_m_2_cv2_conv                         cosine: 0.6319, scale: 435530.1250, zero_point: 121
float_functional_simple_40                         cosine: 0.6277, scale: 467601.0312, zero_point: 126
model_6_cv3_0_cv2_conv                             cosine: 0.4818, scale: 10517.3086, zero_point: 136
float_functional_simple_41                         cosine: 0.6274, scale: 467601.0312, zero_point: 126
model_6_cv3_0_cv3_conv                             cosine: 0.6343, scale: 369678.6562, zero_point: 142
model_6_cv3_1_conv                                 cosine: 0.6140, scale: 646612.6250, zero_point: 110
float_functional_simple_42                         cosine: 0.6137, scale: 646612.6250, zero_point: 110
model_6_cv4_conv                                   cosine: 0.5939, scale: 545735.1875, zero_point: 136
model_7_cv1_conv                                   cosine: 0.6010, scale: 2046317.6250, zero_point: 120
model_8_cv1_conv                                   cosine: 0.5493, scale: 6472694.5000, zero_point: 142
model_8_cv2_0_cv1_conv                             cosine: 0.5317, scale: 14989313.0000, zero_point: 120
model_8_cv2_0_m_0_cv1_conv1_conv                   cosine: 0.4787, scale: 39697680.0000, zero_point: 126
model_8_cv2_0_m_0_cv1_conv2_conv                   cosine: 0.5703, scale: 15901207.0000, zero_point: 125
float_functional_simple_43                         cosine: 0.4760, scale: 42553300.0000, zero_point: 134
model_8_cv2_0_m_0_cv2_conv                         cosine: 0.4288, scale: 66537000.0000, zero_point: 131
float_functional_simple_45                         cosine: 0.4420, scale: 68211928.0000, zero_point: 131
model_8_cv2_0_m_1_cv1_conv1_conv                   cosine: 0.4555, scale: 98912480.0000, zero_point: 120
model_8_cv2_0_m_1_cv1_conv2_conv                   cosine: 0.4006, scale: 39909088.0000, zero_point: 124
float_functional_simple_46                         cosine: 0.4467, scale: 101123176.0000, zero_point: 121
model_8_cv2_0_m_1_cv2_conv                         cosine: 0.4356, scale: 200443808.0000, zero_point: 96
float_functional_simple_48                         cosine: 0.4345, scale: 214782624.0000, zero_point: 92
model_8_cv2_0_m_2_cv1_conv1_conv                   cosine: 0.4560, scale: 175546352.0000, zero_point: 117
model_8_cv2_0_m_2_cv1_conv2_conv                   cosine: 0.4329, scale: 66936008.0000, zero_point: 116
float_functional_simple_49                         cosine: 0.4523, scale: 173478880.0000, zero_point: 117
model_8_cv2_0_m_2_cv2_conv                         cosine: 0.4582, scale: 354761344.0000, zero_point: 137
float_functional_simple_51                         cosine: 0.4381, scale: 329974176.0000, zero_point: 149
model_8_cv2_0_cv2_conv                             cosine: 0.5400, scale: 26214602.0000, zero_point: 129
float_functional_simple_52                         cosine: 0.4384, scale: 329974176.0000, zero_point: 149
model_8_cv2_0_cv3_conv                             cosine: 0.4428, scale: 233357936.0000, zero_point: 110
model_8_cv2_1_conv                                 cosine: 0.4563, scale: 546612352.0000, zero_point: 126
model_8_cv3_0_cv1_conv                             cosine: 0.4608, scale: 1080471936.0000, zero_point: 108
model_8_cv3_0_m_0_cv1_conv1_conv                   cosine: 0.4652, scale: 2173905664.0000, zero_point: 135
model_8_cv3_0_m_0_cv1_conv2_conv                   cosine: 0.6072, scale: 1571598720.0000, zero_point: 107
float_functional_simple_53                         cosine: 0.4795, scale: 2228000256.0000, zero_point: 133
model_8_cv3_0_m_0_cv2_conv                         cosine: 0.4469, scale: 3075362560.0000, zero_point: 144
float_functional_simple_55                         cosine: 0.4528, scale: 3229092096.0000, zero_point: 140
model_8_cv3_0_m_1_cv1_conv1_conv                   cosine: 0.4282, scale: 9032870912.0000, zero_point: 118
model_8_cv3_0_m_1_cv1_conv2_conv                   cosine: 0.4345, scale: 3389645056.0000, zero_point: 124
float_functional_simple_56                         cosine: 0.4274, scale: 9035876352.0000, zero_point: 118
model_8_cv3_0_m_1_cv2_conv                         cosine: 0.3886, scale: 14086793216.0000, zero_point: 127
float_functional_simple_58                         cosine: 0.3954, scale: 14156201984.0000, zero_point: 130
model_8_cv3_0_m_2_cv1_conv1_conv                   cosine: 0.3991, scale: 15484750848.0000, zero_point: 127
model_8_cv3_0_m_2_cv1_conv2_conv                   cosine: 0.3528, scale: 3172747008.0000, zero_point: 123
float_functional_simple_59                         cosine: 0.3945, scale: 15511957504.0000, zero_point: 127
model_8_cv3_0_m_2_cv2_conv                         cosine: 0.4151, scale: 46409961472.0000, zero_point: 101
float_functional_simple_61                         cosine: 0.4234, scale: 46084993024.0000, zero_point: 98
model_8_cv3_0_cv2_conv                             cosine: 0.3949, scale: 2433656064.0000, zero_point: 96
float_functional_simple_62                         cosine: 0.4229, scale: 46084993024.0000, zero_point: 98
model_8_cv3_0_cv3_conv                             cosine: 0.4116, scale: 31605383168.0000, zero_point: 144
model_8_cv3_1_conv                                 cosine: 0.4392, scale: 55774707712.0000, zero_point: 105
float_functional_simple_63                         cosine: 0.4385, scale: 55774707712.0000, zero_point: 105
model_8_cv4_conv                                   cosine: 0.4451, scale: 66843471872.0000, zero_point: 124
model_9_cv1_conv                                   cosine: 0.4660, scale: 158518444032.0000, zero_point: 105
float_functional_simple_64                         cosine: 0.8381, scale: 158518444032.0000, zero_point: 105
model_9_cv5_conv                                   cosine: 0.7741, scale: 318444896256.0000, zero_point: 178
float_functional_simple_65                         cosine: 0.7696, scale: 318444896256.0000, zero_point: 178
model_12_cv1_conv                                  cosine: 0.7111, scale: 327496499200.0000, zero_point: 148
model_12_cv2_0_cv1_conv                            cosine: 0.6891, scale: 423746174976.0000, zero_point: 147
model_12_cv2_0_m_0_cv1_conv1_conv                  cosine: 0.6737, scale: 1293276807168.0000, zero_point: 130
model_12_cv2_0_m_0_cv1_conv2_conv                  cosine: 0.7251, scale: 497234018304.0000, zero_point: 132
float_functional_simple_66                         cosine: 0.6410, scale: 1401818447872.0000, zero_point: 135
model_12_cv2_0_m_0_cv2_conv                        cosine: 0.5738, scale: 1975745904640.0000, zero_point: 140
float_functional_simple_68                         cosine: 0.5591, scale: 2283707826176.0000, zero_point: 144
model_12_cv2_0_m_1_cv1_conv1_conv                  cosine: 0.5226, scale: 4685781008384.0000, zero_point: 113
model_12_cv2_0_m_1_cv1_conv2_conv                  cosine: 0.5209, scale: 1529911836672.0000, zero_point: 126
float_functional_simple_69                         cosine: 0.5243, scale: 5330226905088.0000, zero_point: 118
model_12_cv2_0_m_1_cv2_conv                        cosine: 0.5272, scale: 9179375861760.0000, zero_point: 140
float_functional_simple_71                         cosine: 0.5262, scale: 10145315684352.0000, zero_point: 139
model_12_cv2_0_m_2_cv1_conv1_conv                  cosine: 0.5608, scale: 12458676715520.0000, zero_point: 122
model_12_cv2_0_m_2_cv1_conv2_conv                  cosine: 0.5802, scale: 5105440522240.0000, zero_point: 125
float_functional_simple_72                         cosine: 0.5692, scale: 12754802966528.0000, zero_point: 121
model_12_cv2_0_m_2_cv2_conv                        cosine: 0.5751, scale: 35771192967168.0000, zero_point: 131
float_functional_simple_74                         cosine: 0.5686, scale: 38966178873344.0000, zero_point: 134
model_12_cv2_0_cv2_conv                            cosine: 0.6667, scale: 979391873024.0000, zero_point: 139
float_functional_simple_75                         cosine: 0.5686, scale: 38966178873344.0000, zero_point: 134
model_12_cv2_0_cv3_conv                            cosine: 0.5581, scale: 25821051879424.0000, zero_point: 109
model_12_cv2_1_conv                                cosine: 0.5349, scale: 52651580456960.0000, zero_point: 140
model_12_cv3_0_cv1_conv                            cosine: 0.3274, scale: 81670002704384.0000, zero_point: 127
model_12_cv3_0_m_0_cv1_conv1_conv                  cosine: 0.0161, scale: 221015359619072.0000, zero_point: 123
model_12_cv3_0_m_0_cv1_conv2_conv                  cosine: 0.4828, scale: 113051852341248.0000, zero_point: 109
float_functional_simple_76                         cosine: 0.0155, scale: 251247919628288.0000, zero_point: 126
model_12_cv3_0_m_0_cv2_conv                        cosine: 0.0000, scale: 327260099837952.0000, zero_point: 134
float_functional_simple_78                         cosine: 0.0000, scale: 329898182836224.0000, zero_point: 136
model_12_cv3_0_m_1_cv1_conv1_conv                  cosine: 0.0000, scale: 558978853502976.0000, zero_point: 130
model_12_cv3_0_m_1_cv1_conv2_conv                  cosine: 0.0245, scale: 361690872938496.0000, zero_point: 124
float_functional_simple_79                         cosine: 0.0000, scale: 603790529003520.0000, zero_point: 122
model_12_cv3_0_m_1_cv2_conv                        cosine: 0.0000, scale: 1850991586574336.0000, zero_point: 119
float_functional_simple_81                         cosine: 0.0000, scale: 1629767753269248.0000, zero_point: 120
model_12_cv3_0_m_2_cv1_conv1_conv                  cosine: 0.0000, scale: 1766776941576192.0000, zero_point: 130
model_12_cv3_0_m_2_cv1_conv2_conv                  cosine: 0.0000, scale: 802321097293824.0000, zero_point: 128
float_functional_simple_82                         cosine: 0.0000, scale: 2272685702774784.0000, zero_point: 130
model_12_cv3_0_m_2_cv2_conv                        cosine: 0.0000, scale: 5140065999126528.0000, zero_point: 131
float_functional_simple_84                         cosine: 0.0000, scale: 5154227580043264.0000, zero_point: 123
model_12_cv3_0_cv2_conv                            cosine: 0.0403, scale: 118396066725888.0000, zero_point: 122
float_functional_simple_85                         cosine: 0.0000, scale: 5154227580043264.0000, zero_point: 123
model_12_cv3_0_cv3_conv                            cosine: 0.0000, scale: 2061035485790208.0000, zero_point: 134
model_12_cv3_1_conv                                cosine: 0.0000, scale: 4401042519228416.0000, zero_point: 125
float_functional_simple_86                         cosine: 0.0000, scale: 4401042519228416.0000, zero_point: 125
model_12_cv4_conv                                  cosine: 0.0000, scale: 4712835401646080.0000, zero_point: 132
float_functional_simple_87                         cosine: 0.0000, scale: 4712835401646080.0000, zero_point: 132
model_15_cv1_conv                                  cosine: 0.0000, scale: 7790584806768640.0000, zero_point: 134
model_15_cv2_0_cv1_conv                            cosine: 0.0000, scale: 2894192971874304.0000, zero_point: 133
model_15_cv2_0_m_0_cv1_conv1_conv                  cosine: 0.0000, scale: 8857707012423680.0000, zero_point: 115
model_15_cv2_0_m_0_cv1_conv2_conv                  cosine: 0.0000, scale: 3661763488776192.0000, zero_point: 128
float_functional_simple_88                         cosine: 0.0000, scale: 11063390151639040.0000, zero_point: 114
model_15_cv2_0_m_0_cv2_conv                        cosine: 0.0000, scale: 9993857395589120.0000, zero_point: 127
float_functional_simple_90                         cosine: 0.0000, scale: 10558379307040768.0000, zero_point: 138
model_15_cv2_0_m_1_cv1_conv1_conv                  cosine: 0.0000, scale: 15149634921955328.0000, zero_point: 125
model_15_cv2_0_m_1_cv1_conv2_conv                  cosine: 0.0000, scale: 8737355586338816.0000, zero_point: 118
float_functional_simple_91                         cosine: 0.0000, scale: 18300065374273536.0000, zero_point: 127
model_15_cv2_0_m_1_cv2_conv                        cosine: 0.0000, scale: 34433695557353472.0000, zero_point: 132
float_functional_simple_93                         cosine: 0.0000, scale: 41204799546327040.0000, zero_point: 132
model_15_cv2_0_m_2_cv1_conv1_conv                  cosine: 0.0000, scale: 32712203945639936.0000, zero_point: 125
model_15_cv2_0_m_2_cv1_conv2_conv                  cosine: 0.0000, scale: 13206889143730176.0000, zero_point: 138
float_functional_simple_94                         cosine: 0.0000, scale: 35771629409665024.0000, zero_point: 126
model_15_cv2_0_m_2_cv2_conv                        cosine: 0.0000, scale: 108081134016921600.0000, zero_point: 118
float_functional_simple_96                         cosine: 0.0000, scale: 116138527023955968.0000, zero_point: 117
model_15_cv2_0_cv2_conv                            cosine: 0.0000, scale: 17921309388308480.0000, zero_point: 118
float_functional_simple_97                         cosine: 0.0000, scale: 116138527023955968.0000, zero_point: 117
model_15_cv2_0_cv3_conv                            cosine: 0.0000, scale: 53938948082237440.0000, zero_point: 132
model_15_cv2_1_conv                                cosine: 0.0000, scale: 121271923475742720.0000, zero_point: 127
model_15_cv3_0_cv1_conv                            cosine: 0.0000, scale: 132521344767098880.0000, zero_point: 116
model_15_cv3_0_m_0_cv1_conv1_conv                  cosine: 0.0000, scale: 318617544648818688.0000, zero_point: 129
model_15_cv3_0_m_0_cv1_conv2_conv                  cosine: 0.0000, scale: 135602631614660608.0000, zero_point: 126
float_functional_simple_98                         cosine: 0.0000, scale: 380875638224977920.0000, zero_point: 131
model_15_cv3_0_m_0_cv2_conv                        cosine: 0.0000, scale: 1154824072039759872.0000, zero_point: 130
float_functional_simple_100                        cosine: 0.0000, scale: 1130430032188014592.0000, zero_point: 128
model_15_cv3_0_m_1_cv1_conv1_conv                  cosine: 0.0000, scale: 1325279160670617600.0000, zero_point: 127
model_15_cv3_0_m_1_cv1_conv2_conv                  cosine: 0.0000, scale: 253013035893915648.0000, zero_point: 126
float_functional_simple_101                        cosine: 0.0000, scale: 1322760591848243200.0000, zero_point: 127
model_15_cv3_0_m_1_cv2_conv                        cosine: 0.0000, scale: 4020607259032682496.0000, zero_point: 127
float_functional_simple_103                        cosine: 0.0000, scale: 4411907779317465088.0000, zero_point: 126
model_15_cv3_0_m_2_cv1_conv1_conv                  cosine: 0.0000, scale: 4614107692787564544.0000, zero_point: 131
model_15_cv3_0_m_2_cv1_conv2_conv                  cosine: 0.0000, scale: 889222813418782720.0000, zero_point: 129
float_functional_simple_104                        cosine: 0.0000, scale: 4627138554844151808.0000, zero_point: 131
model_15_cv3_0_m_2_cv2_conv                        cosine: 0.0000, scale: 9725772984446091264.0000, zero_point: 122
float_functional_simple_106                        cosine: 0.0000, scale: 12645504121772703744.0000, zero_point: 126
model_15_cv3_0_cv2_conv                            cosine: 0.0000, scale: 213274916139565056.0000, zero_point: 125
float_functional_simple_107                        cosine: 0.0000, scale: 12645504121772703744.0000, zero_point: 126
model_15_cv3_0_cv3_conv                            cosine: 0.0000, scale: 6608515132945334272.0000, zero_point: 125
model_15_cv3_1_conv                                cosine: 0.0000, scale: 31264131549717594112.0000, zero_point: 131
float_functional_simple_108                        cosine: 0.0000, scale: 31264131549717594112.0000, zero_point: 131
model_15_cv4_conv                                  cosine: 0.0000, scale: 36054628945146937344.0000, zero_point: 129
model_16_cv1_conv                                  cosine: 0.0000, scale: 59118774298725056512.0000, zero_point: 133
float_functional_simple_109                        cosine: 0.0000, scale: 59118774298725056512.0000, zero_point: 133
model_18_cv1_conv                                  cosine: 0.0000, scale: 101368137516668944384.0000, zero_point: 131
model_18_cv2_0_cv1_conv                            cosine: 0.0000, scale: 178214720420072914944.0000, zero_point: 128
model_18_cv2_0_m_0_cv1_conv1_conv                  cosine: 0.0000, scale: 378624024228757766144.0000, zero_point: 123
model_18_cv2_0_m_0_cv1_conv2_conv                  cosine: 0.0000, scale: 99793126692298424320.0000, zero_point: 126
float_functional_simple_110                        cosine: 0.0000, scale: 403095634625842642944.0000, zero_point: 127
model_18_cv2_0_m_0_cv2_conv                        cosine: 0.0000, scale: 565996392278623518720.0000, zero_point: 123
float_functional_simple_112                        cosine: 0.0000, scale: 607537137844651819008.0000, zero_point: 123
model_18_cv2_0_m_1_cv1_conv1_conv                  cosine: 0.0000, scale: 1237644346696847589376.0000, zero_point: 125
model_18_cv2_0_m_1_cv1_conv2_conv                  cosine: 0.0000, scale: 219620745025373077504.0000, zero_point: 123
float_functional_simple_113                        cosine: 0.0000, scale: 1348393772508399009792.0000, zero_point: 115
model_18_cv2_0_m_1_cv2_conv                        cosine: 0.0000, scale: 3188317163747455008768.0000, zero_point: 119
float_functional_simple_115                        cosine: 0.0000, scale: 3776404521438888853504.0000, zero_point: 120
model_18_cv2_0_m_2_cv1_conv1_conv                  cosine: 0.0000, scale: 4260443809564298575872.0000, zero_point: 127
model_18_cv2_0_m_2_cv1_conv2_conv                  cosine: 0.0000, scale: 812290620890772668416.0000, zero_point: 127
float_functional_simple_116                        cosine: 0.0000, scale: 4405441984642095775744.0000, zero_point: 130
model_18_cv2_0_m_2_cv2_conv                        cosine: 0.0000, scale: 18017782964601952927744.0000, zero_point: 128
float_functional_simple_118                        cosine: 0.0000, scale: 19510323168908622692352.0000, zero_point: 129
model_18_cv2_0_cv2_conv                            cosine: 0.0000, scale: 183261285277518266368.0000, zero_point: 136
float_functional_simple_119                        cosine: 0.0000, scale: 19510323168908622692352.0000, zero_point: 129
model_18_cv2_0_cv3_conv                            cosine: 0.0000, scale: 17114822498313236905984.0000, zero_point: 128
model_18_cv2_1_conv                                cosine: 0.0000, scale: 45690806148725748006912.0000, zero_point: 130
model_18_cv3_0_cv1_conv                            cosine: 0.0000, scale: 61733190971372876070912.0000, zero_point: 132
model_18_cv3_0_m_0_cv1_conv1_conv                  cosine: 0.0000, scale: 165403577413646640218112.0000, zero_point: 123
model_18_cv3_0_m_0_cv1_conv2_conv                  cosine: 0.0000, scale: 17962825538349150765056.0000, zero_point: 126
float_functional_simple_120                        cosine: 0.0000, scale: 165370448934787702849536.0000, zero_point: 123
model_18_cv3_0_m_0_cv2_conv                        cosine: 0.0000, scale: 316705303530085688541184.0000, zero_point: 124
float_functional_simple_122                        cosine: 0.0000, scale: 329989913624512120750080.0000, zero_point: 123
model_18_cv3_0_m_1_cv1_conv1_conv                  cosine: 0.0000, scale: 460037693933110530408448.0000, zero_point: 127
model_18_cv3_0_m_1_cv1_conv2_conv                  cosine: 0.0000, scale: 73863325819321018482688.0000, zero_point: 112
float_functional_simple_123                        cosine: 0.0000, scale: 460357701708232968372224.0000, zero_point: 127
model_18_cv3_0_m_1_cv2_conv                        cosine: 0.0000, scale: 1599982814804028310945792.0000, zero_point: 121
float_functional_simple_125                        cosine: 0.0000, scale: 1793556029584508704522240.0000, zero_point: 121
model_18_cv3_0_m_2_cv1_conv1_conv                  cosine: 0.0000, scale: 1452156372404842998530048.0000, zero_point: 127
model_18_cv3_0_m_2_cv1_conv2_conv                  cosine: 0.0000, scale: 374769853237803303829504.0000, zero_point: 126
float_functional_simple_126                        cosine: 0.0000, scale: 1445416969749663674531840.0000, zero_point: 127
model_18_cv3_0_m_2_cv2_conv                        cosine: 0.0000, scale: 4807936316941565396254720.0000, zero_point: 125
float_functional_simple_128                        cosine: 0.0000, scale: 4696195164715069787340800.0000, zero_point: 124
model_18_cv3_0_cv2_conv                            cosine: 0.0000, scale: 140433162114108066103296.0000, zero_point: 125
float_functional_simple_129                        cosine: 0.0000, scale: 4696195164715069787340800.0000, zero_point: 124
model_18_cv3_0_cv3_conv                            cosine: 0.0000, scale: 2618150998067615088246784.0000, zero_point: 123
model_18_cv3_1_conv                                cosine: 0.0000, scale: 10817912053350743270227968.0000, zero_point: 134
float_functional_simple_130                        cosine: 0.0000, scale: 10817912053350743270227968.0000, zero_point: 134
model_18_cv4_conv                                  cosine: 0.0000, scale: 18477437150005195096195072.0000, zero_point: 139
model_19_cv1_conv                                  cosine: 0.0000, scale: 87294558525951841360936960.0000, zero_point: 117
float_functional_simple_131                        cosine: 0.0000, scale: 87294558525951841360936960.0000, zero_point: 117
model_21_cv1_conv                                  cosine: 0.0000, scale: 137274812555303080343633920.0000, zero_point: 121
model_21_cv2_0_cv1_conv                            cosine: 0.0000, scale: 142863244449061352150925312.0000, zero_point: 136
model_21_cv2_0_m_0_cv1_conv1_conv                  cosine: 0.0000, scale: 363938433267874210588393472.0000, zero_point: 116
model_21_cv2_0_m_0_cv1_conv2_conv                  cosine: 0.0000, scale: 210770413304125007585083392.0000, zero_point: 144
float_functional_simple_132                        cosine: 0.0000, scale: 373322919167468765620207616.0000, zero_point: 115
model_21_cv2_0_m_0_cv2_conv                        cosine: 0.0000, scale: 759675363970008946252447744.0000, zero_point: 119
float_functional_simple_134                        cosine: 0.0000, scale: 740523290189897036220858368.0000, zero_point: 117
model_21_cv2_0_m_1_cv1_conv1_conv                  cosine: 0.0000, scale: 1002702141343048023693852672.0000, zero_point: 135
model_21_cv2_0_m_1_cv1_conv2_conv                  cosine: 0.0000, scale: 487554344950427016583184384.0000, zero_point: 139
float_functional_simple_135                        cosine: 0.0000, scale: 1003498081456340443426979840.0000, zero_point: 135
model_21_cv2_0_m_1_cv2_conv                        cosine: 0.0000, scale: 2785394308034718374452789248.0000, zero_point: 146
float_functional_simple_137                        cosine: 0.0000, scale: 2747896947273302315339087872.0000, zero_point: 148
model_21_cv2_0_m_2_cv1_conv1_conv                  cosine: 0.0000, scale: 3496182184647558561850720256.0000, zero_point: 128
model_21_cv2_0_m_2_cv1_conv2_conv                  cosine: 0.0000, scale: 487090999632783580065693696.0000, zero_point: 135
float_functional_simple_138                        cosine: 0.0000, scale: 3499388376341521871597993984.0000, zero_point: 128
model_21_cv2_0_m_2_cv2_conv                        cosine: 0.0000, scale: 8417948821368943468578078720.0000, zero_point: 135
float_functional_simple_140                        cosine: 0.0000, scale: 9330584513411175721992192000.0000, zero_point: 127
model_21_cv2_0_cv2_conv                            cosine: 0.0000, scale: 286723941243676327176830976.0000, zero_point: 126
float_functional_simple_141                        cosine: 0.0000, scale: 9330584513411175721992192000.0000, zero_point: 127
model_21_cv2_0_cv3_conv                            cosine: 0.0000, scale: 7414947104902138401765982208.0000, zero_point: 127
model_21_cv2_1_conv                                cosine: 0.0000, scale: 12430662861616973465197215744.0000, zero_point: 134
model_21_cv3_0_cv1_conv                            cosine: 0.0000, scale: 18791332794781308192420790272.0000, zero_point: 127
model_21_cv3_0_m_0_cv1_conv1_conv                  cosine: 0.0000, scale: 66753058642268698614651944960.0000, zero_point: 128
model_21_cv3_0_m_0_cv1_conv2_conv                  cosine: 0.0000, scale: 23786392330204261068981665792.0000, zero_point: 133
float_functional_simple_142                        cosine: 0.0000, scale: 66753256981660979139750920192.0000, zero_point: 128
model_21_cv3_0_m_0_cv2_conv                        cosine: 0.0000, scale: 90582771601403564374001451008.0000, zero_point: 127
float_functional_simple_144                        cosine: 0.0000, scale: 96172599028244500456291696640.0000, zero_point: 125
model_21_cv3_0_m_1_cv1_conv1_conv                  cosine: 0.0000, scale: 179593731853182864556495470592.0000, zero_point: 123
model_21_cv3_0_m_1_cv1_conv2_conv                  cosine: 0.0000, scale: 62697863373732394007101702144.0000, zero_point: 119
float_functional_simple_145                        cosine: 0.0000, scale: 179584117115023741958840385536.0000, zero_point: 123
model_21_cv3_0_m_1_cv2_conv                        cosine: 0.0000, scale: 290966174100908029104948772864.0000, zero_point: 116
float_functional_simple_147                        cosine: 0.0000, scale: 297869008304646041342455054336.0000, zero_point: 116
model_21_cv3_0_m_2_cv1_conv1_conv                  cosine: 0.0000, scale: 430971401033829946863149645824.0000, zero_point: 120
model_21_cv3_0_m_2_cv1_conv2_conv                  cosine: 0.0000, scale: 154835582754049895069552476160.0000, zero_point: 157
float_functional_simple_148                        cosine: 0.0000, scale: 431009897765398300210931695616.0000, zero_point: 120
model_21_cv3_0_m_2_cv2_conv                        cosine: 0.0000, scale: 1258485960263322075466225942528.0000, zero_point: 109
float_functional_simple_150                        cosine: 0.0000, scale: 1292403356384838933848812158976.0000, zero_point: 108
model_21_cv3_0_cv2_conv                            cosine: 0.0000, scale: 30129460822895319911082491904.0000, zero_point: 124
float_functional_simple_151                        cosine: 0.0000, scale: 1292403356384838933848812158976.0000, zero_point: 108
model_21_cv3_0_cv3_conv                            cosine: 0.0000, scale: 723566130440020401816493621248.0000, zero_point: 152
model_21_cv3_1_conv                                cosine: 0.0000, scale: 3647814581493911493702177521664.0000, zero_point: 101
float_functional_simple_152                        cosine: 0.0000, scale: 3647814581493911493702177521664.0000, zero_point: 101
model_21_cv4_conv                                  cosine: 0.0000, scale: 7180271043428555842295542317056.0000, zero_point: 145
model_22_cv2_0_0_conv                              cosine: 0.0000, scale: 162421001147163607040.0000, zero_point: 128
model_22_cv2_0_1_conv                              cosine: 0.0000, scale: 461638981713194385408.0000, zero_point: 122
model_22_cv2_0_2                                   cosine: 0.0000, scale: 611586366490867138560.0000, zero_point: 131
model_22_cv3_0_0_conv                              cosine: 0.0000, scale: 66871263660406210560.0000, zero_point: 125
model_22_cv3_0_1_conv                              cosine: 0.0000, scale: 1568169715799055925248.0000, zero_point: 124
model_22_cv3_0_2                                   cosine: 0.0000, scale: 648902841159538180096.0000, zero_point: 131
float_functional_simple_153                        cosine: 0.0000, scale: 648902841159538180096.0000, zero_point: 131
model_22_cv2_1_0_conv                              cosine: 0.0000, scale: 42065590862491805741481984.0000, zero_point: 118
model_22_cv2_1_1_conv                              cosine: 0.0000, scale: 99290835908594600113078272.0000, zero_point: 120
model_22_cv2_1_2                                   cosine: 0.0000, scale: 92838782797537468227780608.0000, zero_point: 130
model_22_cv3_1_0_conv                              cosine: 0.0000, scale: 41888912559439834083491840.0000, zero_point: 133
model_22_cv3_1_1_conv                              cosine: 0.0000, scale: 667460754428321551267921920.0000, zero_point: 136
model_22_cv3_1_2                                   cosine: 0.0000, scale: 268275998131601984439975936.0000, zero_point: 122
float_functional_simple_154                        cosine: 0.0000, scale: 268275998131601984439975936.0000, zero_point: 122
model_22_cv2_2_0_conv                              cosine: 0.0000, scale: 20763627361852552025455743467520.0000, zero_point: 101
model_22_cv2_2_1_conv                              cosine: 0.0000, scale: 35762595920493231689434111410176.0000, zero_point: 127
model_22_cv2_2_2                                   cosine: 0.0000, scale: 61471551954126954987276914917376.0000, zero_point: 102
model_22_cv3_2_0_conv                              cosine: 0.0000, scale: 13487444979599049962955711971328.0000, zero_point: 129
model_22_cv3_2_1_conv                              cosine: 0.0000, scale: 125756815889264339417965369753600.0000, zero_point: 0
model_22_cv3_2_2                                   cosine: 0.0000, scale: 72900446510566951693947173339136.0000, zero_point: 253
float_functional_simple_155                        cosine: 0.0000, scale: 109299969834827429222591932399616.0000, zero_point: 169
float_functional_simple_156                        cosine: 0.0000, scale: 109299969834827429222591932399616.0000, zero_point: 169
fake_quant_inner_0_0_0                             cosine: 0.2485, scale: 0.0039, zero_point: 0
model_22_dfl_conv                                  cosine: 0.7441, scale: 0.0586, zero_point: 0
float_functional_simple_158                        cosine: 0.9634, scale: 0.1898, zero_point: 47
float_functional_simple_159                        cosine: 0.9806, scale: 0.1892, zero_point: 0
float_functional_simple_160                        cosine: 0.9953, scale: 0.3658, zero_point: 21
float_functional_simple_163                        cosine: 0.7758, scale: 0.1058, zero_point: 1
float_functional_simple_164                        cosine: 0.9237, scale: 0.1829, zero_point: 21
float_functional_simple_165                        cosine: 0.9040, scale: 3.6535, zero_point: 34
rewritten_sigmoid_0                                cosine: 0.6788, scale: 0.0039, zero_point: 0
float_functional_simple_166                        cosine: 0.9044, scale: 3.6535, zero_point: 34

WARNING (tinynn.util.quantization_analysis_util) Quantization error report:

Weights (cosine sorted 30):
model_21_cv3_0_m_1_cv1_conv2_conv        cosine: 0.2979, scale: 0.0061, zero_point: 0
model_18_cv3_0_m_1_cv1_conv2_conv        cosine: 0.3095, scale: 0.0091, zero_point: 0
model_8_cv3_0_m_0_cv1_conv2_conv         cosine: 0.3908, scale: 0.0091, zero_point: 0
model_21_cv3_0_m_0_cv1_conv2_conv        cosine: 0.4150, scale: 0.0064, zero_point: 0
model_8_cv3_0_m_1_cv1_conv2_conv         cosine: 0.4283, scale: 0.0060, zero_point: 0
model_21_cv3_0_m_2_cv1_conv2_conv        cosine: 0.4421, scale: 0.0032, zero_point: 0
model_21_cv2_0_m_2_cv1_conv2_conv        cosine: 0.5088, scale: 0.0031, zero_point: 0
model_8_cv2_0_m_1_cv1_conv2_conv         cosine: 0.5227, scale: 0.0036, zero_point: 0
model_21_cv2_0_m_1_cv1_conv2_conv        cosine: 0.5453, scale: 0.0024, zero_point: 0
model_8_cv3_0_m_2_cv1_conv2_conv         cosine: 0.6227, scale: 0.0034, zero_point: 0
model_18_cv3_0_m_2_cv1_conv2_conv        cosine: 0.6550, scale: 0.0025, zero_point: 0
model_12_cv3_0_m_2_cv1_conv2_conv        cosine: 0.6577, scale: 0.0074, zero_point: 0
model_22_cv3_1_1_conv                    cosine: 0.6880, scale: 0.1094, zero_point: 0
model_21_cv2_0_m_0_cv1_conv2_conv        cosine: 0.7153, scale: 0.0080, zero_point: 0
model_6_cv3_0_m_1_cv1_conv2_conv         cosine: 0.7161, scale: 0.0044, zero_point: 0
model_8_cv2_0_m_2_cv1_conv2_conv         cosine: 0.7401, scale: 0.0021, zero_point: 0
model_12_cv3_0_m_1_cv1_conv2_conv        cosine: 0.7669, scale: 0.0079, zero_point: 0
model_12_cv3_0_m_0_cv1_conv2_conv        cosine: 0.7683, scale: 0.0066, zero_point: 0
model_22_cv3_2_1_conv                    cosine: 0.7720, scale: 0.0789, zero_point: 0
model_18_cv3_0_m_0_cv1_conv2_conv        cosine: 0.7738, scale: 0.0051, zero_point: 0
model_6_cv3_0_m_2_cv1_conv2_conv         cosine: 0.7914, scale: 0.0029, zero_point: 0
model_22_cv3_0_1_conv                    cosine: 0.8338, scale: 0.1066, zero_point: 0
model_8_cv2_0_m_0_cv1_conv2_conv         cosine: 0.8352, scale: 0.0057, zero_point: 0
model_6_cv2_0_m_0_cv1_conv2_conv         cosine: 0.8745, scale: 0.0060, zero_point: 0
model_18_cv2_0_m_0_cv1_conv2_conv        cosine: 0.8828, scale: 0.0039, zero_point: 0
model_18_cv2_0_m_1_cv1_conv2_conv        cosine: 0.9013, scale: 0.0022, zero_point: 0
model_6_cv3_0_m_0_cv1_conv2_conv         cosine: 0.9020, scale: 0.0035, zero_point: 0
model_6_cv2_0_m_1_cv1_conv2_conv         cosine: 0.9155, scale: 0.0044, zero_point: 0
model_15_cv3_0_m_2_cv1_conv2_conv        cosine: 0.9271, scale: 0.0021, zero_point: 0
model_18_cv2_0_m_2_cv1_conv2_conv        cosine: 0.9299, scale: 0.0015, zero_point: 0

Activations (cosine sorted 30):
float_functional_simple_156                        cosine: 0.0000, scale: 109299969834827429222591932399616.0000, zero_point: 169
float_functional_simple_155                        cosine: 0.0000, scale: 109299969834827429222591932399616.0000, zero_point: 169
model_22_cv3_2_2                                   cosine: 0.0000, scale: 72900446510566951693947173339136.0000, zero_point: 253
model_22_cv3_2_1_conv                              cosine: 0.0000, scale: 125756815889264339417965369753600.0000, zero_point: 0
model_22_cv3_2_0_conv                              cosine: 0.0000, scale: 13487444979599049962955711971328.0000, zero_point: 129
model_22_cv2_2_2                                   cosine: 0.0000, scale: 61471551954126954987276914917376.0000, zero_point: 102
model_22_cv2_2_1_conv                              cosine: 0.0000, scale: 35762595920493231689434111410176.0000, zero_point: 127
model_22_cv2_2_0_conv                              cosine: 0.0000, scale: 20763627361852552025455743467520.0000, zero_point: 101
float_functional_simple_154                        cosine: 0.0000, scale: 268275998131601984439975936.0000, zero_point: 122
model_22_cv3_1_2                                   cosine: 0.0000, scale: 268275998131601984439975936.0000, zero_point: 122
model_22_cv3_1_1_conv                              cosine: 0.0000, scale: 667460754428321551267921920.0000, zero_point: 136
model_22_cv3_1_0_conv                              cosine: 0.0000, scale: 41888912559439834083491840.0000, zero_point: 133
model_22_cv2_1_2                                   cosine: 0.0000, scale: 92838782797537468227780608.0000, zero_point: 130
model_22_cv2_1_1_conv                              cosine: 0.0000, scale: 99290835908594600113078272.0000, zero_point: 120
model_22_cv2_1_0_conv                              cosine: 0.0000, scale: 42065590862491805741481984.0000, zero_point: 118
float_functional_simple_153                        cosine: 0.0000, scale: 648902841159538180096.0000, zero_point: 131
model_22_cv3_0_2                                   cosine: 0.0000, scale: 648902841159538180096.0000, zero_point: 131
model_22_cv3_0_1_conv                              cosine: 0.0000, scale: 1568169715799055925248.0000, zero_point: 124
model_22_cv3_0_0_conv                              cosine: 0.0000, scale: 66871263660406210560.0000, zero_point: 125
model_22_cv2_0_2                                   cosine: 0.0000, scale: 611586366490867138560.0000, zero_point: 131
model_22_cv2_0_1_conv                              cosine: 0.0000, scale: 461638981713194385408.0000, zero_point: 122
model_22_cv2_0_0_conv                              cosine: 0.0000, scale: 162421001147163607040.0000, zero_point: 128
model_21_cv4_conv                                  cosine: 0.0000, scale: 7180271043428555842295542317056.0000, zero_point: 145
float_functional_simple_152                        cosine: 0.0000, scale: 3647814581493911493702177521664.0000, zero_point: 101
model_21_cv3_1_conv                                cosine: 0.0000, scale: 3647814581493911493702177521664.0000, zero_point: 101
model_21_cv3_0_cv3_conv                            cosine: 0.0000, scale: 723566130440020401816493621248.0000, zero_point: 152
float_functional_simple_151                        cosine: 0.0000, scale: 1292403356384838933848812158976.0000, zero_point: 108
model_21_cv3_0_cv2_conv                            cosine: 0.0000, scale: 30129460822895319911082491904.0000, zero_point: 124
float_functional_simple_150                        cosine: 0.0000, scale: 1292403356384838933848812158976.0000, zero_point: 108
model_21_cv3_0_m_2_cv2_conv                        cosine: 0.0000, scale: 1258485960263322075466225942528.0000, zero_point: 109
zk1998 commented 1 month ago

Hi @hoangtv2000, it looks like the qat_model cannot infer correctly. The enormous activation scales in the report above confirm this: an 8-bit scale is roughly the activation range divided by 256, so scales around 1e32 mean the observers recorded exploded or inf/nan activations. You need to make sure that qat_model can perform inference correctly before calibrating or training.

I suggest you check whether the model can still perform the detection task normally after each step. For example, test rewrite_model after generating the traced model, and test qat_model after quantizer.quantize() with fake_quant and observers disabled; see the sketch below.
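
A minimal sketch of that sanity check, assuming your original float YOLOv9 model is available as float_model (the names here are illustrative, not TinyNeuralNetwork APIs, and the eval-mode output is assumed to be a tuple whose first element is the prediction tensor):

    import torch
    import torch.nn.functional as F

    dummy_input = torch.rand(1, 3, imgsz, imgsz)

    # 1. The rewritten model should numerically match the original float model.
    #    float_model: your unmodified gelan-s model (assumption).
    float_model.eval()
    rewrite_model.eval()
    with torch.no_grad():
        ref = float_model(dummy_input)[0].flatten()
        out = rewrite_model(dummy_input)[0].flatten()
    print('rewrite_model cosine:', F.cosine_similarity(ref, out, dim=0).item())

    # 2. With fake-quant and observers both disabled, qat_model runs a pure
    #    float path and should reproduce rewrite_model; a low cosine here means
    #    the traced graph itself is broken, not the quantization.
    qat_model.apply(torch.quantization.disable_fake_quant)
    qat_model.apply(torch.quantization.disable_observer)
    qat_model.eval()
    with torch.no_grad():
        q = qat_model(dummy_input)[0].flatten()
    print('qat_model (float path) cosine:', F.cosine_similarity(ref, q, dim=0).item())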

BTW, you just use imgs = imgs.to(device, non_blocking=True).float() / 255.0 to preprocess the images, which may cause errors; you need to ensure that the image preprocessing pipeline is consistent with the one used by the pre-trained model.
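
A quick way to verify this (a sketch only; val_loader stands for whatever dataloader the ultralytics validator uses and is an assumption here):

    # Compare one calibration batch against one validation batch: the shapes,
    # dtypes, and value ranges should match exactly if the pipelines agree.
    calib_imgs = next(iter(calib_loader))[0].to(device).float() / 255.0
    val_imgs = next(iter(val_loader))[0].to(device).float() / 255.0
    for name, t in (('calib', calib_imgs), ('val', val_imgs)):
        print(name, tuple(t.shape), str(t.dtype), t.min().item(), t.max().item())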

hoangtv2000 commented 1 month ago

I found that the problem came from some invalid modifications in my code. Thanks for supporting me.