I tried quantizing the biencoder model (used in fast mode) as follows:

```python
import json

import torch
from blink.biencoder.biencoder import load_biencoder

with open(args.biencoder_config) as json_file:
    biencoder_params = json.load(json_file)
biencoder_params["path_to_model"] = args.biencoder_model
biencoder = load_biencoder(biencoder_params)

# Quantize the model: dynamic quantization of all Linear layers to int8
biencoder = torch.quantization.quantize_dynamic(
    biencoder, {torch.nn.Linear}, dtype=torch.qint8
)

# Save the quantized model for later use
quantized_biencoder_model = args.biencoder_model.replace(".bin", "_quantized.bin")
torch.save(biencoder.state_dict(), quantized_biencoder_model)
```
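One caveat worth noting when reusing the saved file: `torch.save(biencoder.state_dict(), ...)` on a dynamically quantized model writes packed int8 parameters, so the checkpoint can only be loaded into a model that has already been wrapped with `quantize_dynamic`. A minimal round-trip sketch, with a toy module (`TinyEncoder` is hypothetical) standing in for the real biencoder:

```python
import io

import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Toy stand-in for the BLINK biencoder, for illustration only."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 4)

    def forward(self, x):
        return self.linear(x)


# Quantize and save, mirroring the snippet above.
qmodel = torch.quantization.quantize_dynamic(
    TinyEncoder(), {nn.Linear}, dtype=torch.qint8
)
buf = io.BytesIO()
torch.save(qmodel.state_dict(), buf)
buf.seek(0)

# Reload: wrap a *fresh* float model with quantize_dynamic first,
# then load the packed int8 state_dict into it.
fresh = torch.quantization.quantize_dynamic(
    TinyEncoder(), {nn.Linear}, dtype=torch.qint8
)
fresh.load_state_dict(torch.load(buf, weights_only=False))
fresh.eval()
```

Loading the quantized state_dict into an unwrapped float model fails, because the quantized module's keys (`_packed_params`) don't exist there.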
The resulting model is significantly smaller on disk:

```
$ du -h biencoder_wiki_large*.bin
2.5G    biencoder_wiki_large.bin
824M    biencoder_wiki_large_quantized.bin
```
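The roughly 3x shrink is about what I'd expect: dynamic quantization stores the Linear weights as int8 instead of fp32 (an ideal 4x for those layers), while embeddings and other parameters stay in fp32. A quick sanity check on a toy Linear-dominated model (hypothetical sizes, not the real biencoder):

```python
import io

import torch
import torch.nn as nn


def serialized_size(module):
    """Bytes needed to torch.save the module's state_dict."""
    buf = io.BytesIO()
    torch.save(module.state_dict(), buf)
    return buf.tell()


# Toy stand-in: Linear layers dominate, as in a BERT-style biencoder.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024))
# quantize_dynamic returns a quantized copy; `model` stays fp32.
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

print(f"fp32: {serialized_size(model)} bytes, int8: {serialized_size(qmodel)} bytes")
```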
I fed the text of the following entity to both the regular and the quantized model, and the detected mentions come back with noticeably different scores (and, in several cases, different predictions):

```json
{
    "text": " Aristotle (; \"Aristoteles\", ; 384–322 BC) was a Greek philosopher during the Classical period in Ancient Greece, the founder of the Lyceum and the Peripatetic school of philosophy and Aristotelian tradition. Along with his teacher Plato, he has been called the \"Father of Western Philosophy\". His writings cover many subjects – including physics, biology, zoology, metaphysics, logic, ethics, aesthetics, poetry, theatre, music, rhetoric, psychology, linguistics, economics, politics and government. Aristotle provided a complex synthesis of the various philosophies existing prior to him, and it was above all from his teachings that the West inherited its intellectual lexicon, as well as problems and methods of inquiry. As a result, his philosophy has exerted a unique influence on almost every form of knowledge in the West and it continues to be a subject of contemporary philosophical discussion. Little is known about his life. Aristotle was born in the city of Stagira in Northern Greece. His father, Nicomachus, died when Aristotle was a child, and he was brought up by a guardian. At seventeen or eighteen years of age, he joined Plato's Academy in Athens and remained there until the age of thirty-seven (c. 347 BC). Shortly after Plato died, Aristotle left Athens and, at the request of Philip II of Macedon, tutored Alexander the Great beginning in 343 BC. He established a library in the Lyceum which helped him to produce many of his hundreds of books on papyrus scrolls. Though Aristotle wrote many elegant treatises and dialogues for publication, only around a third of his original",
    "idx": "https://en.wikipedia.org/wiki?curid=308",
    "title": "Aristotle",
    "entity": "Aristotle"
}
```
Output from the regular model:

| Mention | Start | End | Predictions | Scores |
|---|---|---|---|---|
| aristotle | 1 | 10 | ['Aristotle', 'Aristotle of Cyrene', 'Plato'] | [81.83269 75.88555 75.54469] |
| aristoteles | 15 | 26 | ['Aristotle', 'Aristotle of Argos', 'Aristides'] | [78.62669 76.48178 76.190445] |
| plato | 232 | 237 | ['Plato', 'Socrates', 'Aristotle'] | [83.59171 77.50141 75.64921] |
| aristotle | 501 | 510 | ['Aristotle', 'Aristotle of Cyrene', 'Plato'] | [82.56731 76.41449 76.34714] |
| little | 906 | 912 | ['Alexander the Great in legend', 'Sophistic works of Antiphon', 'Nicias of Nicaea'] | [75.37565 74.96681 74.91618] |
| aristotle | 938 | 947 | ['Aristotle', 'Aristotle of Cyrene', 'Aristotle the Dialectician'] | [82.74979 78.06471 77.26161] |
| aristotle | 1034 | 1043 | ['Aristotle', 'Aristotle of Cyrene', 'Aristotle of Argos'] | [79.39718 75.55984 75.48356] |
| plato | 1143 | 1148 | ['Plato', 'Socrates', 'Plato (comic poet)'] | [83.38141 77.72538 75.96175] |
| plato | 1245 | 1250 | ['Plato', 'Socrates', 'Plato (comic poet)'] | [82.8144 77.8433 76.76737] |
| aristotle | 1257 | 1266 | ['Aristotle', 'Aristotle of Argos', 'Aristotle the Dialectician'] | [80.94 76.36772 76.23486] |
| alexander the great | 1332 | 1351 | ['Alexander the Great', 'Alexander I of Epirus', 'Alexander I of Macedon'] | [82.85829 76.547844 76.40962] |
| aristotle | 1497 | 1506 | ['Aristotle', 'Aristotle of Cyrene', 'Plato'] | [82.41356 76.9357 76.83391] |
Output from the quantized model:

| Mention | Start | End | Predictions | Scores |
|---|---|---|---|---|
| aristotle | 1 | 10 | ['Aristotle', 'Euclid', 'Plato'] | [78.56236 76.82237 76.486824] |
| aristoteles | 15 | 26 | ['Aristophanes of Byzantium', 'Diocles of Peparethus', 'Ephorus'] | [76.16426 76.133354 76.056305] |
| plato | 232 | 237 | ['Plato', 'Socrates', 'Euclid'] | [76.05176 75.021736 74.655815] |
| aristotle | 501 | 510 | ['Aristotle', 'Plato', 'Euclid'] | [73.647415 72.97169 72.854645] |
| little | 906 | 912 | ['Maluma', 'George Houghton (disambiguation)', 'John Lewis'] | [73.14746 73.02072 73.00591] |
| aristotle | 938 | 947 | ['Aristotle', 'Plato', 'Socrates'] | [77.42533 75.779884 75.77082] |
| aristotle | 1034 | 1043 | ['Aristotle', 'Alexander, son of Herod', 'Alexander (grandson of Herod the Great)'] | [77.33661 77.07514 76.96219] |
| plato | 1143 | 1148 | ['Plato', 'Socrates', 'Aristotle'] | [77.73229 76.45869 75.94086] |
| plato | 1245 | 1250 | ['Plato', 'Ramesses II', 'Cyrus the Great'] | [76.089645 75.32056 75.00227] |
| aristotle | 1257 | 1266 | ['Aristotle', 'Alexander the Great', 'Euclid'] | [74.48347 74.2509 73.81971] |
| alexander the great | 1332 | 1351 | ['Alexander the Great', 'Cyrus the Great', 'Darius the Great'] | [85.78471 83.313095 82.84473] |
| aristotle | 1497 | 1506 | ['Aristotle', 'Euclid', 'Plato'] | [75.367905 74.52708 74.46407] |
I was wondering whether the FB team or anyone else has experience with compressing BLINK models to reduce memory usage.
Thanks!