facebookresearch / fastText

Library for fast text representation and classification.
https://fasttext.cc/
MIT License

Quantized model artifacts are not accessible as class attributes #984

Open denver1117 opened 4 years ago

denver1117 commented 4 years ago

It does not appear that the model artifacts needed to explore quantized models are exposed, specifically through the Python module.

For instance, with a non-quantized supervised model class, you can access the input and output matrix explicitly as numpy objects:

```
model.get_input_matrix()
model.get_output_matrix()
```

With these, you can access all of the model artifacts needed to manually compute word and sentence vectors, perform the dot product against the output matrix for prediction, and so on, as sketched below.
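For illustration, here is a rough sketch of what those accessors make possible for a non-quantized supervised model. This is not fastText's exact internals (tokenization, EOS handling, and normalization are simplified), and the training file path is illustrative:

```python
import numpy as np
import fasttext

model = fasttext.train_supervised("train.txt")  # illustrative path

A = model.get_input_matrix()   # (num_words + num_ngrams) x dim
B = model.get_output_matrix()  # num_labels x dim

def sentence_vector(text):
    # Average the input-matrix rows of every word/subword id in the sentence.
    ids = []
    for token in text.split():
        _, sub_ids = model.get_subwords(token)
        ids.extend(sub_ids)
    return A[ids].mean(axis=0) if ids else np.zeros(A.shape[1], dtype=A.dtype)

def manual_predict(text):
    scores = B @ sentence_vector(text)        # one dot product per label
    scores -= scores.max()                    # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return model.get_labels()[int(np.argmax(probs))], float(probs.max())
```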

For quantized models, it is much less clear what has happened under the hood and where the resulting artifacts live. The input matrix normally accessible with model.get_input_matrix() goes away, which is understandable. But presumably some artifacts exist that map words to their nearest centroids, and that map centroids back into the full-dimensional input matrix space. These seemingly must exist in order to produce word/sentence vectors for quantized models as part of the predict step (see the reconstruction sketch below).
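To make the request concrete, here is a minimal sketch of how an approximate input matrix could be rebuilt from product-quantization artifacts, assuming the codes and codebook centroids were exposed as numpy arrays. These arrays are hypothetical: the fastText Python API does not currently expose them, which is exactly what this issue is asking for.

```python
import numpy as np

def reconstruct_input_matrix(codes, centroids):
    """Approximate the original input matrix from PQ artifacts.

    codes:     (num_rows, num_subquantizers) integer array; codes[i, m] is
               the centroid index chosen for sub-vector m of row i.
    centroids: (num_subquantizers, num_centroids, sub_dim) codebook array.
    Returns a (num_rows, num_subquantizers * sub_dim) float32 matrix.
    """
    num_rows, num_sub = codes.shape
    sub_dim = centroids.shape[2]
    out = np.empty((num_rows, num_sub * sub_dim), dtype=np.float32)
    for m in range(num_sub):
        # Map each row's code for sub-space m back to its centroid vector.
        out[:, m * sub_dim:(m + 1) * sub_dim] = centroids[m][codes[:, m]]
    return out
```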

Where are these attributes? Can they be accessed with the Python module? I see no new attributes created in either the Python model object or the pybind object model.f after quantization (see the check below).
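One way to confirm that observation is to diff the attributes of the pybind object before and after quantization. The training file path and quantize parameters here are only illustrative:

```python
import fasttext

model = fasttext.train_supervised("train.txt")  # illustrative path
before = set(dir(model.f))

model.quantize(input="train.txt")
after = set(dir(model.f))

# Prints an empty set: no new attributes appear on the pybind object.
print("new attributes after quantize():", after - before)
```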

Exposing these would be much appreciated for further exploring fastText quantization. While the methods (predict, get_word_vector, get_sentence_vector, etc.) continue to function correctly after quantization, the process is very much a black box to the end user.

Celebio commented 4 years ago

Hi @denver1117, thank you for your suggestion, it makes sense. We will add it to our feature request list.

Best regards, Onur

denver1117 commented 4 years ago

Great thanks 💯