ReySadeghi opened this issue 4 years ago
Yes, you can also use them for BERT-large.
The number of layers and the dimension depend on what you need, i.e. you have a storage vs. performance trade-off (dimension) and a run-time vs. performance trade-off (layers).
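For the dimension side, here is a minimal sketch along the lines of the PCA-based dimensionality-reduction example in this repo; the checkpoint name, new_dimension, and pca_train_sentences are placeholders you would replace with your own model and data:

```python
import torch
from sklearn.decomposition import PCA
from sentence_transformers import SentenceTransformer, models

# Placeholder BERT-large checkpoint; use your own fine-tuned model
model = SentenceTransformer('bert-large-nli-mean-tokens')
new_dimension = 256  # example target size, chosen for the storage budget

# pca_train_sentences: a few thousand representative sentences from your corpus (placeholder)
train_embeddings = model.encode(pca_train_sentences, convert_to_numpy=True)
pca = PCA(n_components=new_dimension)
pca.fit(train_embeddings)

# Append a Dense layer that projects the 1024-dim embeddings down to new_dimension,
# initialized with the PCA components
dense = models.Dense(
    in_features=model.get_sentence_embedding_dimension(),
    out_features=new_dimension,
    bias=False,
    activation_function=torch.nn.Identity(),
)
dense.linear.weight = torch.nn.Parameter(torch.tensor(pca.components_, dtype=torch.float32))
model.add_module('dense', dense)
```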
I got this error in model_distillation.py, at the line "auto_model = student_model._first_module().model": torch.nn.modules.module.ModuleAttributeError: 'Transformer' object has no attribute 'model'
I loaded a fine-tuned BERT model using SentenceTransformer().
I think it has to be
auto_model = student_model._first_module().auto_model
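For context, a sketch of how that corrected line fits into the layer-reduction step of model_distillation.py; the checkpoint path and layer indices are placeholders, not a recommendation:

```python
import torch
from sentence_transformers import SentenceTransformer

# Placeholder path to the fine-tuned BERT student checkpoint
student_model = SentenceTransformer('path/to/finetuned-bert')

# The Transformer module exposes the underlying Hugging Face model as .auto_model
auto_model = student_model._first_module().auto_model

# Keep only a subset of the encoder layers (example indices for illustration)
layers_to_keep = [1, 4, 7, 10]
new_layers = torch.nn.ModuleList(
    [layer for i, layer in enumerate(auto_model.encoder.layer) if i in layers_to_keep]
)
auto_model.encoder.layer = new_layers
auto_model.config.num_hidden_layers = len(layers_to_keep)
```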
Thanks.
In the case of model_distillation.py, I have fine-tuned my distilled model for 10 epochs and used a SequentialEvaluator consisting of an MSEEvaluator and a BinaryClassificationEvaluator. I want to know which evaluator decides which model gets saved as the best. As I understand it, BinaryClassificationEvaluator saves the best model based on cosine average precision, while with MSEEvaluator the loss decreases every epoch, so the best model would be saved at every epoch. Is that right?
You can pass a callable to SequentialEvaluator for the main_score_function parameter: https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/evaluation/SequentialEvaluator.py
By default, the score from the last evaluator is used to determine which model is saved. By setting main_score_function = lambda x: x[0], the score from the first evaluator would be used instead.
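For example (a minimal sketch; the dev-set lists, teacher_model, train_dataloader, train_loss, and output_path are placeholders from your own setup):

```python
from sentence_transformers.evaluation import (
    MSEEvaluator,
    BinaryClassificationEvaluator,
    SequentialEvaluator,
)

# Placeholder dev data: sentences for the MSE check, sentence pairs + labels for the binary check
mse_evaluator = MSEEvaluator(dev_sentences, dev_sentences, teacher_model=teacher_model)
binary_evaluator = BinaryClassificationEvaluator(dev_pairs_a, dev_pairs_b, dev_labels)

# x[0] -> score of the first evaluator (MSEEvaluator) decides the best checkpoint;
# the default would use x[-1], i.e. the BinaryClassificationEvaluator score
evaluator = SequentialEvaluator(
    [mse_evaluator, binary_evaluator],
    main_score_function=lambda x: x[0],
)

student_model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    evaluator=evaluator,
    epochs=10,
    output_path=output_path,
    save_best_model=True,
)
```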
Hi, can I use knowledge distillation and dimension reduction for BERT-large? If so, how many layers should be kept in option 2 for knowledge distillation, and what new size would you recommend for dimension reduction on BERT-large? Thanks.