I forgot the part where I load the models:
# Imports for this snippet (module paths as in the access repo's scripts/generate.py)
from access.preprocessors import get_preprocessors
from access.resources.prepare import prepare_models
from access.simplifiers import get_fairseq_simplifier, get_preprocessed_simplifier

# Load best model
best_model_dir = prepare_models()
recommended_preprocessors_kwargs = {
    'LengthRatioPreprocessor': {'target_ratio': 0.95},
    'LevenshteinPreprocessor': {'target_ratio': 0.75},
    'WordRankRatioPreprocessor': {'target_ratio': 0.75},
    'SentencePiecePreprocessor': {'vocab_size': 10000},
}
preprocessors = get_preprocessors(recommended_preprocessors_kwargs)
simplifier = get_fairseq_simplifier(best_model_dir, beam=8)
simplifier = get_preprocessed_simplifier(simplifier, preprocessors=preprocessors)
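For context, the simplifier built above is file-based: it reads source sentences from one file and writes predictions to another. A per-sentence call along the lines of the repo's scripts/generate.py looks roughly like the sketch below; the helper names (word_tokenize, write_lines, yield_lines, get_temp_filepath) come from the access package, and the exact interface should be treated as an assumption rather than code from this thread.

from access.text import word_tokenize
from access.utils.helpers import get_temp_filepath, write_lines, yield_lines

def simplify_sentence(simplifier, sentence):
    # The simplifier consumes a file of source sentences and writes its
    # predictions to another file, so every per-sentence call pays file
    # I/O and generation-setup overhead on top of the model forward pass.
    source_path = get_temp_filepath()
    pred_path = get_temp_filepath()
    write_lines([word_tokenize(sentence)], source_path)
    simplifier(source_path, pred_path)
    return next(yield_lines(pred_path))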
Two checks: 1) Make sure that you load the model only once (and not at every request). 2) Make sure the model is loaded on the GPU for faster inference.
If you are already doing these two things, there are not many easy ways to speed up inference. You could batch requests together to process them all at once, but that might not fit your use case.
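For illustration, here is a minimal sketch of both checks plus the batching idea, assuming a long-lived process and the same file-based simplifier interface as above; lru_cache is just one way to guarantee a single load, and none of this is from the original thread.

from functools import lru_cache

import torch

from access.preprocessors import get_preprocessors
from access.resources.prepare import prepare_models
from access.simplifiers import get_fairseq_simplifier, get_preprocessed_simplifier
from access.text import word_tokenize
from access.utils.helpers import get_temp_filepath, write_lines, yield_lines

@lru_cache(maxsize=1)
def get_simplifier():
    # Built once per process; later calls return the cached instance
    # instead of re-running the whole setup at every request.
    if not torch.cuda.is_available():
        print('Warning: no CUDA device detected, generation will run on CPU')
    best_model_dir = prepare_models()
    preprocessors = get_preprocessors({
        'LengthRatioPreprocessor': {'target_ratio': 0.95},
        'LevenshteinPreprocessor': {'target_ratio': 0.75},
        'WordRankRatioPreprocessor': {'target_ratio': 0.75},
        'SentencePiecePreprocessor': {'vocab_size': 10000},
    })
    simplifier = get_fairseq_simplifier(best_model_dir, beam=8)
    return get_preprocessed_simplifier(simplifier, preprocessors=preprocessors)

def simplify_batch(sentences):
    # One generation pass over many sentences amortizes the fixed
    # per-call overhead across the whole batch.
    source_path = get_temp_filepath()
    pred_path = get_temp_filepath()
    write_lines([word_tokenize(s) for s in sentences], source_path)
    get_simplifier()(source_path, pred_path)
    return list(yield_lines(pred_path))

If batching does not fit the use case, lowering beam=8 is another latency lever, at some cost in output quality.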
I am using this code to simplify sentence by sentence, and the transformation time is between 0.8 and 2.0 seconds per sentence. Is it possible to speed this up? I need the response time to be below 0.3 seconds.