
Flair | Model.predict sometimes takes more time for the same keyword #1552

Closed: raputt closed this issue 3 years ago

raputt commented 4 years ago

I have trained a Flair model for NER with my own corpus, as below:

```python
from typing import List

from flair.embeddings import TokenEmbeddings, WordEmbeddings, StackedEmbeddings

tag_type = 'ner'

# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# 4. initialize embeddings
embedding_types: List[TokenEmbeddings] = [

    WordEmbeddings('glove'),

    # comment in this line to use character embeddings
    # CharacterEmbeddings(),

    # comment in these lines to use flair embeddings
    # FlairEmbeddings('news-forward'),
    # FlairEmbeddings('news-backward'),
]

embeddings: StackedEmbeddings = StackedEmbeddings(embeddings=embedding_types)

# 5. initialize sequence tagger
from flair.models import SequenceTagger

tagger: SequenceTagger = SequenceTagger(hidden_size=256,
                                        embeddings=embeddings,
                                        tag_dictionary=tag_dictionary,
                                        tag_type=tag_type,
                                        use_crf=True)

# 6. initialize trainer
from flair.trainers import ModelTrainer

trainer: ModelTrainer = ModelTrainer(tagger, corpus)

# 7. start training
trainer.train('/abc/',
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=2)
```
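
For context, the `corpus` object above was created from my own annotated files. A minimal sketch of such a setup (the folder, file names, and column layout here are placeholders, not my actual data):

```python
from flair.data import Corpus
from flair.datasets import ColumnCorpus

# column layout of the annotated files: token in column 0, NER tag in column 1
# (placeholder layout -- the real corpus uses my own files)
columns = {0: 'text', 1: 'ner'}

corpus: Corpus = ColumnCorpus('/abc/data/', columns,
                              train_file='train.txt',
                              dev_file='dev.txt',
                              test_file='test.txt')
```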

Once training is done, I do prediction by loading the model like below:

```python
import os

from flair.data import Sentence
from flair.models import SequenceTagger

flair_NER_best_path = os.path.join(PATH_TO_NER_FLAIR_BASE_DIR, "best-model.pt")
model = SequenceTagger.load(flair_NER_best_path)

def generateNERValue(self, keyword, listOfNERValues):
    keyword = self._preprocess_query(keyword)
    sentence = Sentence(self.stemwords(keyword))
    model.predict(sentence)
    y = sentence.to_dict(tag_type='ner')
```
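
One detail worth noting for the timing question: the very first `predict()` after loading the model is typically slower because of one-time initialization (allocator warm-up, lazy library loading), so a warm-up call at startup makes subsequent latencies more comparable. A minimal sketch (the warm-up text is arbitrary):

```python
# fire one throwaway prediction at startup so the first real request
# does not pay one-time initialization costs
warmup_sentence = Sentence("women black shoes")
model.predict(warmup_sentence)
```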

I am doing online prediction. As soon as a customer enters a search term, it is passed to the above method to get the tagging.

e.g. "women black shoes"

Sometimes, for the same term, the response comes back in 2 ms and sometimes in 10 ms. We are facing this issue in our perf environment, and CPU utilization is also very high with the Flair model compared to a Keras model.

We are running this as a REST API in a Flask service with the configuration below:

3 Docker instances, each with 3 gunicorn workers:

```
gunicorn -k gevent --threads 2000 --worker-connections 2000
```
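
With 3 workers per container, each worker loads and holds its own copy of the model. One option to load the model only once (in the gunicorn master, before forking, so workers share the memory copy-on-write) is gunicorn's preload setting. A sketch of a `gunicorn.conf.py` mirroring the flags above (verify this against your setup; preloading with gevent workers needs the monkey-patching done before the fork):

```python
# gunicorn.conf.py -- sketch only, mirrors the command-line flags above
worker_class = 'gevent'
workers = 3
threads = 2000              # only relevant for the gthread worker class
worker_connections = 2000
preload_app = True          # load the app (and the Flair model) once in the
                            # master process, then fork the workers
```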

I want to understand: why would model.predict sometimes take more time for the same keyword?

raputt commented 4 years ago

We are deploying our code in the cloud.

djstrong commented 4 years ago

I think it is not Flair related. Maybe PyTorch or gunicorn. Does each thread load the model?

raputt commented 4 years ago

@djstrong: the model is loaded only as part of class initialization, like below:

```python
import os

from nltk.stem import WordNetLemmatizer

class Flair_Ner_Wrapper:
    def __init__(self, base_path_to_model,
                 model_file_name='best-model.pt',
                 stop_word_file_name='./picklefiles/stemdict.text',
                 stem_list_file_name='./picklefiles/stopphrases.text'):
        self.stopwordlist = self._load_stopwords(os.path.join(base_path_to_model, stop_word_file_name))
        self.stemdict = self._load_stemdict(os.path.join(base_path_to_model, stem_list_file_name))
        self.flair_model = self._load_model(os.path.join(base_path_to_model, model_file_name))
        self.lemmatiser = WordNetLemmatizer()
```

When I analyzed it, I saw a lot of GC happening for the Flair model.

I have trained my model on a CPU machine, and prediction also happens on a CPU system. Do I need to use `embedding_storage_mode="cpu"` for prediction?
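
For reference, in recent Flair versions `predict()` accepts an `embedding_storage_mode` parameter, and for one-shot online predictions the default `'none'` is usually the right choice: `'cpu'` keeps the computed embeddings attached to the sentence, which costs memory and adds GC pressure. A sketch (assuming a `predict()` signature that takes this parameter):

```python
# 'none' (the default) discards embeddings right after prediction --
# least memory and least garbage-collection pressure for online requests
model.predict(sentence, embedding_storage_mode='none')
```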

alanakbik commented 4 years ago

I haven't seen this behavior yet (and it hasn't been reported so far) so maybe it is specific to your cloud setup.

raputt commented 4 years ago

Please find the hotspot details. It looks like the code below takes a lot of time:

```
forward at local/lib/python3.6/site-packages/flair/models/sequence_tagger_model.py:556   9.9%  99 samples
__call__ at local/lib64/python3.6/site-packages/torch/nn/modules/module.py:550           9.8%  98 samples
forward at local/lib64/python3.6/site-packages/torch/nn/modules/rnn.py:573
```
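
The hotspots point at the LSTM forward pass in PyTorch rather than Flair's own code. With several workers on one box, PyTorch's default intra-op thread pool can oversubscribe the CPU and make latencies for identical inputs erratic. A common mitigation (a sketch, to be run once at worker startup) is to pin each worker to a single thread:

```python
import torch

# limit intra-op parallelism: with 3 workers per instance, letting each
# spawn a full thread pool makes them fight over the same cores
torch.set_num_threads(1)
```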

raputt commented 4 years ago

> I think it is not Flair related. Maybe PyTorch or gunicorn. Does each thread load the model?

Each gunicorn worker loads the model.

djstrong commented 4 years ago

If a worker is spawned to handle a new request, then the response time is increased by the time needed to load the model.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.