PrithivirajDamodaran / Parrot_Paraphraser

A practical and feature-rich paraphrasing framework to augment human intents in text form to build robust NLU models for conversational engines. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.
Apache License 2.0

Inference Taking too Long #15

Closed: haris419 closed this issue 2 years ago

haris419 commented 2 years ago

I am running inference on a GPU, yet paraphrasing a single statement takes around 65 seconds. How can we reduce the inference time?

PrithivirajDamodaran commented 2 years ago

No, it doesn't take that long. See below.

The time shown at the bottom is for paraphrasing 2 sentences into 10 paraphrases each.

[Screenshot: notebook output with the timing for this run, 2022-06-07]
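For reference, here is a minimal sketch of how a measurement like this can be made reliable on GPU. This is generic PyTorch practice rather than Parrot-specific code, and it assumes a working parrot instance and a phrases list as in the snippets below. CUDA kernels launch asynchronously, so a warm-up call plus torch.cuda.synchronize() around the timed region keeps the numbers honest:

import time
import torch

# Warm-up: the first call pays one-time costs (model transfer to GPU,
# CUDA context setup), so it should be excluded from the measurement.
parrot.augment(input_phrase=phrases[0], use_gpu=True)

torch.cuda.synchronize()              # wait for any pending GPU work
start = time.perf_counter()
for phrase in phrases:
    parrot.augment(input_phrase=phrase, use_gpu=True)
torch.cuda.synchronize()              # ensure all GPU work has finished
print(f"{time.perf_counter() - start:.2f}s for {len(phrases)} phrases")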
ioana-blue commented 2 years ago

I just started using Parrot and I've noticed a few issues. I'm piggybacking on this issue to see if we can figure out the fixes.

I'm using the sample phrases from the Hugging Face model card: https://huggingface.co/prithivida/parrot_paraphraser_on_T5

It takes 26.9 s, and I think it's running on the CPU even though I initialize the model like this:

parrot = Parrot(model_tag="prithivida/parrot_paraphraser_on_T5", use_gpu=True)
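A quick way to confirm where the model actually landed is to inspect its parameters. This sketch assumes the underlying transformers model is exposed as parrot.model, which is an assumption about Parrot's internals and may differ between versions:

import torch

# Sanity-check CUDA itself, then the device of the model's parameters.
# `parrot.model` is an assumed attribute name; adjust it if your Parrot
# version stores the seq2seq model under a different name.
print(torch.cuda.is_available())               # should print True
print(next(parrot.model.parameters()).device)  # expect cuda:0 if on GPU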

If I pass use_gpu=True when I call parrot.augment, I get the following error:

RuntimeError                              Traceback (most recent call last)
File <timed exec>:5, in <module>

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/parrot/parrot.py:130, in Parrot.augment(self, input_phrase, use_gpu, diversity_ranker, do_diverse, max_return_phrases, max_length, adequacy_threshold, fluency_threshold)
    126   gen_pp = re.sub('[^a-zA-Z0-9 \?\'\-]', '', gen_pp)
    127   paraphrases.add(gen_pp)
--> 130 adequacy_filtered_phrases = self.adequacy_score.filter(input_phrase, paraphrases, adequacy_threshold, device )
    131 if len(adequacy_filtered_phrases) > 0 :
    132   fluency_filtered_phrases = self.fluency_score.filter(adequacy_filtered_phrases, fluency_threshold, device )

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/parrot/filters.py:15, in Adequacy.filter(self, input_phrase, para_phrases, adequacy_threshold, device)
     13 x = self.tokenizer(input_phrase, para_phrase, return_tensors='pt', max_length=128, truncation=True)
     14 self.adequacy_model = self.adequacy_model.to(device)
---> 15 logits = self.adequacy_model(**x).logits
     16 probs = logits.softmax(dim=1)
     17 prob_label_is_true = probs[:,1]

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/torch/nn/modules/module.py:889, in Module._call_impl(self, *input, **kwargs)
    887     result = self._slow_forward(*input, **kwargs)
    888 else:
--> 889     result = self.forward(*input, **kwargs)
    890 for hook in itertools.chain(
    891         _global_forward_hooks.values(),
    892         self._forward_hooks.values()):
    893     hook_result = hook(self, input, result)

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py:1206, in RobertaForSequenceClassification.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
   1198 r"""
   1199 labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
   1200     Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
   1201     config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
   1202     `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
   1203 """
   1204 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
-> 1206 outputs = self.roberta(
   1207     input_ids,
   1208     attention_mask=attention_mask,
   1209     token_type_ids=token_type_ids,
   1210     position_ids=position_ids,
   1211     head_mask=head_mask,
   1212     inputs_embeds=inputs_embeds,
   1213     output_attentions=output_attentions,
   1214     output_hidden_states=output_hidden_states,
   1215     return_dict=return_dict,
   1216 )
   1217 sequence_output = outputs[0]
   1218 logits = self.classifier(sequence_output)

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/torch/nn/modules/module.py:889, in Module._call_impl(self, *input, **kwargs)
    887     result = self._slow_forward(*input, **kwargs)
    888 else:
--> 889     result = self.forward(*input, **kwargs)
    890 for hook in itertools.chain(
    891         _global_forward_hooks.values(),
    892         self._forward_hooks.values()):
    893     hook_result = hook(self, input, result)

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py:841, in RobertaModel.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    834 # Prepare head mask if needed
    835 # 1.0 in head_mask indicate we keep the head
    836 # attention_probs has shape bsz x n_heads x N x N
    837 # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
    838 # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
    839 head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
--> 841 embedding_output = self.embeddings(
    842     input_ids=input_ids,
    843     position_ids=position_ids,
    844     token_type_ids=token_type_ids,
    845     inputs_embeds=inputs_embeds,
    846     past_key_values_length=past_key_values_length,
    847 )
    848 encoder_outputs = self.encoder(
    849     embedding_output,
    850     attention_mask=extended_attention_mask,
   (...)
    858     return_dict=return_dict,
    859 )
    860 sequence_output = encoder_outputs[0]

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/torch/nn/modules/module.py:889, in Module._call_impl(self, *input, **kwargs)
    887     result = self._slow_forward(*input, **kwargs)
    888 else:
--> 889     result = self.forward(*input, **kwargs)
    890 for hook in itertools.chain(
    891         _global_forward_hooks.values(),
    892         self._forward_hooks.values()):
    893     hook_result = hook(self, input, result)

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py:128, in RobertaEmbeddings.forward(self, input_ids, token_type_ids, position_ids, inputs_embeds, past_key_values_length)
    125         token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=self.position_ids.device)
    127 if inputs_embeds is None:
--> 128     inputs_embeds = self.word_embeddings(input_ids)
    129 token_type_embeddings = self.token_type_embeddings(token_type_ids)
    131 embeddings = inputs_embeds + token_type_embeddings

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/torch/nn/modules/module.py:889, in Module._call_impl(self, *input, **kwargs)
    887     result = self._slow_forward(*input, **kwargs)
    888 else:
--> 889     result = self.forward(*input, **kwargs)
    890 for hook in itertools.chain(
    891         _global_forward_hooks.values(),
    892         self._forward_hooks.values()):
    893     hook_result = hook(self, input, result)

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/torch/nn/modules/sparse.py:156, in Embedding.forward(self, input)
    155 def forward(self, input: Tensor) -> Tensor:
--> 156     return F.embedding(
    157         input, self.weight, self.padding_idx, self.max_norm,
    158         self.norm_type, self.scale_grad_by_freq, self.sparse)

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/torch/nn/functional.py:1916, in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1910     # Note [embedding_renorm set_grad_enabled]
   1911     # XXX: equivalent to
   1912     # with torch.no_grad():
   1913     #   torch.embedding_renorm_
   1914     # remove once script supports set_grad_enabled
   1915     _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1916 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

RuntimeError: Input, output and indices must be on the current device
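The traceback suggests a device mismatch inside parrot/filters.py: in Adequacy.filter, the adequacy model is moved to device, but the tokenized inputs x are not, so the embedding lookup receives CPU tensors while the weights sit on the GPU. A hedged, untested sketch of what a patch to that method might look like, based only on the lines shown in the traceback:

# parrot/filters.py, Adequacy.filter (sketch, untested)
x = self.tokenizer(input_phrase, para_phrase, return_tensors='pt',
                   max_length=128, truncation=True)
self.adequacy_model = self.adequacy_model.to(device)
x = {k: v.to(device) for k, v in x.items()}  # move inputs to the same device
logits = self.adequacy_model(**x).logits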

I'd appreciate your help with this! Thank you!

This is my invoking code:

%%time
for phrase in phrases:
    print("-" * 100)
    print("Input_phrase: ", phrase)
    print("-" * 100)
    para_phrases = parrot.augment(input_phrase=phrase,
                                  use_gpu=True,
                                  # diversity_ranker="levenshtein",
                                  do_diverse=False,
                                  max_return_phrases=10,
                                  max_length=64,
                                  # adequacy_threshold=0.99,
                                  # fluency_threshold=0.90
                                  )
    # print(para_phrases)
    for para_phrase in para_phrases:
        print(para_phrase)
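One caveat with the inner loop, as an assumption about Parrot's behavior worth verifying in your version: if every candidate is filtered out by the adequacy and fluency thresholds, augment may return None instead of an empty list, and iterating over it would then raise a TypeError. A defensive guard:

# Guard against a possible None return when all candidates are filtered out.
if para_phrases:
    for para_phrase in para_phrases:
        print(para_phrase)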
PrithivirajDamodaran commented 2 years ago

I'm not able to reproduce this issue.