PrithivirajDamodaran / Parrot_Paraphraser

A practical and feature-rich paraphrasing framework to augment human intents in text form to build robust NLU models for conversational engines. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.
Apache License 2.0

Inference Taking too Long #15

Closed: haris419 closed this issue 2 years ago

haris419 commented 2 years ago

I am running inference on a GPU, yet paraphrasing a single statement takes around 65 seconds. How can we reduce the inference time?

PrithivirajDamodaran commented 2 years ago

No, it doesn't take that long. See below.

The time shown at the bottom is for paraphrasing 2 sentences into 10 paraphrases each.

[Screenshot: notebook output with the timing for this run, 2022-06-07]
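For reference, here is a minimal sketch of how a measurement like this can be made reliable on GPU. This is generic PyTorch practice rather than Parrot-specific code, and it assumes a working parrot instance and a phrases list as in the snippets below. CUDA kernels launch asynchronously, so a warm-up call plus torch.cuda.synchronize() around the timed region keeps the numbers honest:

import time
import torch

# Warm-up: the first call pays one-time costs (model transfer to GPU,
# CUDA context setup), so it should be excluded from the measurement.
parrot.augment(input_phrase=phrases[0], use_gpu=True)

torch.cuda.synchronize()              # wait for any pending GPU work
start = time.perf_counter()
for phrase in phrases:
    parrot.augment(input_phrase=phrase, use_gpu=True)
torch.cuda.synchronize()              # ensure all GPU work has finished
print(f"{time.perf_counter() - start:.2f}s for {len(phrases)} phrases")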
ioana-blue commented 2 years ago

I just started using Parrot and I've noticed a few issues. I'm piggybacking on this issue to see if we can figure out the fixes.

I'm using the sample phrases from the Hugging Face model card: https://huggingface.co/prithivida/parrot_paraphraser_on_T5

It takes 26.9 s, and I think it's running on the CPU even though I initialize the model like this:

parrot = Parrot(model_tag="prithivida/parrot_paraphraser_on_T5", use_gpu=True)
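A quick way to confirm where the model actually landed is to inspect its parameters. This sketch assumes the underlying transformers model is exposed as parrot.model, which is an assumption about Parrot's internals and may differ between versions:

import torch

# Sanity-check CUDA itself, then the device of the model's parameters.
# `parrot.model` is an assumed attribute name; adjust it if your Parrot
# version stores the seq2seq model under a different name.
print(torch.cuda.is_available())               # should print True
print(next(parrot.model.parameters()).device)  # expect cuda:0 if on GPU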

If I pass use_gpu=True when I call parrot.augment, I get the following error:

RuntimeError                              Traceback (most recent call last)
File <timed exec>:5, in <module>

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/parrot/parrot.py:130, in Parrot.augment(self, input_phrase, use_gpu, diversity_ranker, do_diverse, max_return_phrases, max_length, adequacy_threshold, fluency_threshold)
    126   gen_pp = re.sub('[^a-zA-Z0-9 \?\'\-]', '', gen_pp)
    127   paraphrases.add(gen_pp)
--> 130 adequacy_filtered_phrases = self.adequacy_score.filter(input_phrase, paraphrases, adequacy_threshold, device )
    131 if len(adequacy_filtered_phrases) > 0 :
    132   fluency_filtered_phrases = self.fluency_score.filter(adequacy_filtered_phrases, fluency_threshold, device )

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/parrot/filters.py:15, in Adequacy.filter(self, input_phrase, para_phrases, adequacy_threshold, device)
     13 x = self.tokenizer(input_phrase, para_phrase, return_tensors='pt', max_length=128, truncation=True)
     14 self.adequacy_model = self.adequacy_model.to(device)
---> 15 logits = self.adequacy_model(**x).logits
     16 probs = logits.softmax(dim=1)
     17 prob_label_is_true = probs[:,1]

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/torch/nn/modules/module.py:889, in Module._call_impl(self, *input, **kwargs)
    887     result = self._slow_forward(*input, **kwargs)
    888 else:
--> 889     result = self.forward(*input, **kwargs)
    890 for hook in itertools.chain(
    891         _global_forward_hooks.values(),
    892         self._forward_hooks.values()):
    893     hook_result = hook(self, input, result)

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py:1206, in RobertaForSequenceClassification.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
   1198 r"""
   1199 labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
   1200     Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
   1201     config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
   1202     `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
   1203 """
   1204 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
-> 1206 outputs = self.roberta(
   1207     input_ids,
   1208     attention_mask=attention_mask,
   1209     token_type_ids=token_type_ids,
   1210     position_ids=position_ids,
   1211     head_mask=head_mask,
   1212     inputs_embeds=inputs_embeds,
   1213     output_attentions=output_attentions,
   1214     output_hidden_states=output_hidden_states,
   1215     return_dict=return_dict,
   1216 )
   1217 sequence_output = outputs[0]
   1218 logits = self.classifier(sequence_output)

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/torch/nn/modules/module.py:889, in Module._call_impl(self, *input, **kwargs)
    887     result = self._slow_forward(*input, **kwargs)
    888 else:
--> 889     result = self.forward(*input, **kwargs)
    890 for hook in itertools.chain(
    891         _global_forward_hooks.values(),
    892         self._forward_hooks.values()):
    893     hook_result = hook(self, input, result)

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py:841, in RobertaModel.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    834 # Prepare head mask if needed
    835 # 1.0 in head_mask indicate we keep the head
    836 # attention_probs has shape bsz x n_heads x N x N
    837 # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
    838 # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
    839 head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
--> 841 embedding_output = self.embeddings(
    842     input_ids=input_ids,
    843     position_ids=position_ids,
    844     token_type_ids=token_type_ids,
    845     inputs_embeds=inputs_embeds,
    846     past_key_values_length=past_key_values_length,
    847 )
    848 encoder_outputs = self.encoder(
    849     embedding_output,
    850     attention_mask=extended_attention_mask,
   (...)
    858     return_dict=return_dict,
    859 )
    860 sequence_output = encoder_outputs[0]

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/torch/nn/modules/module.py:889, in Module._call_impl(self, *input, **kwargs)
    887     result = self._slow_forward(*input, **kwargs)
    888 else:
--> 889     result = self.forward(*input, **kwargs)
    890 for hook in itertools.chain(
    891         _global_forward_hooks.values(),
    892         self._forward_hooks.values()):
    893     hook_result = hook(self, input, result)

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py:128, in RobertaEmbeddings.forward(self, input_ids, token_type_ids, position_ids, inputs_embeds, past_key_values_length)
    125         token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=self.position_ids.device)
    127 if inputs_embeds is None:
--> 128     inputs_embeds = self.word_embeddings(input_ids)
    129 token_type_embeddings = self.token_type_embeddings(token_type_ids)
    131 embeddings = inputs_embeds + token_type_embeddings

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/torch/nn/modules/module.py:889, in Module._call_impl(self, *input, **kwargs)
    887     result = self._slow_forward(*input, **kwargs)
    888 else:
--> 889     result = self.forward(*input, **kwargs)
    890 for hook in itertools.chain(
    891         _global_forward_hooks.values(),
    892         self._forward_hooks.values()):
    893     hook_result = hook(self, input, result)

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/torch/nn/modules/sparse.py:156, in Embedding.forward(self, input)
    155 def forward(self, input: Tensor) -> Tensor:
--> 156     return F.embedding(
    157         input, self.weight, self.padding_idx, self.max_norm,
    158         self.norm_type, self.scale_grad_by_freq, self.sparse)

File /dccstor/redrug_ier/envs/tr-crt/lib/python3.8/site-packages/torch/nn/functional.py:1916, in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1910     # Note [embedding_renorm set_grad_enabled]
   1911     # XXX: equivalent to
   1912     # with torch.no_grad():
   1913     #   torch.embedding_renorm_
   1914     # remove once script supports set_grad_enabled
   1915     _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1916 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

RuntimeError: Input, output and indices must be on the current device
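The traceback suggests a device mismatch inside parrot/filters.py: in Adequacy.filter, the adequacy model is moved to device, but the tokenized inputs x are not, so the embedding lookup receives CPU tensors while the weights sit on the GPU. A hedged, untested sketch of what a patch to that method might look like, based only on the lines shown in the traceback:

# parrot/filters.py, Adequacy.filter (sketch, untested)
x = self.tokenizer(input_phrase, para_phrase, return_tensors='pt',
                   max_length=128, truncation=True)
self.adequacy_model = self.adequacy_model.to(device)
x = {k: v.to(device) for k, v in x.items()}  # move inputs to the same device
logits = self.adequacy_model(**x).logits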

I'd appreciate your help with this! Thank you!

This is my invoking code:

%%time
for phrase in phrases:
    print("-" * 100)
    print("Input_phrase: ", phrase)
    print("-" * 100)
    para_phrases = parrot.augment(input_phrase=phrase,
                                  use_gpu=True,
                                  # diversity_ranker="levenshtein",
                                  do_diverse=False,
                                  max_return_phrases=10,
                                  max_length=64,
                                  # adequacy_threshold=0.99,
                                  # fluency_threshold=0.90
                                  )
    # print(para_phrases)
    for para_phrase in para_phrases:
        print(para_phrase)
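One caveat with the inner loop, as an assumption about Parrot's behavior worth verifying in your version: if every candidate is filtered out by the adequacy and fluency thresholds, augment may return None instead of an empty list, and iterating over it would then raise a TypeError. A defensive guard:

# Guard against a possible None return when all candidates are filtered out.
if para_phrases:
    for para_phrase in para_phrases:
        print(para_phrase)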
PrithivirajDamodaran commented 2 years ago

I'm not able to reproduce this issue.