Vamsi995 / Paraphrase-Generator

A paraphrase generator built using the T5 model which produces paraphrased English sentences.
MIT License

Request: could you make a Colab using Paraphrase-Generator #6

Closed: MastafaF closed this issue 3 years ago

MastafaF commented 3 years ago

Hey,

Great work providing this paraphrase generator. I wonder, though, if you could quickly push a working Colab that uses the generator without Streamlit.

In other words, could you show how to generate paraphrases for a given sentence with this repo on Google Colab?

I am currently running into a number of issues, probably linked to the libraries. It would be great to simply replicate the work done here in a Colab environment.

Thanks,

Vamsi995 commented 3 years ago

I think it's already there, inside the Colab notebooks folder -> Paraphrase.ipynb. This is the notebook I used while training the model, so you can find the inference part in it as well.

MastafaF commented 3 years ago

I took a look at it, but I am not sure I can replicate it, since you save the pretrained model after fine-tuning it.

I pulled the git repo and did the following in Colab, but it is not working:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("./T5_Paraphrase_Paws")
model = AutoModelForSeq2SeqLM.from_pretrained("./T5_Paraphrase_Paws")

sentence = "This is something which i cannot understand at all"
text =  "paraphrase: " + sentence + " </s>"
encoding = tokenizer.encode_plus(text,pad_to_max_length=True, return_tensors="pt")
input_ids, attention_masks = encoding["input_ids"].to("cuda"), encoding["attention_mask"].to("cuda")

outputs = model.generate(
    input_ids=input_ids, attention_mask=attention_masks,
    max_length=256,
    do_sample=True,
    top_k=200,
    top_p=0.95,
    early_stopping=True,
    num_return_sequences=5
)

for output in outputs:
    line = tokenizer.decode(output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(line)

Some guidance from this stage would be great.

Vamsi995 commented 3 years ago
# In a Colab cell (the leading "!" runs a shell command)
!pip install transformers
!pip install sentencepiece

from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the base T5 tokenizer and the fine-tuned paraphrasing model from the Hugging Face Hub
tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("Vamsi/T5_Paraphrase_Paws")

sentence = "This is something which i cannot understand at all"

# Prefix the input with the task prompt used during fine-tuning
text = "paraphrase: " + sentence
encoding = tokenizer(text, padding=True, return_tensors="pt")
input_ids, attention_masks = encoding["input_ids"], encoding["attention_mask"]

# Sample 5 candidate paraphrases with top-k / top-p sampling
outputs = model.generate(
    input_ids=input_ids, attention_mask=attention_masks,
    max_length=256,
    do_sample=True,
    top_k=200,
    top_p=0.95,
    early_stopping=True,
    num_return_sequences=5
)

# Decode and print each candidate
for output in outputs:
    line = tokenizer.decode(output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(line)
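
This snippet runs on the CPU. If you want to use a Colab GPU runtime instead, the model and the input tensors have to be on the same device (the error in the earlier snippet most likely came from moving only the inputs to "cuda" while the model stayed on the CPU). A minimal sketch of the GPU variant, assuming a CUDA runtime is available:

import torch

# Pick the GPU if Colab has one, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Move the encoded inputs to the same device as the model
input_ids = encoding["input_ids"].to(device)
attention_masks = encoding["attention_mask"].to(device)

outputs = model.generate(
    input_ids=input_ids, attention_mask=attention_masks,
    max_length=256,
    do_sample=True,
    top_k=200,
    top_p=0.95,
    num_return_sequences=5
)

for output in outputs:
    print(tokenizer.decode(output, skip_special_tokens=True, clean_up_tokenization_spaces=True))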

Hope this helps !!