PrithivirajDamodaran / Parrot_Paraphraser

A practical and feature-rich paraphrasing framework to augment human intents in text form to build robust NLU models for conversational engines. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.
Apache License 2.0

Question about model training #2

Closed by sajjjadayobi 3 years ago

sajjjadayobi commented 3 years ago

Hello, your work is wonderful. I'd like to create something like this in my native language (Persian). Could you please let me know how you trained those T5 models? I have access to translated Quora question pairs, and I think the training process looks like the following:

  1. filter the dataset for similar sentences
  2. train a text generation model from sentence 1 to sentence 2
  3. and also from sentence 2 to sentence 1
  4. the model is a plain text2text generation model

By that I mean training only, with no postprocessing included. Is that correct?
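
For illustration, steps 1 to 3 above could be sketched roughly as follows. This is only my guess at the data preparation, not the confirmed Parrot training procedure. The `QuestionPair` type, the `build_training_examples` helper, the `is_duplicate` field name (borrowed from the original Quora Question Pairs layout), and the `"paraphrase: "` task prefix are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, NamedTuple


class QuestionPair(NamedTuple):
    """One row of a (translated) question-pair dataset. Field names are assumed."""
    question1: str
    question2: str
    is_duplicate: bool  # True when the two questions are labeled as paraphrases


def build_training_examples(
    pairs: Iterable[QuestionPair],
    prefix: str = "paraphrase: ",  # hypothetical task prefix for a text2text model
) -> List[Dict[str, str]]:
    """Keep only paraphrase pairs and emit them in both directions."""
    examples: List[Dict[str, str]] = []
    seen = set()
    for pair in pairs:
        # Step 1: keep only pairs labeled as similar.
        if not pair.is_duplicate:
            continue
        q1, q2 = pair.question1.strip(), pair.question2.strip()
        # Skip empty rows and identical sentences, which teach the model nothing.
        if not q1 or not q2 or q1 == q2:
            continue
        # Steps 2 and 3: sentence 1 -> sentence 2, then sentence 2 -> sentence 1.
        for source, target in ((q1, q2), (q2, q1)):
            if (source, target) in seen:
                continue
            seen.add((source, target))
            examples.append({"input_text": prefix + source, "target_text": target})
    return examples


rows = [
    QuestionPair("How do I learn Python?", "What is the best way to learn Python?", True),
    QuestionPair("What is AI?", "Where is Paris?", False),
]
for example in build_training_examples(rows):
    print(example)
```

The resulting `input_text` / `target_text` records could then be fed to whatever seq2seq fine-tuning loop is used; nothing here covers filtering or ranking of the generated paraphrases after training.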

PrithivirajDamodaran commented 3 years ago
