Shivanandroy / simpleT5

simpleT5 is built on top of PyTorch-lightning⚡️ and Transformers🤗 and lets you quickly train your T5 models.
MIT License
382 stars, 61 forks

Possible to feed target as array #47

Closed: cclegend90 closed this issue 1 year ago

cclegend90 commented 1 year ago

I am trying to include multiple paraphrased sentences for a given input sentence in the dataset. For example, the input sentence is "The cat sat on the mat." and I have two different paraphrased versions of this sentence:

  1. The cat was resting on the rug
  2. The cat was seated on the rug

I want to include both of these versions as the target sentences for the input sentence in my CSV file.

Can we feed the target as an array to include all the paraphrased sentences? As of now, it expects the target as a string.

Thanks for this package, which makes fine-tuning so much simpler!

Shivanandroy commented 1 year ago

Hi @cclegend90: You won't be able to pass an array in the target_text column, because T5 is a text-to-text transformer: each training example maps one input string to one target string. I would suggest passing the paraphrases as two separate rows, e.g.:

| input_text | target_text |
|---|---|
| The cat sat on the mat | The cat was resting on the rug |
| The cat sat on the mat | The cat was seated on the rug |

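If your paraphrases are currently grouped per input sentence, they can be flattened into the one-row-per-pair layout above before writing the CSV. Below is a minimal sketch using only the Python standard library; the `paraphrases` dict and the `flatten_pairs` helper are hypothetical, and the column names simply mirror the example table, so adjust them to whatever your training code expects.

```python
import csv
import io

# Hypothetical grouped data: each input sentence maps to a list of paraphrases.
paraphrases = {
    "The cat sat on the mat": [
        "The cat was resting on the rug",
        "The cat was seated on the rug",
    ],
}


def flatten_pairs(grouped):
    """Hypothetical helper: expand {input: [target, ...]} into one (input, target) pair per target."""
    return [(src, tgt) for src, targets in grouped.items() for tgt in targets]


rows = flatten_pairs(paraphrases)

# Write a two-column CSV, repeating the input sentence on every row.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["input_text", "target_text"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

If the data is already in a pandas DataFrame with a list-valued target column, `DataFrame.explode` performs the same flattening in one call.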
cclegend90 commented 1 year ago

Hi @Shivanandroy, thanks for the suggestion! I'm just wondering whether using the same input multiple times will affect paraphrasing performance.

Shivanandroy commented 1 year ago

@cclegend90: No, it won't hurt performance! On the contrary, it helps T5 learn these variations!