fani-lab / LADy

LADy 💃: A Benchmark Toolkit for Latent Aspect Detection Enriched with Backtranslation Augmentation

Adding Word-alignment module for the back-translation task #26

Closed farinamhz closed 1 year ago

farinamhz commented 1 year ago

In this issue, we are going to provide a module that takes two datasets of the same length (each a list of texts) as input and returns a list of alignments as output, one alignment for each pair of texts at the same position.

The alignment for a given pair of texts is a list of tuples. Each tuple contains the index of a token in the first text and the index of the token in the second text that aligns with it.
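
To make the input and output shapes concrete, here is a minimal sketch. It is not the module's actual aligner: it swaps in the standard library's `difflib.SequenceMatcher` on whitespace tokens as a stand-in, and the names `align_pair` and `align_datasets` are hypothetical, chosen only for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# One alignment = list of (index in text 1, index in text 2) tuples.
Alignment = List[Tuple[int, int]]


def align_pair(text_a: str, text_b: str) -> Alignment:
    """Toy aligner: pairs tokens that match exactly, in order.

    A stand-in for a real word aligner, used only to show the output shape.
    """
    tokens_a = text_a.lower().split()
    tokens_b = text_b.lower().split()
    matcher = SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    pairs: Alignment = []
    for block in matcher.get_matching_blocks():
        # Each block covers `size` consecutive matching tokens.
        for offset in range(block.size):
            pairs.append((block.a + offset, block.b + offset))
    return pairs


def align_datasets(dataset_a: List[str], dataset_b: List[str]) -> List[Alignment]:
    """Align the i-th text of dataset_a with the i-th text of dataset_b."""
    if len(dataset_a) != len(dataset_b):
        raise ValueError("both datasets must have the same length")
    return [align_pair(a, b) for a, b in zip(dataset_a, dataset_b)]


if __name__ == "__main__":
    originals = ["the food was great"]
    backtranslated = ["the meal was great"]
    print(align_datasets(originals, backtranslated))
    # [[(0, 0), (2, 2), (3, 3)]]
```

With an exact-match stand-in, `food` and `meal` stay unaligned; a real word aligner would be expected to pair such tokens.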

farinamhz commented 1 year ago

Hi @hosseinfani, please take a look at the word alignment results.

You can see early results in this commit: link

If this module is suitable, I will start using the word-alignment and back-translation modules for the data augmentation task, and work on the aspect semantic comparison to decide which of the augmented reviews are worth adding to the dataset.

hosseinfani commented 1 year ago

@farinamhz I had a look at both the translation and alignment modules. They look good. Please proceed with the next step, which is the semantic check, right?

farinamhz commented 1 year ago

@hosseinfani Yes, all the related work is tracked under this issue: https://github.com/fani-lab/LADy/issues/27

hosseinfani commented 1 year ago

@farinamhz Thanks for the effort on this. I did some refactoring and integrated your code into the Review class in review.py:

https://github.com/fani-lab/LADy/blob/eeeb48b7eba4aaf6b5907415c8783637d72d8091/src/cmn/review.py#L83

Also, right after translation and backtranslation, I do the alignment on aos:

https://github.com/fani-lab/LADy/blob/eeeb48b7eba4aaf6b5907415c8783637d72d8091/src/cmn/review.py#L66

https://github.com/fani-lab/LADy/blob/eeeb48b7eba4aaf6b5907415c8783637d72d8091/src/cmn/review.py#L70

Please check these lines and let me know your comments.
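
For readers following along, the alignment on aos mentioned above could look roughly like the sketch below. This is not the code in review.py: it assumes that aos is a list of (aspect token indices, opinion token indices, sentiment) triples, and `project_indices` and `project_aos` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

Alignment = List[Tuple[int, int]]
# Assumed layout of one aos entry: (aspect indices, opinion indices, sentiment).
AOS = Tuple[List[int], List[int], str]


def project_indices(indices: List[int], alignment: Alignment) -> List[int]:
    """Map token indices of the original text to indices in the aligned text."""
    mapping: Dict[int, List[int]] = {}
    for src, tgt in alignment:
        mapping.setdefault(src, []).append(tgt)
    projected: List[int] = []
    for i in indices:
        # Tokens without an aligned counterpart are dropped.
        projected.extend(mapping.get(i, []))
    return sorted(set(projected))


def project_aos(aos: List[AOS], alignment: Alignment) -> List[AOS]:
    """Carry aspect and opinion indices over to the back-translated text."""
    return [
        (project_indices(aspect, alignment), project_indices(opinion, alignment), sentiment)
        for aspect, opinion, sentiment in aos
    ]


if __name__ == "__main__":
    # Hypothetical alignment in which tokens 1 and 2 swap places.
    alignment = [(0, 0), (1, 2), (2, 1), (3, 3)]
    aos = [([1], [3], "+1")]
    print(project_aos(aos, alignment))
    # [([2], [3], '+1')]
```

An aspect whose tokens have no aligned counterpart ends up with an empty index list, which a later semantic check could use to discard that augmented review.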