farinamhz closed this issue 1 year ago
Hi @hosseinfani Please take a look at the results of word alignment.
You can see early results in this commit: link
If this module is suitable, I will start using the word alignment and back-translation modules for the data augmentation task, and work on the aspect semantic comparison to see which of the augmented reviews are worth adding to the dataset.
@farinamhz I had a look at both the translation and alignment modules. Looks good. Please proceed with the next step, which is the semantic check, right?
@hosseinfani Yes, all the related works are under this issue: https://github.com/fani-lab/LADy/issues/27
@farinamhz Thanks for the effort on this. I did some refactoring and integrated your code into the `Review` class in `review.py`:
https://github.com/fani-lab/LADy/blob/eeeb48b7eba4aaf6b5907415c8783637d72d8091/src/cmn/review.py#L83
Also, right after translation and back-translation, I do the alignment on `aos`:
https://github.com/fani-lab/LADy/blob/eeeb48b7eba4aaf6b5907415c8783637d72d8091/src/cmn/review.py#L66
https://github.com/fani-lab/LADy/blob/eeeb48b7eba4aaf6b5907415c8783637d72d8091/src/cmn/review.py#L70
Please check these lines and let me know your comments.
In this issue, we are going to provide a module that takes two datasets of the same length (each a list of texts) as input and returns a list of alignments as output, one alignment per pair of texts at the same position.
The alignment for a given pair of texts is a list of tuples. Each tuple contains the index of a token in the first text and the index of the token in the second text that it aligns with.
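To make the input/output shape concrete, here is a minimal sketch in Python. It is not the aligner used in LADy: the function names `align_pair` and `align_datasets` are hypothetical, and the matching rule (case-insensitive exact token match on whitespace tokens) is only a toy stand-in for a real word aligner. It is meant to show the format of the result, nothing more.

```python
from typing import List, Tuple

# One alignment = list of (index in text 1, index in text 2) pairs.
Alignment = List[Tuple[int, int]]


def align_pair(text1: str, text2: str) -> Alignment:
    """Toy aligner (assumption): pairs tokens that are identical ignoring case.

    A real word aligner would also match translations and paraphrases;
    this stand-in only illustrates the output format.
    """
    tokens1 = text1.lower().split()
    tokens2 = text2.lower().split()
    used = set()  # indices in text 2 that are already aligned
    pairs: Alignment = []
    for i, tok in enumerate(tokens1):
        for j, other in enumerate(tokens2):
            if j not in used and tok == other:
                pairs.append((i, j))
                used.add(j)
                break  # align each token of text 1 at most once
    return pairs


def align_datasets(texts1: List[str], texts2: List[str]) -> List[Alignment]:
    """Aligns texts position by position; both lists must have the same length."""
    if len(texts1) != len(texts2):
        raise ValueError("both datasets must have the same length")
    return [align_pair(a, b) for a, b in zip(texts1, texts2)]


if __name__ == "__main__":
    originals = ["the food was great"]
    back_translated = ["the meal was really great"]
    print(align_datasets(originals, back_translated))
    # [[(0, 0), (2, 2), (3, 4)]]
```

In this example "food" has no exact match in the second text, so index 1 does not appear in any tuple; an aligner based on embeddings would likely pair it with "meal".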