farinamhz closed this issue 1 year ago
@hosseinfani
The next step is to visualize these results as histograms of the number of reviews per similarity score for each language, and to compare the languages. We have two goals. First, we want to find a threshold on the similarity score, below which we discard the back-translated review. Second, we want to determine which language adds the most valuable data to the original dataset.
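A minimal sketch of the comparison described above, assuming similarity scores lie in [0, 1] and are already computed per language. The function names, the language codes, and the score values are hypothetical placeholders for illustration, not results from this project. Using the same bin edges for every language keeps the histograms directly comparable.

```python
def histogram(scores, n_bins=10, lo=0.0, hi=1.0):
    """Count scores in n_bins equal-width bins over [lo, hi].

    Scores outside [lo, hi] are ignored; a score equal to hi
    falls into the last bin.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for s in scores:
        if not lo <= s <= hi:
            continue
        idx = min(int((s - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def kept_fraction(scores, threshold):
    """Fraction of reviews that survive a given similarity threshold."""
    if not scores:
        return 0.0
    return sum(s >= threshold for s in scores) / len(scores)


if __name__ == "__main__":
    # Placeholder scores only (assumption): replace with real
    # back-translation similarity scores for each language.
    scores_by_lang = {
        "lang_a": [0.91, 0.85, 0.78, 0.66, 0.95],
        "lang_b": [0.42, 0.61, 0.35, 0.72, 0.55],
    }
    for lang, scores in scores_by_lang.items():
        print(lang, histogram(scores, n_bins=5), kept_fraction(scores, 0.5))
```

Sweeping `threshold` over a few candidate values and comparing `kept_fraction` across languages is one simple way to see how much augmented data each language would contribute at a given cutoff.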
Hi @hosseinfani,
This is an example of the results for the German language.
@farinamhz Thanks. Looks awesome. If you could add 1-2 more languages in different colors but in the same plot, we can compare them.
Hi @hosseinfani, This is an example of back-translation through Arabic, a language from a different family than English, and German, a language from the same family as English.
Fortunately, as we expected, languages from the same family as English work better for back-translation and give us reviews that are more similar to the originals!
@hosseinfani Also, I created separate plots with the same axis range and will attach them below. They show that German is significantly better, since some overlapping parts are not visible in the previous plot.
@farinamhz thank you. now we expect a performance drop in our main task of latent aspect detection.
@hosseinfani Yes, I will compare the results for latent aspect detection in different languages soon.
Hi @hosseinfani, I changed the plot based on what we talked about in the meeting. Is it better now in terms of colors, number of bins, and overlapping area?
@hosseinfani Also, we have this plot for several languages, in case we want to compare languages from the same family as English with those from other families.
@farinamhz this is better now.
I put an incorrect commit message on a commit for this issue :(
In this part, we will add a semantic comparison function for the results of the back-translation technique. It will 1) add an augmented review to the dataset if its aspect is semantically similar to the original review's aspect, and 2) discard the augmented review if its aspect differs semantically from the aspect in the original review.
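The keep-or-discard rule above can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the function names, and the default threshold are hypothetical, and `token_overlap` is a simple lexical stand-in, not a real semantic measure. In practice, the `similarity` argument would be replaced by whatever semantic similarity function we settle on.

```python
def token_overlap(a, b):
    """Jaccard overlap of lowercase tokens.

    Placeholder only (assumption): this is lexical, not semantic.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def filter_augmented(records, similarity=token_overlap, threshold=0.5):
    """Split augmented reviews into kept and discarded lists.

    Each record is a tuple:
        (original_aspect, augmented_review, augmented_aspect)
    An augmented review is kept when the similarity between the two
    aspects is at least `threshold`; otherwise it is discarded.
    """
    kept, discarded = [], []
    for original_aspect, augmented_review, augmented_aspect in records:
        if similarity(original_aspect, augmented_aspect) >= threshold:
            kept.append(augmented_review)
        else:
            discarded.append(augmented_review)
    return kept, discarded


if __name__ == "__main__":
    # Toy records for illustration only (assumption).
    records = [
        ("battery life", "The battery life is great", "battery life"),
        ("battery life", "The screen is dim", "screen"),
    ]
    kept, discarded = filter_augmented(records)
    print("kept:", kept)
    print("discarded:", discarded)
```

Passing the similarity function as a parameter keeps the filtering rule separate from the choice of measure, so the threshold found from the histograms can be reused without changing this code.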