fani-lab / LADy

LADy 💃: A Benchmark Toolkit for Latent Aspect Detection Enriched with Backtranslation Augmentation

Semantic comparison of the augmented review with the original review #27

Closed farinamhz closed 1 year ago

farinamhz commented 1 year ago

In this part, we will add a semantic comparison function for the back-translation results, so that we 1) add an augmented review to the dataset if its aspect is semantically similar to the aspect of the original review, and 2) discard the augmented review if its aspect's semantics differ from those of the aspect in the original review.
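A minimal sketch of the keep/discard rule described above, not the actual LADy implementation. It assumes a toy bag-of-words cosine similarity as a stand-in for a real semantic similarity model, and the names `bow_vector`, `cosine_similarity`, `filter_augmented`, and the default `threshold=0.5` are all hypothetical. The compared text could be the aspect term or the full review, depending on what the comparison function ends up using.

```python
import math
from collections import Counter


def bow_vector(text):
    """Toy bag-of-words vector; an assumed stand-in for a real sentence embedding."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_augmented(pairs, threshold=0.5):
    """Split back-translated texts into kept and discarded by similarity.

    pairs: iterable of (original_text, backtranslated_text).
    threshold: hypothetical cut-off; the thread has not fixed a value yet.
    """
    kept, discarded = [], []
    for original, augmented in pairs:
        score = cosine_similarity(bow_vector(original), bow_vector(augmented))
        # Keep the augmented text only if it is similar enough to the original.
        (kept if score >= threshold else discarded).append((augmented, score))
    return kept, discarded
```

In practice, the bag-of-words vectors would be swapped for embeddings from whichever semantic model is chosen; the keep/discard logic stays the same.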

farinamhz commented 1 year ago

@hosseinfani

The next step is to visualize these results with a histogram of the number of reviews per similarity score for each language and compare the languages. First, we want to find a threshold on the similarity score, below which we discard the augmented review; second, we want to determine which language adds more valuable data to the original dataset.
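A rough sketch of how the per-language comparison could be computed before plotting, assuming similarity scores lie in [0, 1]. The function names, the shared bin edges, and the threshold value are assumptions for illustration, and the scores in the usage example are made-up placeholders, not results from the dataset.

```python
def histogram(scores, n_bins=10, lo=0.0, hi=1.0):
    """Count scores per bin, using the same bin edges for every language."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for s in scores:
        if s < lo or s > hi:
            continue  # ignore out-of-range scores
        # Clamp so that a score equal to `hi` falls into the last bin.
        idx = min(int((s - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def fraction_kept(scores, threshold):
    """Share of augmented reviews that would survive a given threshold."""
    if not scores:
        return 0.0
    return sum(s >= threshold for s in scores) / len(scores)


def compare_languages(scores_by_lang, threshold, n_bins=10):
    """Histogram and kept fraction for each back-translation language."""
    return {
        lang: {
            "histogram": histogram(scores, n_bins),
            "kept": fraction_kept(scores, threshold),
        }
        for lang, scores in scores_by_lang.items()
    }


# Usage with placeholder scores (hypothetical, for illustration only).
example = {"lang_a": [0.95, 0.85, 0.45, 0.9], "lang_b": [0.25, 0.55, 0.9, 0.1]}
summary = compare_languages(example, threshold=0.8, n_bins=5)
```

Using identical bin edges for all languages keeps the histograms directly comparable, and the kept fraction gives one number per language for a candidate threshold.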

farinamhz commented 1 year ago

Hi @hosseinfani,

This is an example of the results for the German language.

German_Histogram_Plot

hosseinfani commented 1 year ago

@farinamhz Thanks. Looks awesome. If you could add 1-2 more languages in different colors but in the same plot, we could compare them.

farinamhz commented 1 year ago

Hi @hosseinfani, this is an example of back-translation through Arabic, a language from a different family than English, and German, a language from the same family as English.

Fortunately, as we expected, languages from the same family as English work better for back-translation and produce reviews that are more similar to the originals!

German_Arabic_Histogram_Plot

farinamhz commented 1 year ago

@hosseinfani Also, I created separate plots with the same axis range and attached them below to show that German is significantly better, since some overlapping parts are not visible in the previous plot.

two-langs

hosseinfani commented 1 year ago

@farinamhz thank you. now we expect a performance drop in our main task of latent aspect detection.

farinamhz commented 1 year ago

@hosseinfani Yes, I will compare the results for latent aspect detection in different languages soon.

farinamhz commented 1 year ago

Hi @hosseinfani, I changed the plot based on what we talked about in the meeting. Is it better now in terms of colors, # bins, and overlapping area?

French_Chinese_Histogram_Plot

farinamhz commented 1 year ago

@hosseinfani Also, we have this plot for several languages, in case we want to compare languages from the same family as English with those from different families.

French_Chinese_German_Arabic_Histogram_Plot

hosseinfani commented 1 year ago

@farinamhz this is better now.

hosseinfani commented 1 year ago

I put an incorrect commit message on a commit linked to this issue :(
