Also, we have the conference submission deadline on April 21st. Therefore, we need to gather the results asap, and I would be grateful if you could let me know your time estimate for this task. @Lillliant
@farinamhz I have most of my exams and marking in the next three days. I'll work on the task and try to finish it by the 16th, and I can guarantee finishing it before the 20th if it takes longer than expected.
No worries @Lillliant, take your time with the exams and marking. Good luck! Just let me know if there is any progress by April 17th.
Hi @farinamhz @hosseinfani I've added the code for calculating the metrics in metrics.py. The preliminary result for deu can be seen in the following txt file: score-deu.txt
Particularly, I noticed that there are different kinds of ROUGE scores. For now I used ROUGE-L, since it measures the longest common subsequence between the two texts.
I will check whether there are other metrics useful for this. For now, though, if the code seems logical, I'll generate the metrics for the remaining languages asap.
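For reference, a minimal sketch of how one sentence pair could be scored, assuming the rouge-score and nltk packages; score_pair and its normalization are illustrative, not the exact code in metrics.py:

```python
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def score_pair(reference: str, candidate: str) -> dict:
    # ROUGE-L: F1 over the longest common subsequence of the two token streams.
    scorer = rouge_scorer.RougeScorer(['rougeL'], use_stemmer=True)
    rouge_l = scorer.score(reference, candidate)['rougeL'].fmeasure

    # BLEU: n-gram precision with smoothing so short sentences don't zero out.
    bleu = sentence_bleu([reference.split()], candidate.split(),
                         smoothing_function=SmoothingFunction().method1)

    # Exact match: 1.0 iff the strings are identical after trivial normalization.
    exact = float(reference.strip().lower() == candidate.strip().lower())

    return {'rougeL': rouge_l, 'bleu': bleu, 'exact_match': exact}

print(score_pair('the food was great', 'the meal was great'))
```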
Thank you so much @Lillliant. Great! You can add the rest of the datasets and languages.
@farinamhz
I've run the code for the rest of the languages for the SemEval 2016 and 2015 datasets. Here are the results: LADy back translation result SemEval 2016.csv LADy back translation result SemEval 2015.csv
For SemEval 2015, however, I noticed that the D.arb file is not in the repo and D.zho's file is empty. So, D->D.arb, D.arb->D_arb, D->D.zho, and D.zho->D_zho remain uncalculated in the previous results. Do you happen to have a local copy of these files? If not, can the files be reproduced using the settings and code at commit 97fbf402c2f48962bc33960176fab58edcc12589?
Hi @Lillliant, Unfortunately, we had some problems with the 2015 and 2016 versions of the datasets that I gave you. Could you please redo the process for these new datasets that I have recently pushed? Also, I have added the 2014 version.
You can find all of them in these directories:
output/augmentation-R-16/back-translation
output/augmentation-R-15/back-translation
output/augmentation-R-14/back-translation
Let me know if there are any problems with these new datasets like before.
Hi @farinamhz,
Sure, I'll redo the process with the new datasets after my exams tomorrow and post the results here. I checked the arb and zho datasets and they don't seem to have the same issue as the old sets.
That would be great! Thank you so much for your help, @Lillliant.
Also, if you have time after this, please find these stats for each dataset: the number of sentences and the number of tokens.
Any other stats you would suggest are appreciated.
You can provide them in a table if you can.
@Lillliant
@farinamhz
I've added the results using the updated code in metrics.py.
average-sentences-tokens-R-['14', '15', '16'].csv back-translation-metrics-R-14.csv back-translation-metrics-R-15.csv back-translation-metrics-R-16.csv
Hi @Lillliant Great! Thank you very much.
Could you please also add the avg number of tokens (I mean the avg #tokens per sentence) in each dataset? We only have the total number of tokens in each of them now. You can add a new column beside the others for that.
@Lillliant
Also, please add the avg number of tokens per sentence and the total number of sentences in the "All languages" dataset for each version. You can find ALL in these files:
output/augmentation-R-16/augmented-with-labels/All.back-translated.with-labels.csv
output/augmentation-R-15/augmented-with-labels/All.back-translated.with-labels.csv
output/augmentation-R-14/augmented-with-labels/All.back-translated.with-labels.csv
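A minimal sketch of pulling both stats from one of the All files, assuming it is a pandas-readable csv with a "sentences" column and that whitespace splitting is an acceptable tokenizer; the path and column name below are assumptions:

```python
import pandas as pd

# Assumed path and column name; adjust to the actual csv layout.
df = pd.read_csv('output/augmentation-R-16/augmented-with-labels/All.back-translated.with-labels.csv')

n_sentences = len(df)
# Whitespace tokenization; a real tokenizer would be more accurate.
tokens_per_sentence = df['sentences'].astype(str).str.split().str.len()

print(f'#sentences: {n_sentences}')
print(f'avg #tokens per sentence: {tokens_per_sentence.mean():.2f}')
```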
@Lillliant
Hi @farinamhz, can you clarify the avg number of tokens? I thought the metric I had was the number of tokens each sentence has on average, not the number of tokens the dataset has in total.
Also, I've calculated the numbers for the All languages dataset. Does the avg # of tokens look alright? All-lang-average-sentences-tokens-R-['14', '15', '16'].csv
Thank you! Yes, that is what I meant: the average number of tokens per sentence in a dataset. Now we want this average for each of the datasets we have. You can add a new column in that file, beside the total tokens and sentences (like what you did for all languages). @Lillliant
@farinamhz
I've added the calculation for the original dataset, translated dataset, and back-translated dataset for each of SemEval 2014/15/16 for the 5 languages (also the average between the original and back-translated, in case it is needed).
Also, I found a minor bug in the past results for the Chinese translated dataset: because written Chinese does not put spaces between words, the program was mistakenly treating entire sentences or sentence fragments as single tokens. The more accurate approach of treating each character as a token is reflected in this result. However, truly accurate results would need some form of word segmentation to determine which characters group into words.
Thank you, @Lillliant! Interesting, I did not know that! You can look into Chinese tokenization (word segmentation) if you have time for this case.
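For example, a minimal sketch with the jieba word-segmentation package (an assumption; the repo does not use it yet) contrasted with the character-level tokenization described above:

```python
import jieba  # common Chinese word-segmentation library; not currently used in the repo

sentence = '服务很好，菜也很好吃'  # "the service is great, the food is also delicious"

# Character-level tokenization (the current fix): every character is a token.
char_tokens = list(sentence)

# Word-level segmentation: characters are grouped into words.
word_tokens = jieba.lcut(sentence)

print(len(char_tokens), char_tokens)
print(len(word_tokens), word_tokens)  # e.g. ['服务', '很', '好', '，', '菜', '也', '很', '好吃']
```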
Hi @farinamhz and @Lillliant
Thank you very much for your clean and readable code. I am getting close to the end of my code refactoring :D Just 2-3 more days and I'm gone :DD
Regarding the stat and metric code:
I did some refactoring of metrics.py and distribution.py. Basically, I removed them :D and merged them into the Review class in review.py. The Review class accepts a pickle of reviews and generates the stats about the reviews and their back-translated versions (we don't need the translated versions) and the distributions.
For the back-translation metrics, I created a method on Review. That is, we ask a review, "give me your back-translation metrics," and it kindly returns a dictionary of values:
https://github.com/fani-lab/LADy/blob/a2661fc8e1a070f8c04a3bb0a92a96460f1e9d6a/src/cmn/review.py#L82
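Roughly, the shape of that method, as an illustrative sketch; the get_metrics name, the augmentations field, and the rouge-score dependency are assumptions, not the actual code at the link above:

```python
from rouge_score import rouge_scorer

class Review:
    def __init__(self, sentence: str, augmentations: dict):
        self.sentence = sentence            # original English text
        self.augmentations = augmentations  # e.g. {'deu': 'back-translated text', ...}

    def get_metrics(self) -> dict:
        # One dictionary entry per back-translated language, keyed by language code.
        scorer = rouge_scorer.RougeScorer(['rougeL'], use_stemmer=True)
        return {lang: scorer.score(self.sentence, back)['rougeL'].fmeasure
                for lang, back in self.augmentations.items()}

r = Review('the food was great', {'deu': 'the meal was great'})
print(r.get_metrics())  # {'deu': 0.75}
```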
Also, I put a main_stat.py driver code to get the stats on all datasets, followed by an aggregation into a ../output/semeval+/stats.csv file for easy presentation in the paper:
https://github.com/fani-lab/LADy/blob/main/src/main_stat.py
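The gist of such a driver, as a sketch; the dataset list and the stats_for placeholder are assumptions, and the real logic lives at the link above:

```python
import pandas as pd

# Assumed dataset list and per-dataset stats function; see main_stat.py for the real logic.
datasets = ['SemEval-14', 'SemEval-15', 'SemEval-16']

def stats_for(name: str) -> dict:
    # Placeholder: load the dataset's reviews here and compute its stats.
    return {'dataset': name, 'n_reviews': 0, 'avg_tokens': 0.0}

# One row per dataset, aggregated into a single csv for the paper.
pd.DataFrame([stats_for(d) for d in datasets]).to_csv('../output/semeval+/stats.csv', index=False)
```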
I need @Lillliant to do the following please:
https://github.com/fani-lab/LADy/blob/a2661fc8e1a070f8c04a3bb0a92a96460f1e9d6a/src/cmn/review.py#L145 https://github.com/fani-lab/LADy/blob/a2661fc8e1a070f8c04a3bb0a92a96460f1e9d6a/src/cmn/review.py#L171 https://github.com/fani-lab/LADy/blob/a2661fc8e1a070f8c04a3bb0a92a96460f1e9d6a/src/cmn/review.py#L172
Check/fix the stats about categories, as I merged them into the same place but didn't check the values and logic.
Double-check the other stats and plots, since my code refactoring may have broken the logic and output.
Thank you.
@farinamhz @Lillliant I think we can close this issue. Let me know otherwise.
Hi @Lillliant,
We have implemented back-translation and obtained results on two datasets so far: the first is SemEval-Restaurant-2016, and the other is SemEval-Restaurant-2015.
Now we want to evaluate the translation and back-translation results based on specific metrics used in this area. Some of the more important examples are exact match, ROUGE, and BLEU. However, you can search and let me know if any other metrics have been used more recently.
You can find the results of the back-translation for Semeval-2016 in:
data/augmentation/back-translation
and for Semeval-2015 in output/augmentation/back-translation-Semeval-15
D represents the original dataset in English, D.L represents the translated dataset, and D_L represents the back-translated dataset. Now we compare D with D.L, then D.L with D_L, and finally D with D_L to find the values for those metrics.
All the texts or reviews you want to compare, whether in original, translated, or back-translated datasets, can be found in the column "sentences".
Please find the values for the metrics in these two datasets for these languages: L in {fra, arb, deu, spa, zho}, which are French, Arabic, German, Spanish, and Chinese.
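As a starting point, a sketch of the three pairwise comparisons, assuming each dataset is a csv with a "sentences" column and using exact match as the simplest of the metrics; the file names below are illustrative:

```python
import pandas as pd

# Illustrative file names following the D / D.L / D_L convention for one language.
d    = pd.read_csv('D.csv')['sentences'].astype(str)       # original English
d_l  = pd.read_csv('D.deu.csv')['sentences'].astype(str)   # translated
d_bl = pd.read_csv('D_deu.csv')['sentences'].astype(str)   # back-translated

def exact_match(a: pd.Series, b: pd.Series) -> float:
    # Fraction of row-aligned sentence pairs that are identical after normalization.
    return (a.str.strip().str.lower() == b.str.strip().str.lower()).mean()

for name, (x, y) in {'D vs D.L': (d, d_l),
                     'D.L vs D_L': (d_l, d_bl),
                     'D vs D_L': (d, d_bl)}.items():
    print(name, exact_match(x, y))
```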
Feel free to let me know if you have any concerns or questions about this task.
@hosseinfani