Watts-Lab / commonsense-statements

Commonsense platform statements
https://watts-lab.github.io/commonsense-statements

Add translations for statements in 10 languages #15

Closed · dankim444 closed this 4 months ago

dankim444 commented 5 months ago

Description

This PR translates email_statements.csv, news_statements_amir.csv, and observable_gpt4o.csv into 9 new languages, adding 27 new statement files (3 source files × 9 languages) to the raw_statements folder. This enables multilingual support for statements, as described in issue 145 of the commonsense-platform repository.

Changes

Additional Notes

Two scripts have also been added to the .scripts folder: translate_statements_azure.py and translate_statements_aws.py. translate_statements_azure.py was used to translate news_statements_amir.csv and observable_gpt4o.csv, while translate_statements_aws.py was used to translate email_statements.csv.
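The PR does not show the contents of the translation scripts. Below is a minimal sketch of the general idea, assuming each statement file is a CSV with a header row and a text column (called `statement` here, a hypothetical name). The translator is passed in as a function so the CSV handling can run offline. `aws_translator` shows one possible backend using Amazon Translate through boto3; it is an assumption, not necessarily what translate_statements_aws.py does. `translate_csv` and `aws_translator` are illustrative names.

```python
import csv
import io
from typing import Callable

# A translator maps (text, target_language_code) -> translated text.
# Injecting it keeps the CSV handling independent of any cloud service.
Translator = Callable[[str, str], str]


def aws_translator(source_lang: str = "en") -> Translator:
    """Hypothetical backend using Amazon Translate via boto3.

    Assumption: not necessarily how translate_statements_aws.py works.
    Requires boto3 and valid AWS credentials.
    """
    import boto3  # imported lazily so the rest of the sketch runs without boto3

    client = boto3.client("translate")

    def translate(text: str, target_lang: str) -> str:
        response = client.translate_text(
            Text=text,
            SourceLanguageCode=source_lang,
            TargetLanguageCode=target_lang,
        )
        return response["TranslatedText"]

    return translate


def translate_csv(csv_text: str, column: str, target_lang: str, translate: Translator) -> str:
    """Translate one column of a CSV and copy every other column unchanged.

    Rows are written in their original order, so the output has the same
    number of rows as the input.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames or [], lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = translate(row[column], target_lang)
        writer.writerow(row)
    return out.getvalue()
```

For example, `translate_csv(text, "statement", "es", aws_translator())` would produce a Spanish copy of a file. Keeping the output row-for-row aligned with the English source makes it easier to keep all files in raw_statements structurally consistent.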

dankim444 commented 4 months ago

All duplicates have been removed from the translation files, and all files in the raw_statements directory still share the same structure. I added two new scripts: show_groups_of_duplicates.py and remove_duplicates.py. show_groups_of_duplicates.py extracts all the duplicates into separate CSV files, which remove_duplicates.py then processes. remove_duplicates.py performs the actual removal and finishes with a test that checks that every file has the same number of lines.
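The comment does not say how rows are matched across files. One way to remove duplicates while keeping line counts equal is sketched below. It assumes that each translation file is row-aligned with its English source, that duplicates are detected on the English text, and that the first occurrence of each duplicate is kept. `duplicate_groups` and `drop_duplicate_rows` are hypothetical names and are not the actual contents of the two scripts.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def duplicate_groups(statements: List[str]) -> Dict[str, List[int]]:
    """Map each statement that occurs more than once to the row indices where it appears."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for i, text in enumerate(statements):
        positions[text.strip()].append(i)
    return {text: idx for text, idx in positions.items() if len(idx) > 1}


def drop_duplicate_rows(
    source: List[str], translations: Dict[str, List[str]]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Drop the same duplicate rows from the source and from every aligned translation."""
    groups = duplicate_groups(source)
    # Keep the first occurrence in each group and drop the later ones.
    drop = {i for idx in groups.values() for i in idx[1:]}

    def keep(rows: List[str]) -> List[str]:
        return [row for i, row in enumerate(rows) if i not in drop]

    new_source = keep(source)
    new_translations = {lang: keep(rows) for lang, rows in translations.items()}

    # Final consistency check: every file must end up with the same number of rows.
    lengths = {len(new_source)} | {len(rows) for rows in new_translations.values()}
    if len(lengths) != 1:
        raise ValueError(f"row counts differ after removal: {sorted(lengths)}")
    return new_source, new_translations
```

Because duplicates are detected on the English text only, the same row indices are removed from every language. This is what keeps the files aligned even when two different English statements happen to translate identically.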

@amirrr Could I get a final review on this before I merge?