Closed dankim444 closed 4 months ago
All duplicates have been removed from the translation files, and the structure of the files in the raw_statements directory remains consistent. I added two new scripts: show_groups_of_duplicates.py and remove_duplicates.py. show_groups_of_duplicates.py extracts the duplicates into separate CSV files for remove_duplicates.py to process. remove_duplicates.py performs the actual removal and runs a check at the end to ensure that every file has the same number of lines.
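For reviewers, here is a minimal sketch of the idea behind the removal step. It is not the code in remove_duplicates.py. It assumes that row i in every translation file corresponds to row i in a reference file, so a duplicate found in the reference is dropped at the same index everywhere. The function names, the in-memory dict of files, and the case-insensitive comparison are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def duplicate_indices(rows: List[str]) -> List[int]:
    """Return indices of rows that repeat an earlier row (the first occurrence is kept)."""
    seen = set()
    dupes = []
    for i, row in enumerate(rows):
        key = row.strip().lower()  # assumption: duplicates are compared case-insensitively
        if key in seen:
            dupes.append(i)
        else:
            seen.add(key)
    return dupes


def remove_aligned(files: Dict[str, List[str]], reference: str) -> Dict[str, List[str]]:
    """Drop the reference file's duplicate rows from every file at the same indices."""
    drop = set(duplicate_indices(files[reference]))
    cleaned = {
        name: [row for i, row in enumerate(rows) if i not in drop]
        for name, rows in files.items()
    }
    # Final check, mirroring the test described above: all files must have equal length.
    counts = {name: len(rows) for name, rows in cleaned.items()}
    if len(set(counts.values())) > 1:
        raise ValueError(f"Line counts differ across files: {counts}")
    return cleaned
```

Dropping by index rather than by text keeps the translated files aligned with the reference even when two translations of the same statement differ slightly.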
@amirrr Could I get a final review on this before I merge?
Description
This PR translates email_statements.csv, news_statements_amir.csv, and observable_gpt4o.csv into 9 new languages, adding 27 new statement files (3 source files × 9 languages) to the raw_statements folder. This provides the language support for statements described in issue 145 of the commonsense-platform repository.
Changes
Additional Notes
Two scripts have also been added to the .scripts folder: translate_statements_azure.py and translate_statements_aws.py. translate_statements_azure.py was used to translate news_statements_amir.csv and observable_gpt4o.csv, while translate_statements_aws.py was used to translate email_statements.csv.
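As a rough illustration of what both scripts do, the sketch below translates one column of a CSV while leaving every other column untouched. It is not taken from either script. The column name "statement", the function names, and the pluggable `translate` callable are assumptions, and the callable stands in for whichever Azure or AWS client call the real scripts make.

```python
import csv
import io
from typing import Callable, Dict, List


def translate_rows(
    rows: List[Dict[str, str]],
    column: str,
    translate: Callable[[str, str], str],
    target_lang: str,
) -> List[Dict[str, str]]:
    """Return copies of the rows with `column` translated into `target_lang`."""
    translated = []
    for row in rows:
        new_row = dict(row)  # copy so the source rows are not modified
        new_row[column] = translate(row[column], target_lang)
        translated.append(new_row)
    return translated


def translate_csv_text(
    csv_text: str,
    column: str,
    translate: Callable[[str, str], str],
    target_lang: str,
) -> str:
    """Translate one column of CSV text and return the result as CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = translate_rows(list(reader), column, translate, target_lang)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Because the row count and column order are preserved, each output file keeps the same structure as its source, which is what the files in raw_statements rely on.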