Open dankim444 opened 3 months ago
Great. Can we switch to 4o for everything? (or have you already)
cleaned_statements_en.csv still needs to be translated into 9 new languages. This would require translating 12141 characters. It will cost approximately $0.18 to complete these translations.
Description
This PR introduces a new statement normalization pipeline, cleans the remaining original statements in the raw_statements directory, and makes minor changes to several files to streamline extracting the language code from filenames. The criteria for normalization are as follows:
The normalization pipeline uses OpenAI. The news_statements and observable statements were cleaned with gpt-4o, while the email_statements files were cleaned with gpt-3.5-turbo because of their size. During this process, I noticed several differences in performance between the two models. Specifically, gpt-4o was more consistent in preserving the original capitalization of proper nouns, keeping the original vocabulary, and not introducing additional punctuation, whereas gpt-3.5-turbo would make such changes despite being explicitly instructed not to in the system prompt. When merged, this PR will close https://github.com/Watts-Lab/commonsense-platform/issues/150, ensuring consistent rendering of statements in the commonsense platform's UI.
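For reviewers, here is a minimal sketch of the kind of language-code extraction described above. It is not the code in this PR: the function name `extract_language_code` is hypothetical, and it assumes filenames end in an underscore plus a two-letter lowercase code, as in cleaned_statements_en.csv.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: the language code is a two-letter lowercase suffix
# separated from the rest of the name by an underscore.
_LANG_SUFFIX = re.compile(r"_([a-z]{2})$")


def extract_language_code(filename: str) -> Optional[str]:
    """Return the trailing language code of a statements file, or None."""
    stem = Path(filename).stem  # drops the directory and the .csv extension
    match = _LANG_SUFFIX.search(stem)
    return match.group(1) if match else None
```

Returning `None` for names without a suffix lets the caller decide whether a missing code is an error or simply means the file has no language tag.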
New files
Changes
Testing
I acted as a "human-in-the-loop" to verify OpenAI's outputs. I used an online Diffchecker tool (https://www.diffchecker.com/) to compare each original file against its cleaned version. I also used the OpenAI Playground to verify the system prompt.
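The same comparison could be scripted with Python's standard `difflib` module. This is only a sketch and was not part of the testing done here. The function name `diff_statements` is hypothetical, and it assumes both files have already been read into lists with one statement per entry.

```python
import difflib


def diff_statements(original: list[str], cleaned: list[str]) -> list[str]:
    """Return unified-diff lines between original and cleaned statements."""
    return list(
        difflib.unified_diff(
            original,
            cleaned,
            fromfile="original",
            tofile="cleaned",
            lineterm="",  # inputs are bare strings without trailing newlines
        )
    )
```

An empty result means the model left every statement untouched. Otherwise, lines starting with `-` and `+` show each statement before and after cleaning.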
Important note
To ensure more consistent output from OpenAI, I recommend using gpt-4o or possibly gpt-4o-mini to normalize the statements. In particular, gpt-3.5-turbo would sometimes remove the capitalization of proper nouns, alter vocabulary in ways that change the nuanced meaning of some statements, and introduce unintended punctuation. I directly address all of these issues in the system prompt; however, the prompt is open to improvement.
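The three failure modes above could also be flagged automatically before a human review. The sketch below is not part of this PR, and `check_normalization` is a hypothetical name. It does not encode the actual normalization criteria, so any edit the pipeline makes on purpose (for example, to capitalization or punctuation) would need to be exempted.

```python
import re
import string
from collections import Counter


def check_normalization(original: str, cleaned: str) -> list[str]:
    """Flag vocabulary, capitalization, and added-punctuation changes."""
    issues = []
    orig_words = re.findall(r"\w+", original)
    new_words = re.findall(r"\w+", cleaned)

    if [w.lower() for w in orig_words] != [w.lower() for w in new_words]:
        # Words were added, removed, or replaced.
        issues.append("vocabulary")
    elif orig_words != new_words:
        # Same words, but at least one differs in letter case.
        issues.append("capitalization")

    orig_punct = Counter(c for c in original if c in string.punctuation)
    new_punct = Counter(c for c in cleaned if c in string.punctuation)
    if new_punct - orig_punct:  # keeps only marks that are more frequent in cleaned
        issues.append("punctuation")

    return issues
```

An empty list means none of the three checks fired. It does not mean the statement is correctly normalized.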