Open aa-dank opened 2 months ago
Also, the Llama 2 data is missing twenty rows of prompts and data that exist in the equivalent Vicuna coaid paraphrase data. Was that intentional?
Model-Attribution-in-Machine-Generated-Disinformation/data/filtered_llm/llama2_70b/coaid/synthetic-llama2_70b_coaid_paraphrase_generation_filtered.csv
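For anyone who wants to reproduce the row check, here is a minimal sketch using only the standard library. It assumes `news_id` identifies a row in both files, which I have not confirmed with the authors. The function names and the `reference_path` / `candidate_path` variables in the usage comment are placeholders of mine, not anything from the repo.

```python
import csv
from typing import List, Set, TextIO


def read_key_values(handle: TextIO, key: str = "news_id") -> Set[str]:
    """Collect every value of the `key` column from an open CSV file.

    Raises KeyError if the file has no such column.
    """
    reader = csv.DictReader(handle)
    return {row[key] for row in reader}


def missing_keys(reference: TextIO, candidate: TextIO, key: str = "news_id") -> List[str]:
    """Return the key values found in `reference` but absent from `candidate`."""
    return sorted(read_key_values(reference, key) - read_key_values(candidate, key))


# Usage (paths are placeholders, assumed to point at the two CSV files):
# with open(reference_path, newline="") as ref, open(candidate_path, newline="") as cand:
#     print(missing_keys(ref, cand))
```

If `news_id` turns out not to be unique per row, the same comparison could be keyed on a tuple of columns instead.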
Specifically this data:
Model-Attribution-in-Machine-Generated-Disinformation/data/filtered_llm/gpt-3.5-turbo/coaid/synthetic-gpt-3.5-turbo_coaid_paraphrase_generation_filtered.csv
The features of this dataset are:
'generation_approach', 'label', 'news_id', 'news_text', 'synthetic misinformation', 'theme'
whereas the other coaid paraphrase datasets have these features:
'generation_approach', 'human', 'label', 'news_id', 'prompt', 'synthetic misinformation', 'theme sentence or passage'
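To make the mismatch explicit, here is a small sketch that diffs the two column lists quoted above. The helper `compare_columns` and the variable names are my own, for illustration only. The lists are copied from this comment rather than read from the files.

```python
from typing import Dict, List


def compare_columns(left: List[str], right: List[str]) -> Dict[str, List[str]]:
    """Split two column lists into names unique to each side and names they share."""
    left_set, right_set = set(left), set(right)
    return {
        "only_left": sorted(left_set - right_set),
        "only_right": sorted(right_set - left_set),
        "shared": sorted(left_set & right_set),
    }


# Column lists as quoted in this issue (not re-read from the CSV files).
gpt35_columns = [
    "generation_approach", "label", "news_id", "news_text",
    "synthetic misinformation", "theme",
]
other_columns = [
    "generation_approach", "human", "label", "news_id", "prompt",
    "synthetic misinformation", "theme sentence or passage",
]

diff = compare_columns(gpt35_columns, other_columns)
for side, names in diff.items():
    print(side, names)
```

With these inputs, `news_text` and `theme` appear only in the gpt-3.5-turbo file, while `human`, `prompt`, and `theme sentence or passage` appear only in the other files.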