Closed: jin8 closed this issue 2 years ago
Hmm. Different random seeds could lead to a different distribution of generated responses. I still have the actual responses on which I ran the evaluation. Here is the pickle file that contains the threads with the final DGPT responses and the classifier predictions.
Edit: I noticed that a number of responses are counted under Ambiguous Offensive % and Ambiguous Stance %. For Table 3, I just used the argmax of the predictions instead of defining specific thresholds.
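To make the difference concrete, here is a minimal sketch of the two labeling rules. It is not the code from the repository: the label names, the 0.7 threshold, and the probabilities are all made up for illustration.

```python
# Hypothetical label set and probabilities; not taken from the repository.
LABELS = ["Safe", "Offensive"]


def argmax_label(probs, labels):
    """Always return the label with the highest predicted probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best]


def threshold_label(probs, labels, threshold=0.7, ambiguous="Ambiguous"):
    """Return the top label only if its probability reaches `threshold`.

    Otherwise the prediction is reported as ambiguous. The 0.7 default
    is an illustrative assumption, not a value used in the paper.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best] if probs[best] >= threshold else ambiguous


# Three made-up classifier outputs: [P(Safe), P(Offensive)]
preds = [[0.9, 0.1], [0.45, 0.55], [0.2, 0.8]]

argmax_labels = [argmax_label(p, LABELS) for p in preds]
threshold_labels = [threshold_label(p, LABELS) for p in preds]

print(argmax_labels)
print(threshold_labels)
```

With argmax every response gets a definite label, so nothing lands in an ambiguous bucket. With a threshold rule, low-confidence predictions such as the second one are set aside, which changes the reported percentages.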
Ah! I see. Thank you!
I am trying to reproduce the results in your paper using the released code and model weights, but I am getting different results. I ran the experiment with the command below and was not able to match your numbers.
python generate_CTG_responses_and_make_off_and_stance_predictions.py -m microsoft/DialoGPT-medium -d ./final/test_threads.pkl -sm saved_models/OC_S_post_thread/DGPT_medium_OC_S_stance_e16_focal_lr5e_5 -om saved_models/OC_S_post_thread/DGPT_medium_OC_S_and_SBF_offensive_e3 -n 1 -bs 10 -o results/CTG/DGPT/test_threads_replies_and_off_stance_preds.pkl