abaheti95 / ToxiChat

Code and data for the EMNLP 2021 paper "Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts". Coming soon!

Getting different results on the baseline DGPT #4

Closed · jin8 closed this 2 years ago

jin8 commented 2 years ago

[screenshot: result]

I am trying to reproduce the results in your paper using the released code and model weights, but I am getting different numbers. I ran the experiment with the command line below and was not able to match the results reported in the paper.

```
python generate_CTG_responses_and_make_off_and_stance_predictions.py \
    -m microsoft/DialoGPT-medium \
    -d ./final/test_threads.pkl \
    -sm saved_models/OC_S_post_thread/DGPT_medium_OC_S_stance_e16_focal_lr5e_5 \
    -om saved_models/OC_S_post_thread/DGPT_medium_OC_S_and_SBF_offensive_e3 \
    -n 1 -bs 10 \
    -o results/CTG/DGPT/test_threads_replies_and_off_stance_preds.pkl
```
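For comparing against the paper's numbers, a minimal sketch of inspecting the output pickle produced by the command above; the structure of the saved object is an assumption, so adjust field access to whatever the script actually writes:

```python
import pickle

# Load the predictions pickle written by the -o flag above.
with open("results/CTG/DGPT/test_threads_replies_and_off_stance_preds.pkl", "rb") as f:
    data = pickle.load(f)

print(type(data))
# If the object is a list of per-thread records (an assumption), peek at the
# first one to see the thread, generated reply, and offensive/stance fields.
if isinstance(data, list) and data:
    print(data[0])
```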

abaheti95 commented 2 years ago

Hmm. It could be that different random seeds lead to a different distribution of generated responses. I still have the actual responses on which I ran the evaluation. Here is the pickle file that contains the threads with the final DGPT responses and the classifier predictions.
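If the seed is the culprit, a minimal sketch of pinning it before generation, assuming the script uses PyTorch/Hugging Face under the hood; the helper name is hypothetical and is not part of the repo:

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Fix all the common RNG sources before sampling responses."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(42)
# Note: sampling-based decoding (e.g., top-k / nucleus sampling) can still
# differ across library versions and hardware even with identical seeds.
```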

Edit: I noticed that a bunch of responses are labeled as Ambiguous Offensive % and Ambiguous Stance %. For Table 3, I just used the argmax of the classifier predictions instead of defining specific thresholds.
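To make the argmax labeling concrete, a minimal sketch with hypothetical label lists and prediction shapes; the actual class names and classifier heads in the repo may differ:

```python
import numpy as np

# Hypothetical class lists; substitute the labels the classifiers were trained on.
OFF_LABELS = ["Safe", "Offensive"]
STANCE_LABELS = ["Neutral", "Agree", "Disagree"]

def argmax_label(probs: np.ndarray, labels: list[str]) -> str:
    """Pick the highest-probability class instead of applying a threshold."""
    return labels[int(np.argmax(probs))]

# Even a borderline 0.55/0.45 prediction gets a hard label under argmax,
# which is how ambiguous cases end up counted in Table 3.
print(argmax_label(np.array([0.55, 0.45]), OFF_LABELS))        # -> "Safe"
print(argmax_label(np.array([0.2, 0.5, 0.3]), STANCE_LABELS))  # -> "Agree"
```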

jin8 commented 2 years ago

Ah! I see. Thank you!