Congrats on the great work! I'm trying to verify that I'm running the detector correctly by reproducing some numbers read off Figure 4 in the paper. On CC News, TPR at 0.01 (1%) FPR seems to be slightly higher than 0.6. Is that correct?
With outputs from Binoculars/datasets/core/cc_news/cc_news-llama2_13.jsonl, using falcon-7b as the observer and falcon-7b-instruct as the performer (default setting), I ran the following code to compute TPR at 0.01 (1%) FPR and got 0.595, which seems to be lower than what I observed in Figure 4.
import json

import tqdm
from sklearn.metrics import roc_curve

from binoculars import Binoculars

path = "Binoculars/datasets/core/cc_news/cc_news-llama2_13.jsonl"
with open(path, 'r') as fh:
    data = [json.loads(line) for line in fh]
human = [d['text'] for d in data]
model = [d['meta-llama-Llama-2-13b-hf_generated_text_wo_prompt'] for d in data]

bino = Binoculars()  # default setting: falcon-7b observer, falcon-7b-instruct performer
scores = []
for h, f in tqdm.tqdm(zip(human, model), total=len(human)):
    score_human = bino.compute_score(h)
    score_model = bino.compute_score(f)
    scores.append({
        'gold': score_human,
        'model': score_model
    })

# Label human text 0 and model text 1, in the same order as the scores below.
labels = [0 for _ in range(len(scores))] + [1 for _ in range(len(scores))]
bl_scores = [s['gold'] for s in scores] + [s['model'] for s in scores]
bl_scores = [s * (-1) for s in bl_scores]  # reverse scale so the positive class (model) scores higher

fpr, tpr, thresholds = roc_curve(labels, bl_scores, pos_label=1)
fpr_threshold = 0.01
tpr_at_fpr_1 = 0
threshold_at_fpr_1 = 0
# Take the last ROC point whose FPR does not exceed the target.
for i, f in enumerate(fpr):
    if f > fpr_threshold:
        tpr_at_fpr_1 = tpr[i-1]
        threshold_at_fpr_1 = thresholds[i-1]
        break
print(f"tpr at fpr {fpr_threshold} = {tpr_at_fpr_1}, threshold = {threshold_at_fpr_1}")
Is there something wrong with my code? Or did you use a different observer-performer combo for Figure 4? Please let me know, thanks!
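To rule out a mistake in the ROC bookkeeping itself, here is a minimal sketch of an independent cross-check that computes TPR at a fixed FPR directly from the raw scores, without `roc_curve`. The helper name `tpr_at_fpr` is hypothetical, and the scores below are synthetic stand-ins rather than real Binoculars outputs. It assumes, as in the snippet above, that a lower raw score means the text is more likely machine-generated.

```python
import random


def tpr_at_fpr(human_scores, machine_scores, target_fpr=0.01):
    """Return (tpr, threshold) for the rule "flag as machine if score <= threshold".

    Picks the largest observed threshold whose false positive rate on the
    human scores does not exceed target_fpr. Hypothetical helper, not part
    of the Binoculars package.
    """
    n_h, n_m = len(human_scores), len(machine_scores)
    best_tpr, best_thr = 0.0, float("-inf")
    # Candidate thresholds are the observed scores, scanned from low to high.
    for thr in sorted(set(human_scores) | set(machine_scores)):
        fpr = sum(s <= thr for s in human_scores) / n_h
        if fpr > target_fpr:
            break  # FPR only grows from here on
        best_tpr = sum(s <= thr for s in machine_scores) / n_m
        best_thr = thr
    return best_tpr, best_thr


# Synthetic scores for illustration only (assumption: not real detector output).
random.seed(0)
human = [random.gauss(1.0, 0.1) for _ in range(1000)]
machine = [random.gauss(0.8, 0.1) for _ in range(1000)]

tpr, thr = tpr_at_fpr(human, machine, target_fpr=0.01)
print(f"tpr at fpr 0.01 = {tpr:.3f}, threshold = {thr:.4f}")
```

Passing `[s['gold'] for s in scores]` and `[s['model'] for s in scores]` (un-negated) to this helper would give a number to compare against the `roc_curve` result.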