ahans30 / Binoculars

[ICML 2024] Binoculars: Zero-Shot Detection of LLM-Generated Text
https://arxiv.org/abs/2401.12070
BSD 3-Clause "New" or "Revised" License

Replicating Figure 4 #5

Closed: lilakk closed this issue 6 months ago

lilakk commented 6 months ago

Congrats on the great work! I'm trying to verify that I'm running the detector correctly by replicating some numbers read off Figure 4 in the paper. On CC News, TPR at 0.01 (1%) FPR appears to be slightly higher than 0.6. Is that correct?

With outputs from Binoculars/datasets/core/cc_news/cc_news-llama2_13.jsonl, using falcon-7b as the observer and falcon-7b-instruct as the performer (the default setting), I ran the following code to compute TPR at 0.01 (1%) FPR and got 0.595, which is slightly lower than what Figure 4 appears to show.

import json

import tqdm
from sklearn.metrics import roc_curve
from binoculars import Binoculars

path = "Binoculars/datasets/core/cc_news/cc_news-llama2_13.jsonl"
with open(path, "r") as fh:
    data = [json.loads(line) for line in fh]

human = [d['text'] for d in data]
model = [d['meta-llama-Llama-2-13b-hf_generated_text_wo_prompt'] for d in data]

bino = Binoculars()  # default: falcon-7b observer, falcon-7b-instruct performer
scores = []
for h, m in tqdm.tqdm(zip(human, model), total=len(human)):
    scores.append({
        'gold': bino.compute_score(h),   # human-written text
        'model': bino.compute_score(m),  # Llama-2-13B generation
    })

# Label human texts 0 and machine texts 1
labels = [0] * len(scores) + [1] * len(scores)
bl_scores = [s['gold'] for s in scores] + [s['model'] for s in scores]
# Lower Binoculars scores indicate machine text, but roc_curve treats higher
# scores as more likely positive, so negate (thresholds are on the negated scale)
bl_scores = [-s for s in bl_scores]
fpr, tpr, thresholds = roc_curve(labels, bl_scores, pos_label=1)

# Report the last ROC point whose FPR does not exceed the target
fpr_threshold = 0.01
tpr_at_fpr_1 = 0
threshold_at_fpr_1 = 0
for i, f in enumerate(fpr):
    if f > fpr_threshold:
        tpr_at_fpr_1 = tpr[i - 1]
        threshold_at_fpr_1 = thresholds[i - 1]
        break
print(f"tpr at fpr {fpr_threshold} = {tpr_at_fpr_1}, threshold = {threshold_at_fpr_1}")
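As a cross-check on the ROC loop above, the same quantity can be computed directly from the two score lists without sklearn. This is a minimal sketch, assuming (as the negation in the code above implies) that lower Binoculars scores mean machine-generated text; `tpr_at_fpr` is a hypothetical helper written for illustration, not part of the Binoculars package.

```python
import numpy as np


def tpr_at_fpr(human_scores, machine_scores, target_fpr=0.01):
    """TPR at a fixed FPR for a detector that flags LOW scores as machine text.

    Hypothetical helper: a text is flagged when score < threshold, and the
    threshold is chosen so that at most target_fpr of human texts are flagged.
    Returns (tpr, achieved_fpr, threshold) on the original score scale.
    """
    human = np.sort(np.asarray(human_scores, dtype=float))
    machine = np.asarray(machine_scores, dtype=float)
    # Largest number of human texts we may flag (tiny epsilon guards against
    # products like 0.29 * 100 landing just below an integer).
    k = int(np.floor(target_fpr * len(human) + 1e-9))
    # Using the (k+1)-th smallest human score as a STRICT cutoff flags at most
    # k human texts, even when several human scores are tied.
    threshold = float(human[k]) if k < len(human) else np.inf
    achieved_fpr = float(np.mean(human < threshold))
    tpr = float(np.mean(machine < threshold))
    return tpr, achieved_fpr, threshold
```

With the `scores` list from the code above, it would be called as `tpr_at_fpr([s['gold'] for s in scores], [s['model'] for s in scores])`. The threshold it returns is on the un-negated Binoculars scale. Because the cutoff is set by only the lowest 1% of human scores, the TPR estimate at this operating point rests on a handful of samples and can move noticeably when a few human texts change.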

Is there something wrong with my code? Or did you use a different observer-performer combination for Figure 4? Please let me know, thanks!
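For context on why the model pair matters: the paper defines the Binoculars score as a ratio of a log-perplexity to a cross-perplexity, B = log PPL_M1(s) / log X-PPL_M1,M2(s), so swapping either checkpoint can change every score and therefore the ROC curve. The NumPy sketch below implements that formula on precomputed next-token logits. The function names and array layout are assumptions for illustration, it does not load any model, and which checkpoint plays M1 versus M2 should be checked against the repository code rather than taken from this sketch.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    # Numerically stable log-softmax over the last (vocabulary) axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def binoculars_score(m1_logits: np.ndarray, m2_logits: np.ndarray,
                     token_ids: np.ndarray) -> float:
    """Sketch of B = log PPL_M1(s) / log X-PPL_M1,M2(s).

    m1_logits, m2_logits: (L, V) next-token logits from two models that share
    a tokenizer, aligned so that row i predicts token_ids[i].
    """
    m1_logp = log_softmax(m1_logits)
    m2_logp = log_softmax(m2_logits)
    positions = np.arange(len(token_ids))

    # log PPL_M1: mean negative log-likelihood of the actual tokens under M1.
    log_ppl = -m1_logp[positions, token_ids].mean()

    # log X-PPL: per-position cross-entropy of M1's predicted distribution
    # against M2's log-probabilities, averaged over the L positions.
    log_x_ppl = -(np.exp(m1_logp) * m2_logp).sum(axis=-1).mean()

    return float(log_ppl / log_x_ppl)
```

Identical uniform logits for both models give a score of exactly 1, since the perplexity and cross-perplexity both equal the vocabulary size. Lower scores, which the detector treats as machine-like, arise when the text is more predictable than the cross-perplexity baseline suggests.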

ahans30 commented 6 months ago

Hey, thanks for your interest! I have pushed code that should replicate all metrics reported in the paper. :)

lilakk commented 6 months ago

Thanks!