ahans30 / Binoculars

[ICML 2024] Binoculars: Zero-Shot Detection of LLM-Generated Text
https://arxiv.org/abs/2401.12070
BSD 3-Clause "New" or "Revised" License

Figure 7 #8

Closed nivancat closed 6 months ago

nivancat commented 6 months ago

Many thanks for the paper. I am struggling a bit with Fig. 7.

Why are the false negatives so high? Where are the generated Russian, Urdu, etc. samples coming from?

ahans30 commented 6 months ago

Hi, thanks for your interest!

Binoculars' effectiveness depends on the next-token-prediction ability of the models powering it. In this case (i.e., the experiments in the paper), the Falcon model family is not a great choice for Russian, Urdu, and other low-resource-language text. The false negatives could be alleviated by using a better pair of multilingual models. We used the M4 dataset released with https://arxiv.org/abs/2305.14902 for these samples.
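
As a minimal sketch, swapping in a multilingual pair might look like the snippet below. The keyword argument names (`observer_name_or_path`, `performer_name_or_path`) and methods (`compute_score`, `predict`) should be checked against this repo's `Binoculars` class, and the Qwen model IDs are only illustrative examples of a multilingual base/instruct pair, not a recommendation from the paper.

```python
from binoculars import Binoculars

# Illustrative multilingual pair: any base/instruct sibling models with good
# coverage of the target languages could serve as observer/performer.
bino = Binoculars(
    observer_name_or_path="Qwen/Qwen2-7B",            # base (observer) model
    performer_name_or_path="Qwen/Qwen2-7B-Instruct",  # instruct (performer) model
)

# Score a (possibly machine-generated) non-English sample,
# e.g. a Russian or Urdu passage from the M4 dataset.
text = "..."
print(bino.compute_score(text))  # lower scores indicate likely LLM-generated text
print(bino.predict(text))        # thresholded label
```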

Hope this helps!

nivancat commented 6 months ago

Thank you.

nivancat commented 6 months ago

How did the number end up being 1-2%, then? Where are the false negatives coming from?