Closed by nivancat 6 months ago
Hi, thanks for your interest!
Binoculars' effectiveness depends on the next-token-prediction ability of the models powering it. In this case (i.e. the experiments in the paper), the Falcon model family is not a great choice for text in Russian, Urdu, and other low-resource languages. The false negatives could be reduced by using a better pair of multilingual models. We used the M4 dataset released with https://arxiv.org/abs/2305.14902 for these samples.
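To make "false negative" concrete, here is a minimal sketch of how a false negative rate could be computed from detector scores. It assumes a lower score means "more likely machine-generated", so a text is flagged when its score falls below a threshold. The function name, the scores, and the threshold value are all hypothetical and for illustration only; they are not taken from the paper or the repository.

```python
def false_negative_rate(scores, threshold):
    """Fraction of machine-generated texts that are NOT flagged.

    Assumption: a lower score means "more likely machine-generated",
    so a text is flagged only when its score is below the threshold.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    # A machine-generated text is missed when its score is not below the threshold.
    missed = sum(1 for s in scores if s >= threshold)
    return missed / len(scores)


# Made-up scores for machine-generated samples only (illustrative values).
machine_scores = [0.70, 0.85, 0.95, 1.02]
THRESHOLD = 0.90  # hypothetical decision threshold

fnr = false_negative_rate(machine_scores, THRESHOLD)
print(f"False negative rate: {fnr:.2f}")
```

If the underlying models predict a language poorly, scores for machine-generated text in that language can drift above the threshold, which is what would push this rate up.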
Hope this helps!
thank you
How did the number end up being 1-2% then? Where are the negatives coming from?
Many thanks for the paper. I am struggling a bit with Fig. 7.
Why is the false negative rate so high? Where do the generated Russian, Urdu, etc. samples come from?