eric-mitchell / detect-gpt

DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
MIT License
341 stars, 53 forks

Changing the buffer size for no fills warnings in perturbation. #7

Closed: ryuryukke closed this issue 1 year ago

ryuryukke commented 1 year ago

Hi, I'm encountering lots of "no fills" warnings and getting stuck in the perturbation step. Can we change the buffer size from 1 to 2 or something? What does the buffer size mean? Thanks.

eric-mitchell commented 1 year ago

Thanks for checking out the repo!

The buffer size is the amount of unmasked text required between masked spans (so you don't accidentally mask a long contiguous chunk of text). If you're getting lots of "no fills" warnings, it's probably because your texts are long, or because you're using a small T5 model (i.e., smaller than T5-3B) that isn't very good at tracking lots of mask tokens.
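To make the role of the buffer concrete, here is a minimal sketch of span masking with a buffer. It assumes whitespace-split words and T5-style `<extra_id_N>` sentinel tokens; the helper name `mask_with_buffer`, its parameters, and the `max_tries` guard are hypothetical and chosen for illustration, not taken from the repository.

```python
import random

MASK = "<<<mask>>>"  # temporary placeholder string (an assumption of this sketch)

def mask_with_buffer(text, span_length=2, n_spans=3, buffer_size=1,
                     seed=0, max_tries=1000):
    """Replace up to n_spans spans of span_length words with sentinels,
    keeping at least buffer_size unmasked words between any two masks."""
    rng = random.Random(seed)
    tokens = text.split(" ")
    n_masks = 0
    tries = 0
    while n_masks < n_spans and tries < max_tries:
        tries += 1
        if len(tokens) <= span_length:
            break  # text too short to place another span
        start = rng.randint(0, len(tokens) - span_length)
        end = start + span_length
        # Look buffer_size words to either side of the candidate span.
        lo = max(0, start - buffer_size)
        hi = min(len(tokens), end + buffer_size)
        # Reject the span if an existing mask is inside it or within the buffer.
        if MASK not in tokens[lo:hi]:
            tokens[start:end] = [MASK]
            n_masks += 1
    # Number the placeholders left to right as <extra_id_0>, <extra_id_1>, ...
    out, k = [], 0
    for tok in tokens:
        if tok == MASK:
            out.append(f"<extra_id_{k}>")
            k += 1
        else:
            out.append(tok)
    return " ".join(out), n_masks

if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(20))
    print(mask_with_buffer(sample, span_length=2, n_spans=3, buffer_size=2))
```

In this sketch, a larger `buffer_size` spreads the masks further apart, so fewer of them fit in a text of a given length.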

If you want to perturb long texts, I'd recommend doing your masking in multiple steps. If your text is long enough that you need e.g. 30 masks, you could try doing the masking/filling process twice, applying 15 masks each time. Curious if that solves your problem; let me know or reopen the issue if you're still having problems!
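The multi-step suggestion could look roughly like the sketch below. `perturb_in_rounds` is a hypothetical helper, and `toy_mask` and `toy_fill` are stand-ins so the example runs without a model. In practice you would pass your own masking function and a fill function backed by T5.

```python
import random
import re

def toy_mask(text, n_masks, rng):
    """Stand-in masker: replace up to n_masks non-adjacent words with sentinels."""
    tokens = text.split(" ")
    candidates = list(range(0, len(tokens), 2))  # even positions are never adjacent
    chosen = sorted(rng.sample(candidates, min(n_masks, len(candidates))))
    for k, i in enumerate(chosen):
        tokens[i] = f"<extra_id_{k}>"
    return " ".join(tokens)

def toy_fill(masked_text):
    """Stand-in for the mask-filling model: fills every sentinel with a fixed word."""
    return re.sub(r"<extra_id_\d+>", "filled", masked_text)

def perturb_in_rounds(text, total_masks, n_rounds,
                      mask_fn=toy_mask, fill_fn=toy_fill, seed=0):
    """Split total_masks across n_rounds mask/fill passes, so the fill
    model only has to track a fraction of the masks at a time."""
    rng = random.Random(seed)
    per_round = -(-total_masks // n_rounds)  # ceiling division
    remaining = total_masks
    for _ in range(n_rounds):
        k = min(per_round, remaining)
        if k <= 0:
            break
        # Mask k spans, fill them, then feed the filled text to the next round.
        text = fill_fn(mask_fn(text, k, rng))
        remaining -= k
    return text

if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(100))
    print(perturb_in_rounds(sample, total_masks=30, n_rounds=2))
```

One caveat of this sketch: a later round can re-mask words that an earlier round already filled, so the number of distinct perturbed positions may be somewhat lower than `total_masks`.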