eric-mitchell / detect-gpt

DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
MIT License

Getting stuck when applying extracted fills #4

Closed sairights closed 1 year ago

sairights commented 1 year ago

I ran into the following issue when processing my own text:

WARNING: 1 texts have no fills. Trying again [attempt 1].
WARNING: 1 texts have no fills. Trying again [attempt 2].
WARNING: 1 texts have no fills. Trying again [attempt 3].
WARNING: 1 texts have no fills. Trying again [attempt 4].
WARNING: 1 texts have no fills. Trying again [attempt 5].
WARNING: 1 texts have no fills. Trying again [attempt 6].
WARNING: 1 texts have no fills. Trying again [attempt 7].
WARNING: 1 texts have no fills. Trying again [attempt 8].
WARNING: 1 texts have no fills. Trying again [attempt 9].
...

When I set breakpoints and inspected the intermediate variables, I found that the branch if len(fills) < n in the function apply_extracted_fills is taken, which makes the function return an empty list. What might be the problem?

The text that I am dealing with is:

A Ponzi scheme is a type of investment scam where earlier investors are paid with the money of newer investors, rather than with actual profits earned. It's called a Ponzi scheme because it was named after Charles Ponzi, who became famous for using this technique in the early 1900s. Here's an example of how a Ponzi scheme might work: Imagine there are three people: Alice, Bob, and Carol. Alice is the person running the Ponzi scheme. Bob and Carol are the investors. Alice tells Bob and Carol that she has a special investment opportunity where they can earn a lot of money very quickly. Bob and Carol are excited and give Alice some of their money to invest. Alice takes the money from Bob and Carol and doesn't actually invest it anywhere. Instead, she uses some of the money to pay herself and keep some for herself. Then, she uses the rest of the money to pay Bob and Carol a small amount of money, pretending that it's the profits they've earned from the investment. Bob and Carol are happy because they're getting paid, so they tell their friends Dave and Emily about the investment opportunity. Dave and Emily also give Alice some of their money to invest. Alice uses the same process with Dave and Emily's money. She pays herself and keeps some for herself, and then uses the rest of the money to pay Bob, Carol, Dave, and Emily a little more money, pretending that it's the profits they've earned. This process continues, with Alice getting more and more money from new investors and using it to pay the earlier investors, who are happy because they think they're making a lot of money. However, the whole thing is a lie. Alice is not actually investing the money at all. She's just using the money from new investors to pay the earlier investors, and keeping some for herself. Eventually, the Ponzi scheme will collapse because there won't be enough new investors to pay all of the earlier investors, and people will start to realize that they're not actually making any real profits. 
A pyramid scheme is similar to a Ponzi scheme in that it's a type of investment scam. However, in a pyramid scheme, the people running the scam make their money by recruiting new members, rather than by investing the money of the members. Like a pyramid, the scheme relies on having a large number of people at the bottom to support the people at the top. Like a Ponzi scheme, a pyramid scheme will eventually collapse because there aren't enough new members to support the people at the top.

Thank you!

eric-mitchell commented 1 year ago

Sorry for the delayed response; it's been a crazy few weeks. This is probably happening because the text you've input is too long. When we mask + replace, the longer the text, the more mask tokens (e.g. <extra_id_16>) get put in the text. T5 has to keep track of all of these tokens and where they are, and if there are more than 10, sometimes we don't get back the right number of fills. When that happens, the len(fills) < n check you found discards the result and the perturbation is retried, which is the warning you're seeing.
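To make the failure mode concrete, here is a minimal sketch of how one might compare the number of mask tokens in the input with the number of fills in T5's output. It is not the repository's code: the function names and the example strings are hypothetical, and it assumes the decoded output still contains the <pad>, </s>, and <extra_id_N> markers.

```python
import re

# T5 sentinel (mask) tokens look like <extra_id_0>, <extra_id_1>, ...
SENTINEL = re.compile(r"<extra_id_\d+>")

def count_masks(masked_text):
    """Number of sentinel tokens placed in the masked input."""
    return len(SENTINEL.findall(masked_text))

def extract_fills(generated_text):
    """Split decoded T5 output on sentinels; text after each sentinel is one fill."""
    cleaned = generated_text.replace("<pad>", "").replace("</s>", "").strip()
    parts = SENTINEL.split(cleaned)
    # parts[0] is whatever precedes the first sentinel, so skip it.
    fills = [p.strip() for p in parts[1:]]
    # A closing sentinel leaves an empty trailing piece; drop it.
    if fills and fills[-1] == "":
        fills.pop()
    return fills

def has_enough_fills(masked_text, generated_text):
    """True if the model returned at least one fill per mask."""
    return len(extract_fills(generated_text)) >= count_masks(masked_text)
```

If has_enough_fills returns False for a text, that text is the one reported as having "no fills", and retrying with the same number of masks may keep failing.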

You can fix this problem either by using shorter sequences or by applying mask fills in multiple rounds (sampling only 5 masks + fills at a time and repeating until you've applied the desired number of masks to the text).
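The multi-round idea could look roughly like the sketch below. This is an illustration under assumptions, not the repository's implementation: mask_round, perturb_in_rounds, and every parameter are hypothetical names, and fill_fn stands in for whatever calls T5 on a masked string and returns the list of extracted fills.

```python
import random

def mask_round(words, n_masks, span_len, rng):
    """Replace up to n_masks non-overlapping word spans with T5-style sentinels."""
    # Candidate starts lie on a fixed grid, so chosen spans never overlap or touch.
    starts = list(range(0, len(words) - span_len + 1, span_len + 1))
    chosen = sorted(rng.sample(starts, min(n_masks, len(starts))))
    masked, prev = [], 0
    for k, start in enumerate(chosen):
        masked.extend(words[prev:start])
        masked.append(f"<extra_id_{k}>")
        prev = start + span_len
    masked.extend(words[prev:])
    return masked, len(chosen)

def perturb_in_rounds(text, total_masks, fill_fn, masks_per_round=5,
                      span_len=2, max_retries=10, seed=0):
    """Apply total_masks mask-and-fill edits, at most masks_per_round per model call."""
    rng = random.Random(seed)
    words = text.split(" ")
    remaining, retries = total_masks, 0
    while remaining > 0:
        masked, n = mask_round(words, min(masks_per_round, remaining), span_len, rng)
        if n == 0:
            break  # text is too short to mask any further
        fills = fill_fn(" ".join(masked))
        if len(fills) < n:
            # Too few fills came back: resample the masks and try this round again.
            retries += 1
            if retries > max_retries:
                raise RuntimeError("fill_fn kept returning too few fills")
            continue
        for k in range(n):
            masked[masked.index(f"<extra_id_{k}>")] = fills[k]
        # Re-split so multi-word fills become separate words for the next round.
        words = [w for w in " ".join(masked).split(" ") if w]
        remaining -= n
    return " ".join(words)
```

Because each call sees only a handful of sentinels, the model has fewer positions to track per call. One trade-off in this sketch is that a later round may re-mask words that an earlier round filled in.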

Hope this helps. I'll close for now, but feel free to re-open if you still have questions!