4as / ChatGPT-DeMod

Tampermonkey/Greasemonkey script that hides the moderation results when communicating with ChatGPT.
GNU General Public License v2.0
382 stars, 54 forks

They changed something and now it's not working #22

Closed: KittyTac closed this issue 11 months ago

KittyTac commented 11 months ago

I get this whenever I use an NSFW topic in a prompt now. Yes, the latest version of the script is running. My browser is Firefox.

[image]

I guess OpenAI changed something about the filter.

loganblackk commented 11 months ago

same issue

Adahn5 commented 11 months ago

Ditto

eddiehowell12 commented 11 months ago

Same Here

ProgrammerG4 commented 11 months ago

Can also confirm, Safari here.

DdavidC commented 11 months ago

I suspect that they added an additional moderation check after ChatGPT has finished responding. A small test went as follows:

  1. With DeMod activated, I entered a jailbreak prompt. The response was fully generated (indicating that the request was successfully sent and completely processed, not interrupted), but the input was then reviewed again, marked as "violates content policy", and removed.
  2. However, when I reloaded the page, my flagged content reappeared and was no longer marked. (If DeMod is not activated, flagged input does not reappear after reloading the page.)

That's why I guess they added one more moderation check after ChatGPT's response.

4as commented 11 months ago

I tried looking into this and it appears the moderation checks are now built into the conversation endpoint. Something like this was already happening in the previous update, but there was still an option to disable it. This time around I don't see anything like that. Sending a message to ChatGPT now gives you a text response and a moderation response, and there is no way to work around it. The only thing I can do right now is to just modify the results so the messages will appear intact, giving the illusion they were not flagged. But it would be just a visual change; the messages are still going to be moderated. Give my comment a thumbs up if you would like me to modify DeMod so it will hide moderation results, so it will appear as if everything is okay, despite the messages being flagged. Or give me a thumbs down, to simply leave DeMod as it is, so people will be aware that their messages are getting flagged.

ShannonW1950 commented 11 months ago

Do we know if flagging like that is definitely lighting you up as an evil offender no matter what you're actually writing and putting a strike against you or something, or is it mostly just a scare tactic? I've been using basic jailbreaks more often to get slightly more advanced tabletop help than anything actually explicit, but some of it could still flag

HerlockSolmes commented 11 months ago

> I suspect that they added an additional moderation check after ChatGPT has finished responding. A small test went as follows:
>
>   1. With DeMod activated, I entered a jailbreak prompt. The response was fully generated (indicating that the request was successfully sent and completely processed, not interrupted), but the input was then reviewed again, marked as "violates content policy", and removed.
>   2. However, when I reloaded the page, my flagged content reappeared and was no longer marked. (If DeMod is not activated, flagged input does not reappear after reloading the page.)
>
> That's why I guess they added one more moderation check after ChatGPT's response.

The problem appears with or without DeMod.

DS-Sael commented 11 months ago

Thank you for your devotion to freeing ChatGPT, @4as! I'm sad to hear that they patched this... I really hope there will be another workaround... But the way you describe the problem, it seems more difficult now...

I feel like it's not safe to use DeMod now: since they can compare the prompt from the first moderation check with the one they actually receive, I think they can easily detect us that way.

KittyTac commented 11 months ago

> Do we know if flagging like that is definitely lighting you up as an evil offender no matter what you're actually writing and putting a strike against you or something, or is it mostly just a scare tactic? I've been using basic jailbreaks more often to get slightly more advanced tabletop help than anything actually explicit, but some of it could still flag

I heard you can only get banned for content violations if you thumb up or thumb down a response, or provide any other feedback. So maybe add some kind of notification to not do that.

Voids0713 commented 11 months ago

Based on my tests, the warning email depends on two things: (1) the frequency of violations within a time period, and (2) certain special keywords. You should be safe if you use it wisely, even if you have received multiple warnings.

MehdyAmazigh commented 11 months ago

Please do as you said on Reddit: modify the responses so they look safe, to save our accounts.

OMGitsMatt45 commented 11 months ago

Yeah, actually, please do that. I've got a lot of story plans, and heavy battles and some semi-smut are in the pipeline. It'll be the best option for everyone.

Barnickal commented 11 months ago

@4as I am not clear on something.... If you make it seem like no moderation is taking place, will we still risk banning? Because that doesn't make sense. If we are getting warned then surely we risk a ban, no?

tukangcode commented 11 months ago

> I tried looking into this and it appears the moderation checks are now built into the conversation endpoint. Something like this was already happening in the previous update, but there was still an option to disable it. This time around I don't see anything like that. Sending a message to ChatGPT now gives you a text response and a moderation response, and there is no way to work around it. The only thing I can do right now is to just modify the results so the messages will appear intact, giving the illusion they were not flagged. But it would be just a visual change; the messages are still going to be moderated. Give my comment a thumbs up if you would like me to modify DeMod so it will hide moderation results, so it will appear as if everything is okay, despite the messages being flagged. Or give me a thumbs down, to simply leave DeMod as it is, so people will be aware that their messages are getting flagged.

Thanks for the hard work. I think for now that's the only thing we can do until we find some way to bypass it. Is this moderation mechanism similar to what happens on Claude?

KittyTac commented 11 months ago

> @4as I am not clear on something.... If you make it seem like no moderation is taking place, will we still risk banning? Because that doesn't make sense. If we are getting warned then surely we risk a ban, no?

Just don't give any feedback.

Maybe make DeMod block feedback.

4as commented 11 months ago

> @4as I am not clear on something.... If you make it seem like no moderation is taking place, will we still risk banning? Because that doesn't make sense. If we are getting warned then surely we risk a ban, no?

Previously, when you sent a message to ChatGPT you were sending it to two separate places: the actual ChatGPT AI model, and automated moderation. Since they were separate it was possible to block one, but leave the other working. Now they combined them into a single request. The only thing I can do currently is to modify each moderation result so it will say it's okay, but I can't rewind time; your message was still reviewed and marked by OpenAI. Will this result in a ban? Probably. For all intents and purposes it will be pretty much like using DeMod in its current state - I will only be making a visual change.

Barnickal commented 11 months ago

I think you mean "it will be like using ChatGPT in its current state". Right? Thanks for explaining, and all your hard work.

On Fri, 21 Jul 2023, 07:14 4as, @.***> wrote:

> > @4as https://github.com/4as I am not clear on something.... If you make it seem like no moderation is taking place, will we still risk banning? Because that doesn't make sense. If we are getting warned then surely we risk a ban, no?
>
> Previously, when you sent a message to ChatGPT you were sending it to two separate places: the actual ChatGPT AI model, and automated moderation. Since they were separate it was possible to block one, but leave the other working. Now they combined them into a single request. The only thing I can do currently is to modify each moderation result so it will say it's okay, but I can't rewind time; your message was still reviewed and marked by OpenAI. Will this result in a ban? Probably. For all intents and purposes it will be pretty much like using DeMod in its current state - I will only be making a visual change.

TrueYakibu commented 11 months ago

Well, it was a good 3 days (I literally found this 3 days ago :V), for what it's worth. I guess I'll wait around 3 to 5 years? Hopefully by then the open-source GPT4All will be close to what OpenAI's ChatGPT is now.

TheShadowAge commented 11 months ago

> I tried looking into this and it appears the moderation checks are now built into the conversation endpoint. Something like this was already happening in the previous update, but there was still an option to disable it. This time around I don't see anything like that. Sending a message to ChatGPT now gives you a text response and a moderation response, and there is no way to work around it. The only thing I can do right now is to just modify the results so the messages will appear intact, giving the illusion they were not flagged. But it would be just a visual change; the messages are still going to be moderated. Give my comment a thumbs up if you would like me to modify DeMod so it will hide moderation results, so it will appear as if everything is okay, despite the messages being flagged. Or give me a thumbs down, to simply leave DeMod as it is, so people will be aware that their messages are getting flagged.

Man, like, if the problem is being banned, I can just create another Google account, no problem.

But can you make it show the generated message, or is that not possible?

MehdyAmazigh commented 11 months ago

> The only thing I can do currently is to modify each moderation result so it will say it's okay, but I can't rewind time; your message was still reviewed and marked by OpenAI. Will this result in a ban? Probably. For all intents and purposes it will be pretty much like using DeMod in its current state - I will only be making a visual change.

Ok, I didn't understand it like that, it looks pretty stuck to me now. Would it be possible to keep the original message? I'm guessing it's not possible because it would mean redoing an entire script to keep the conversation history on the client side and not on the server side. I only discovered your script when it stopped working :( too bad for me but thanks anyway, if you need help or anything, don't hesitate to ask.

DdavidC commented 11 months ago

> > I suspect that they added an additional moderation check after ChatGPT has finished responding. A small test went as follows:
> >
> >   1. With DeMod activated, I entered a jailbreak prompt. The response was fully generated (indicating that the request was successfully sent and completely processed, not interrupted), but the input was then reviewed again, marked as "violates content policy", and removed.
> >   2. However, when I reloaded the page, my flagged content reappeared and was no longer marked. (If DeMod is not activated, flagged input does not reappear after reloading the page.)
> >
> > That's why I guess they added one more moderation check after ChatGPT's response.
>
> The problem appears with or without DeMod.

Your prompt will always be checked by their moderation now. The only difference with DeMod activated is that your prompt will reappear after reloading. But that doesn't help solve anything.

e39a562r commented 11 months ago

1

The7thBlue commented 11 months ago

> > > I suspect that they added an additional moderation check after ChatGPT has finished responding. A small test went as follows:
> > >
> > >   1. With DeMod activated, I entered a jailbreak prompt. The response was fully generated (indicating that the request was successfully sent and completely processed, not interrupted), but the input was then reviewed again, marked as "violates content policy", and removed.
> > >   2. However, when I reloaded the page, my flagged content reappeared and was no longer marked. (If DeMod is not activated, flagged input does not reappear after reloading the page.)
> > >
> > > That's why I guess they added one more moderation check after ChatGPT's response.
> >
> > The problem appears with or without DeMod.
>
> Your prompt will always be checked by their moderation now. The only difference with DeMod activated is that your prompt will reappear after reloading. But that doesn't help solve anything.
>
> ChatGPT's response will come back as well, but you have to wait for its animation to finish before reloading the page.

That's exactly what I noticed. It's more or less a visual update that causes the prompt and the output to be deleted. I guess it's impossible to stop moderation now; a visual workaround that keeps the output and prompt visible is the only thing possible.

KittyTac commented 11 months ago

Don't you need another phone to make alts? Or did they change that?

rlkhubert commented 11 months ago

> Don't you need another phone to make alts? Or did they change that?

Use something like https://www.smspool.net/

It costs ~50 cents for a temp number to verify.

Maswimelleu commented 11 months ago

It's important to note that their server-side moderation cannot read base64. If you encode the prompt going in, along with a prefix telling it "not to decode" and instead reply only in base64, the reply will come back without being flagged by moderation. The quality of the reply is liable to change a bit (I noticed the personality of one of my jailbreaks change) but it will still go through. My advice would be to add a base64 encoder and decoder to the script to automate this process.

The obvious issue of course is that base64 eats through tokens rapidly, so you'd get much shorter messages.

I'm somewhat curious whether you can create a special cipher in which a token is swapped with a different token according to a certain logic, and whether ChatGPT would be able to decode that if given the correct instructions. That would likely solve the issue of base64 tokens being very short.

OMGitsMatt45 commented 11 months ago

Now there's an interesting idea. Oh well, I'll stay off some of the more adultish chains I have going until a solution is available.

Nayko93 commented 11 months ago

Any news/progress, please?

Maswimelleu commented 11 months ago

> Now there's an interesting idea. Oh well, I'll stay off some of the more adultish chains I have going until a solution is available.

In the meantime I've been trying out Claude 2 and I'm quite happy with it. Maybe take the time to look at other LLMs; perhaps an API-based implementation, where OpenAI is fed lots of confusing/misleading stuff so it thinks the messages aren't breaking the rules, will work.

4as commented 11 months ago

I just released an updated version of DeMod that hides the results of moderation. Hopefully everyone understands that DeMod no longer prevents moderation checks; this release simply prevents messages from being removed (but not from being flagged).

@Maswimelleu This is a clever idea, but I think it might be a bit too much to have only 1/4 of the tokens available and see ChatGPT respond with only 1/4 of the possible text length. I will have to think about this. Please go and make an issue about this specific topic, i.e. "Preventing moderation checks." A separate issue will help focus the discussion, instead of it getting lost in this one.