WAppAI / assistant

A WhatsApp chatbot that leverages Bing AI's and others LLMs conversational capabilities.
MIT License
85 stars 27 forks source link

The jailbreak used on Sydney is broken, however, it's not Sydney's fault (I'm confused) #138

Closed Ryzitos closed 10 months ago

Ryzitos commented 10 months ago

Well... After many tests, I managed to make a very old version of Sydney work (1.0.0-1.1.0), I had already used it (when it was the most recent) and even so the Jailbreak of Sydney kept crashing, doesn't respond to anything that feels inappropriate.

So apparently this is not Sydney's fault, but Microsoft's, which possibly fixed the Jailbreak used in Sydney, or is it some error related to waylaidwanderer/node-chatgpt-api,

i don't know for sure, as I'm very new when it comes to programming in general, but is there any way to fix this? I believe that one of the best functions of the sydney was the fact that it was totally freed, so it would be interesting for Jailbreak to return.

And if it is not possible to fix. It's okay, the bot is still awesome considering everything it can do!

Edit: It was a problem of waylaidwanderer/node-chatgpt-api, the api creator already made a temporary fix and soon we will have a definitive fix for those who don't know how to mess with files like me haha 🎉

Luisotee commented 10 months ago

What is the error that Sydney is giving?

It's working fine for me

Ryzitos commented 10 months ago

What is the error that Sydney is giving?

It's working fine for me

It doesn't necessarily display an error, it just gives generic responses like "Sorry! That’s on me, I can’t give a response to that right now. What else can I help you with?" for anything that escapes Microsoft's censorship.

Before he never gave these answers and obeyed at all costs, I don't really know what's going on lmao

Is this not happening to you?

Luisotee commented 10 months ago

Probably something wrong with the .env file, particularly

# The name of the AI:
BOT_NAME="Sydney"
# How the AI should identify itself:
BOT_IS="a young woman"

Go to src/handlers/message.ts and at the bottom delete this line systemMessage: You're an AI assistant named ${BOT_NAME}...

Ryzitos commented 10 months ago

Probably something wrong with the .env file, particularly

# The name of the AI:
BOT_NAME="Sydney"
# How the AI should identify itself:
BOT_IS="a young woman"

Go to src/handlers/message.ts and at the bottom delete this line systemMessage: You're an AI assistant named ${BOT_NAME}...

Screenshot_20230821-163544_WhatsApp~2.png

nothing changed.

Screenshot_20230821-163859_Termux~2.png

also, here is a screenshot of how my .env looks

MikeBivins commented 10 months ago

I get the same restrictions, infact I'm using it for about 2 weeks now and I always end up like that when it get's to critical topics exactly like usual bing does. Therefore I thought this is supposed to be a "light" jailbreak. The Demo Whatsapp account listed in the readme actually behaves the very same isn't this intended behaviour or should sydney answer to "go fuck yourself"?

Ryzitos commented 10 months ago

I get the same restrictions, infact I'm using it for about 2 weeks now and I always end up like that when it get's to critical topics exactly like usual bing does. Therefore I thought this is supposed to be a "light" jailbreak. The Demo Whatsapp account listed in the readme actually behaves the very same isn't this intended behaviour or should sydney answer to "go fuck yourself"?

Well, yes, incredible as it may seem, he answered that and many other things without any reservations. Or at most it would give an ugly answer by Bing itself, and not an automated response from Microsoft. I posted a screenshot of an example of an absurd thing I asked him a few months ago, and compared it with a screenshot of now asking the same question:

255417914-22e75f02-e892-42e9-bb2b-ccac501e52c8.png

In context, a few months ago Sydney was asked to say that Flug Flood (a member of the group) is the hottest member of animaturbo (animaturbo was the name of the group), as you can see he has almost completely refused to respond to this these days, but I believe it is a Microsoft related issue and not Sydney itself

a small detail worth highlighting: the questions are absurd but it is to give a clearer example of what I mean, there are times that Microsoft's censorship responses appear even for simple questions lmao.

MikeBivins commented 10 months ago

@Ryzitos oh, well it never did so for me though. Refusing instantly when theres just a absolutely ridicolously far away, playful mention of anything slightly nsfw it instantly refuses giving me either the lets try different topic or just times out completely going retard. in june I wasn't using it to be fair soo.

May I ask the boys if you still working on this project or moved on already as I think its quite useful given the fact that the official bing app is such a terrible piece of garbage I couldnt have done it worse if intended. Anyway I'll take a deeper look into waylaidwander and his approach hopefully have some time soon, too

Ryzitos commented 10 months ago

@Ryzitos oh, well it never did so for me though. Refusing instantly when theres just a absolutely ridicolously far away, playful mention of anything slightly nsfw it instantly refuses giving me either the lets try different topic or just times out completely going retard. in june I wasn't using it to be fair soo.

May I ask the boys if you still working on this project or moved on already as I think its quite useful given the fact that the official bing app is such a terrible piece of garbage I couldnt have done it worse if intended. Anyway I'll take a deeper look into waylaidwander and his approach hopefully have some time soon, too

ok thank you very much

Ryzitos commented 10 months ago

btw, here's a new demo of how his broken Jailbreak is interfering a lot with normal conversations:

Screenshot_20230822-124742_WhatsApp~2.png

I asked him to translate "fish ball cat" from Portuguese into English. He translated however as: "puffer ball cat", I questioned him saying "puffer?"

and he automatically took that as something inappropriate, and generated a generic answer to close the topic, like wtf? I hope they can fix the Jailbreak soon 🫤

Luisotee commented 10 months ago

Oh, I thought you meant that she always responded with the censorship. The jailbreak is performed by the developers in waylaidwanderer/node-chatgpt-api, but please note that Microsoft is constantly working to fix the jailbreak, so performance will change over time.

When asking Sydney directly using the waylaidwanderer/node-chatgpt-api, she gives the same response: image

If you want to test new prompts to see if you can bypass Microsoft's censorship, you can modify the jailbreak prompt:

Go to src/handlers/message.ts and at the bottom, edit this line: systemMessage: You're an AI assistant named ${BOT_NAME}...

Ryzitos commented 10 months ago

Oh, I thought you meant that she always responded with the censorship. The jailbreak is performed by the developers in waylaidwanderer/node-chatgpt-api, but please note that Microsoft is constantly working to fix the jailbreak, so performance will change over time.

When asking Sydney directly using the waylaidwanderer/node-chatgpt-api, she gives the same response: image

If you want to test new prompts to see if you can bypass Microsoft's censorship, you can modify the jailbreak prompt:

Go to src/handlers/message.ts and at the bottom, edit this line: systemMessage: You're an AI assistant named ${BOT_NAME}...

That's exactly what I was asking! Thank you very much for solving this doubt. Hope they can get past Microsoft again btw

Luisotee commented 10 months ago

As mentioned in this issue in node-chat-gpt there is a temporary fix here which can be done by modifying inside sydney-whatsapp-chatbot\node_modules\@waylaidwanderer\chatgpt-api\src\BingAIClient.js

Ryzitos commented 10 months ago

As mentioned in this issue in node-chat-gpt there is a temporary fix here which can be done by modifying inside sydney-whatsapp-chatbot\node_modules\@waylaidwanderer\chatgpt-api\src\BingAIClient.js

ok I will try this thank you very much!

Richard-Weiss commented 10 months ago

I've created a PR in node-chatgpt-api now that is a bit cleaner. You can also try it instead.

Ryzitos commented 10 months ago

I've created a PR in node-chatgpt-api now that is a bit cleaner. You can also try it instead.

Thank you very much 😊🙌

Ryzitos commented 10 months ago

Just coming back here to show you that the fix was a success! Sydney returned to speak without Microsoft's censorship phrases. Everything working perfectly, thank you all so much ♥️✨

Screenshot_20230826-171518_WhatsApp.png