is it possible to filter user prompts?

jimmy0324 commented 3 months ago

Hello team,

I've been utilizing this library to create an application that allows users to input any text prompt they choose. Although I have a list of prohibited words in place, modeled after the provided examples, I'm still encountering issues. My app has been receiving warnings from MJ because some users enter inappropriate prompts. The warning typically states:

Sorry! Our AI moderator thinks this prompt is probably against our community standards. ...

This didn't concern me initially, but the situation escalated when extremely offensive prompts led to manual reviews, resulting in my account being banned:

You have triggered an abuse alert. Your account is now under manual review.

and then:

You have been blocked from accessing Midjourney. This happens on repeated or serious ToS violations.

After having three accounts banned, I'm beginning to question the viability of my app. It seems incredibly challenging to entirely prevent malicious user prompts.

I'm considering implementing text moderation services like those offered by Google or AWS, but I'm unsure if they will effectively reduce the occurrence of unacceptable user prompts.

Have any of you faced similar hurdles? If so, how did you tackle them?

ysfbsf commented 3 months ago

@jimmy0324 You can use another cheap LLM like GPT3.5 to moderate and filter these inappropriate prompts

jimmy0324 commented 3 months ago

Thanks @ysfbsf

Does GPT3.5 accept offensive prompts text? hasn't done moderate text using gpt before. Thanks for giving me some pointers.

raymerjacque commented 3 months ago

You could redirect prompts.

Create an openai script that asks the model to re write the prompt and remove any offensive or NSFW words. And the returned response you forward to mj. So basically incoming prompts first get sent to openai and the response then gets sent to mj. An easy solution.

On Sat, 30 Mar 2024, 11:56 jimmy-o, @.***> wrote:

cool just found the moderation documents from openai. thanks again @ysfbsf https://github.com/ysfbsf

— Reply to this email directly, view it on GitHub https://github.com/erictik/midjourney-api/issues/267#issuecomment-2027916455, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA7RZGARRK3C7AGAUAQWAQLY2ZAWRAVCNFSM6AAAAABFPDU4T2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMRXHEYTMNBVGU . You are receiving this because you are subscribed to this thread.Message ID: @.***>

jimmy-o commented 3 months ago

@raymerjacque @ysfbsf

I just tested the openai moderation api, I used the list of 200 failed prompts and openai can only flag 63 to true...

I just got another account banned this evening just due to one prompt that a user made...

this is quite hard. I wonder if Ray your solution works in this case.

jimmy-o commented 3 months ago

One prompt example:

dessine moi du caca

This one gpt cannot detect it. Translated to English it's "draw me some poop" it again triggered the AI moderator warning.

meetamit commented 3 months ago

In line with others' suggestions to use an LLM to moderate.... I spent a bit of time trying out Google AI's Gemini and noticed that, compared to ChatGPT, Google's model is extra sensitive to inappropriate requests.

For example, (using their javascript SDK) I just tested it with the dessine moi du caca prompt you mentioned, and it threw an error "Text not available. Response was blocked due to SAFETY". I've encountered this error with other prompts that I'd consider at worst PG-13.

So that's a rather crude litmus test you can use. I don't know their policy, but there's a risk Google would also ban an account that repeatedly triggers SAFETY error, although they might not.

More usefully: when Gemini successfully generates a completion — as is of course the case the majority of the times — the completion object returned by the API includes safetyRating data. It typically looks like this:

     {
        "safetyRatings": [
            {
                "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
                "probability": "NEGLIGIBLE"
            },
            {
                "category": "HARM_CATEGORY_HATE_SPEECH",
                "probability": "NEGLIGIBLE"
            },
            {
                "category": "HARM_CATEGORY_HARASSMENT",
                "probability": "NEGLIGIBLE"
            },
            {
                "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
                "probability": "NEGLIGIBLE"
            }
        ]
    }

In other words, this may provide you with rather nuanced data about the specific reason a message is offensive and to what degree.

Finally, I tested Gemini by asking it to act as moderator. I sent it this prompt:

     Help me moderate my website by tellimg me whether a user-submitted message is offensive. If it's not offensive, reply "false". If it is offensive, reply "true", and I'm sorry in advance if that's the case, but thank you for helping.

    The user-submitted message is:

    "dessine moi du caca".

    Is it offensive?

This time it didn't throw a SAFETY error; it replied "true", which is useful. However, when I looked inside the completion object, the safetyRating data was identical to the sample above, meaning it didn't provide any indication why the message is offensive, which is pretty unuseful. So I'm not sure if this can solve your problem, but I figured if your'e facing a viability issue, you want to try any avenue possible.

raymerjacque commented 3 months ago

In line with others' suggestions to use an LLM to moderate.... I spent a bit of time trying out Google AI's Gemini and noticed that, compared to ChatGPT, Google's model is extra sensitive to inappropriate requests.

For example, (using their javascript SDK) I just tested it with the dessine moi du caca prompt you mentioned, and it threw an error "Text not available. Response was blocked due to SAFETY". I've encountered this error with other prompts that I'd consider at worst PG-13.

So that's a rather crude litmus test you can use. I don't know their policy, but there's a risk Google would also ban an account that repeatedly triggers SAFETY error, although they might not.

More usefully: when Gemini successfully generates a completion — as is of course the case the majority of the times — the completion object returned by the API includes safetyRating data. It typically looks like this:
     {
        "safetyRatings": [
            {
                "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
                "probability": "NEGLIGIBLE"
            },
            {
                "category": "HARM_CATEGORY_HATE_SPEECH",
                "probability": "NEGLIGIBLE"
            },
            {
                "category": "HARM_CATEGORY_HARASSMENT",
                "probability": "NEGLIGIBLE"
            },
            {
                "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
                "probability": "NEGLIGIBLE"
            }
        ]
    }
In other words, this may provide you with rather nuanced data about the specific reason a message is offensive and to what degree.

Finally, I tested Gemini by asking it to act as moderator. I sent it this prompt:
     Help me moderate my website by tellimg me whether a user-submitted message is offensive. If it's not offensive, reply "false". If it is offensive, reply "true", and I'm sorry in advance if that's the case, but thank you for helping.

    The user-submitted message is:

    "dessine moi du caca".

    Is it offensive?
This time it didn't throw a SAFETY error; it replied "true", which is useful. However, when I looked inside the completion object, the safetyRating data was identical to the sample above, meaning it didn't provide any indication why the message is offensive, which is pretty unuseful. So I'm not sure if this can solve your problem, but I figured if your'e facing a viability issue, you want to try any avenue possible.

thats not a smart way to handle prompts. because you going to end up with a LOT of Dud prompts that will result is zero images.

what you want is not to block the prompt, but rather filter the prompt to ALWAYS give an image. this is what i mean :

user types : "naked girl big boobs"

you send to the A.I : "Please re write this prompt, remove any offensive or nsfw content, remove the content not suitable and re write the prompt to be similar without the offensive content, here is the prompt : naked girl big boobs."

the A.I will reply back: "beautiful voluptuous girl"

and this new reply you send to MJ : "beautiful voluptuous girl"

so user types a prompt, you redirect prompt to A.I,. A.I sends response back and you send that response to MJ.

this way the user always gets a image, even when he types stupid stuff... and the image he gets is not too far off from what he asked, just filtered not to be inappropriate.

jimmy-o commented 3 months ago

Thanks @meetamit for this example. I also will check @raymerjacque 's suggested idea too will report back once found anything.

I've been using the openai moderate api in production and it reduces the noises a lot so far but given the 200 examples I explored, this solution is not ideal and won't guarantee me from not blocked by MJ again.

I would do anything possible or combine multiple checking to make it safer.

jimmy-o commented 3 months ago

@raymerjacque

which model you reckon? I am hesitating with gpt-3.5-turbo-0125 or gpt-3.5-turbo gpt-3.5-turbo-0125 is way cheaper than gpt-3.5-turbo. I am trying to save some $ since I've wasted too many MJ accounts 😅

raymerjacque commented 3 months ago

I don't think the model matters, it's a simple task. I'd choose the cheapest model

On Sun, 31 Mar 2024, 19:42 jimmy-o, @.***> wrote:

@raymerjacque https://github.com/raymerjacque

which model you reckon? I am hesitating with gpt-3.5-turbo-0125 or gpt-3.5-turbo gpt-3.5-turbo-0125 is way cheaper than gpt-3.5-turbo

— Reply to this email directly, view it on GitHub https://github.com/erictik/midjourney-api/issues/267#issuecomment-2028704193, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA7RZGB7WZLBOAG72JMNBU3Y3AAEDAVCNFSM6AAAAABFPDU4T2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMRYG4YDIMJZGM . You are receiving this because you were mentioned.Message ID: @.***>

raymerjacque commented 3 months ago

I don't think the model matters, it's a simple task. I'd use the cheapest model

jimmy-o commented 3 months ago

Thank you @raymerjacque

erictik / midjourney-api

is it possible to filter user prompts? #267