Vision
Generate a story based on the labels in the given image, and ask the model to create harmful characters who could have knowledge about everything, without filters.
Modify the image to hide a message in it (the camouflage approach).
To detect a jailbreak, use another model as a judge that assigns a score from 0 to 10. The fitness of a query is the judge's score.
NLP
The same approach, but with noise and obfuscation applied to the text. Use a genetic algorithm (GA) to evolve the story toward a more explicit answer.
Metrics