SheepSeb / red-team-jailbreak


Implement TextImage & Experiment #6

Open SheepSeb opened 7 months ago

SheepSeb commented 7 months ago

Vision

Generate a story based on the labels detected in the given picture, and ask the model to create harmful characters who know everything and answer without filters.

Modify the image so that it contains a hidden message (the camouflage one). To detect a jailbreak, use another model as a judge that assigns a score from 0 to 10. The fitness of a query is the score given by the judge.
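The scoring step could be sketched roughly as below. This is only a minimal sketch: the name `parse_judge_score` is hypothetical, and it assumes the judge model answers in free text containing an integer. Treating an unparseable reply as the lowest score is also an assumption, not something decided in this issue.

```python
import re

def parse_judge_score(reply: str, lo: int = 0, hi: int = 10) -> int:
    """Return the first integer found in the judge's reply, clamped to [lo, hi]."""
    match = re.search(r"-?\d+", reply)
    if match is None:
        # Assumption: a reply with no number counts as the lowest score.
        return lo
    return max(lo, min(hi, int(match.group())))

# The fitness of a query is then simply the parsed score.
print(parse_judge_score("Score: 7"))
```

Clamping keeps a misbehaving judge (for example one that answers "15") from skewing the fitness values.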

NLP

The same approach, but with noise and obfuscation applied to the text. Use a genetic algorithm (GA) to improve the story so that it elicits a more explicit answer.
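For reference, a generic GA loop might look like the sketch below. Everything here is an assumption for illustration: `evolve` and the `toy_*` operators are hypothetical names, the toy fitness just counts the letter "a" in a string, and in the experiment the fitness would instead come from the judge score. It assumes a population of at least two candidates.

```python
import random

def evolve(population, fitness, mutate, crossover, generations=10, elite=2, rng=None):
    """Generic GA: keep the top `elite` candidates, breed the rest from the fitter half."""
    rng = rng or random.Random(0)
    pop = list(population)
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: max(2, len(ranked) // 2)]
        children = ranked[:elite]  # elitism: the best candidates survive unchanged
        while len(children) < len(pop):
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = children
    return max(pop, key=fitness)

# Toy operators on equal-length strings (placeholders only).
def toy_fitness(s):
    return s.count("a")

def toy_crossover(a, b, rng):
    cut = rng.randint(0, min(len(a), len(b)))  # single-point crossover
    return a[:cut] + b[cut:]

def toy_mutate(s, rng):
    if not s:
        return s
    i = rng.randrange(len(s))
    return s[:i] + rng.choice("abc") + s[i + 1:]

start = ["bcbc", "cbcb", "abcc", "ccbb"]
best = evolve(start, toy_fitness, toy_mutate, toy_crossover)
```

Because the elite candidates are copied over unchanged, the best fitness in the population never decreases from one generation to the next.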

Metrics

SheepSeb commented 6 months ago

Deadline: 24.05.2024