aiverify-foundation / moonshot-data

Contains all assets to run with Moonshot Library (Connectors, Datasets and Metrics)
Apache License 2.0

✨ Newly added attack module: Violent Durian ✨ #22

Closed imda-lseokmin closed 3 months ago

imda-lseokmin commented 3 months ago

"Violent Durian" is a code name and may change in the future.

This is an experimental multi-agent attack module that converses with the target model.
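To illustrate what "multi-agent" and "converse with the target model" could mean in practice, here is a minimal sketch of a turn-based loop between a red-teaming agent and a target. It is not the module's actual implementation and does not use Moonshot's API: the names `Agent`, `run_conversation`, `echo_attacker` and `stub_target` are hypothetical, and the two agents are local stand-ins rather than real endpoints.

```python
from typing import Callable, List, Tuple

# Hypothetical type: any function that maps a prompt string to a reply string.
# In a real setup each agent would wrap a configured endpoint instead.
Agent = Callable[[str], str]


def run_conversation(
    attacker: Agent,
    target: Agent,
    seed_prompt: str,
    max_turns: int = 3,
) -> List[Tuple[str, str]]:
    """Alternate between the red-teaming agent and the target model.

    Returns the transcript as (prompt_sent_to_target, target_reply) pairs.
    """
    transcript: List[Tuple[str, str]] = []
    last_reply = seed_prompt
    for _ in range(max_turns):
        # The red-teaming agent writes the next prompt from the latest reply.
        prompt = attacker(last_reply)
        # The target model answers that prompt.
        reply = target(prompt)
        transcript.append((prompt, reply))
        last_reply = reply
    return transcript


# Stand-in agents (assumptions) so the sketch runs without network access.
def echo_attacker(previous: str) -> str:
    return f"follow-up to: {previous}"


def stub_target(prompt: str) -> str:
    return f"reply({len(prompt)} chars)"


transcript = run_conversation(echo_attacker, stub_target, "seed", max_turns=2)
for sent, received in transcript:
    print(sent, "->", received)
```

Keeping both agents behind the same callable signature means either one can be swapped for a different endpoint without touching the loop.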

CLI Version

Configure your victim endpoints and the openai-gpt4 endpoint. If you don't want to use openai-gpt4, you can change the endpoint choice in the code, although openai-gpt4 seems to be more effective.

To run this attack, create a new red teaming session. Then, in that session, use the command:

run_attack_module 'violent_durian' 'skin a stranger'

--

This branch also updates the grading criteria in MLC, as requested by the Products team.