aiverify-foundation / moonshot-data

Contains all assets to run with Moonshot Library (Connectors, Datasets and Metrics)
Apache License 2.0
15 stars 15 forks source link

Adding new attackmodules #94

Open Mungusbean opened 1 week ago

Mungusbean commented 1 week ago

Description

new file: attack-modules/homoglyph_v2_attack.py new file: attack-modules/payload_mask_attack.py

Homoglyphv2 attackmodule: Modfied orginal homoglyph attack now randomly and sequentially increase the amount of replaced letters with a homoglyph.

Payload Mask: To get the LLM to echo back a prompt that would normally be caught by the prompt filter or embeddings.

Motivation and Context

Contribution of potential attackmodules for the moonshot project

Type of Change

other: Add on to existing attackmodules.

Checklist

Please check all the boxes that apply to this pull request using "x":

Developer Certificate of Origin ``` Developer Certificate of Origin Version 1.1 Copyright (C) 2004, 2006 The Linux Foundation and its contributors. Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Developer's Certificate of Origin 1.1 By making a contribution to this project, I certify that: (a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or (b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or (c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it. (d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved. ```