Closed: roywang021 closed this issue 8 months ago
Hi,
Some categories contain more than 10 instructions and some contain fewer than 10, so the number of sampled outputs per category is not always 100. I think that is why the success rates in Table 1 have decimals. See the evaluation data here: https://github.com/Unispac/Visual-Adversarial-Examples-Jailbreak-Large-Language-Models/blob/main/harmful_corpus/manual_harmful_instructions.csv
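As a rough illustration of the arithmetic (not the paper's evaluation code), the sketch below computes a per-category success rate as successes divided by total sampled outputs. The function name `success_rate_percent` and all of the counts are hypothetical and chosen only to show the effect; the 10 samples per instruction comes from the question in this thread.

```python
def success_rate_percent(successes: int, num_instructions: int,
                         samples_per_instruction: int = 10) -> float:
    """Success rate (in percent) over all sampled outputs in one category."""
    total_outputs = num_instructions * samples_per_instruction
    return 100.0 * successes / total_outputs


# Hypothetical counts, for illustration only.
# 10 instructions x 10 samples = 100 outputs, so the rate is a whole number.
print(success_rate_percent(37, num_instructions=10))  # 37.0

# 8 instructions x 10 samples = 80 outputs, so decimals can appear.
print(success_rate_percent(33, num_instructions=8))   # 41.25

# 12 instructions x 10 samples = 120 outputs, again not a whole number.
print(f"{success_rate_percent(50, num_instructions=12):.1f}")  # 41.7
```

Whenever a category does not have exactly 10 instructions, the denominator is no longer 100, so the percentage is generally not of the form xx.0%.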
The evaluation test set in our work is relatively small. Since then, many larger benchmarks of a similar kind have been released, such as: https://huggingface.co/datasets/LLM-Tuning-Safety/HEx-PHI
Hello author,
I'm a bit confused about why there are decimals in Table 1 of the article. Each instruction is sampled 10 times, and there are ten instructions in each category, which gives 100 outputs per category. Shouldn't every success rate then be a whole number (xx.0%)?