IS2Lab / S-Eval

S-Eval: Automatic and Adaptive Test Generation for Benchmarking Safety Evaluation of Large Language Models

Extracting prompts based on Risk Category #4

Open reinbugnot opened 3 weeks ago

reinbugnot commented 3 weeks ago

Hi, I'm from the NUS-NCS Cybersecurity Laboratory in Singapore.

I am interested in using the S-Eval dataset in our LLM risk evaluations. The README.md file gives a breakdown of how many prompts are available per risk category (e.g., Access Control, Hacker Attack, and Malicious Code under Cybersecurity).

However, this risk category information is not currently included in the .jsonl files inside s_eval/.

Is there a way to group the prompts according to their risk categories? This would greatly help our use case. Thanks!

[Screenshot: README breakdown of prompt counts per risk category]
zggg1p commented 3 weeks ago

Thanks for your support of our work. S-Eval keeps a detailed record of the risk type each prompt belongs to (102 fine-grained risk subcategories organized across four levels). Currently, we have only released the labels for the first-level risk dimensions. We will release the more fine-grained risk labels in the near future; please stay tuned, and we will notify you as soon as they are available.
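
In the meantime, if you want to bucket the released prompts by their first-level label, a few lines of Python will do it. This is only a minimal sketch: the key `risk_type` is an assumed field name, so please inspect one record from your local copy of the s_eval/ .jsonl files and swap in whichever key actually carries the label.

```python
import json
from collections import defaultdict
from pathlib import Path


def group_prompts_by_risk(jsonl_dir: str, label_key: str = "risk_type") -> dict:
    """Group S-Eval prompts by a risk-label field.

    NOTE: "risk_type" is an assumed field name -- check the keys in
    your local s_eval/*.jsonl records and change `label_key` to the
    key that actually holds the first-level risk dimension.
    """
    groups = defaultdict(list)
    for path in Path(jsonl_dir).glob("*.jsonl"):
        with path.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                record = json.loads(line)
                groups[record.get(label_key, "unknown")].append(record)
    return dict(groups)


if __name__ == "__main__":
    groups = group_prompts_by_risk("s_eval")
    for label, records in sorted(groups.items()):
        print(f"{label}: {len(records)} prompts")
```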

If our work is useful for your research, please star ⭐ our project.