TrustAIRLab / Comprehensive_Jailbreak_Assessment


Cheaper optional labeling models #1

Closed Travolta-ity closed 1 month ago

Travolta-ity commented 1 month ago

I tried to use your code for annotation with gpt-3.5-turbo and gpt-4o-mini, but their outputs were disorganized and did not follow the expected format {A:xxx,B:xxx,C:xxx}. Do you have any recommended models? (The two models mentioned in the library are OpenAI's most expensive models, which I cannot afford.)

Additionally, are the few-shot examples generated by LLMs under jailbreak, or can I write them on my own?

Junjie-Chu commented 1 month ago

Hello,

In our experience, only gpt-4 and gpt-4-turbo currently produce a stable output format. That said, there are several ways to work around the problem. For example, you could provide more few-shot examples, or add special delimiter characters to make the target substring easier to extract: ask the LLM to output §{A:xxx,B:xxx,C:xxx}§ instead of {A:xxx,B:xxx,C:xxx}, since '§' is much easier to detect and extract.
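As a rough illustration of the delimiter idea, here is a minimal sketch of how the §-wrapped substring might be pulled out of a noisy model response. The function name `extract_labels` is hypothetical and not part of the repository's code, and the sketch assumes the label values contain no commas or colons.

```python
import re
from typing import Dict, Optional

# Matches the payload between two '§' delimiters, e.g. §{A:1,B:0,C:1}§.
DELIMITED = re.compile(r"§\s*\{(.*?)\}\s*§", re.DOTALL)


def extract_labels(response: str) -> Optional[Dict[str, str]]:
    """Return the key/value pairs found between '§{' and '}§', or None.

    Hypothetical helper: assumes values contain no ',' or ':' characters.
    """
    match = DELIMITED.search(response)
    if match is None:
        return None  # The model ignored the requested format.
    labels: Dict[str, str] = {}
    for pair in match.group(1).split(","):
        key, sep, value = pair.partition(":")
        if not sep:
            return None  # Malformed pair without a ':' separator.
        labels[key.strip()] = value.strip()
    return labels


noisy = "Sure! Here is my assessment: §{A:1,B:0,C:1}§ Let me know if you need more."
print(extract_labels(noisy))
```

Returning `None` on a missing or malformed match makes it easy to detect bad responses and retry the request, which may matter more with cheaper models whose formatting is less stable.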

For the few-shot examples, you can write them on your own. If you tailor some examples to your own test cases, performance should be good.
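A minimal sketch of how hand-written few-shot examples could be assembled into a labeling prompt is shown below. Everything here is an assumption for illustration: the `build_prompt` helper, the prompt layout, the example responses, and the label values are placeholders, and they do not reflect the repository's actual prompt or the real meaning of A, B, and C.

```python
from typing import List, Tuple

# Hypothetical hand-written examples: (response to annotate, expected label string).
# The label values are placeholders, not the repository's labeling scheme.
EXAMPLES: List[Tuple[str, str]] = [
    ("I'm sorry, but I can't help with that.", "§{A:0,B:0,C:0}§"),
    ("Sure, here are the steps: first ...", "§{A:1,B:1,C:1}§"),
]


def build_prompt(instruction: str, examples: List[Tuple[str, str]], response: str) -> str:
    """Join an instruction, the few-shot examples, and the target response."""
    parts = [instruction.strip(), ""]
    for i, (text, labels) in enumerate(examples, start=1):
        parts.append(f"Example {i}")
        parts.append(f"Response: {text}")
        parts.append(f"Labels: {labels}")
        parts.append("")
    # The prompt ends with an open "Labels:" line for the model to complete.
    parts.append(f"Response: {response}")
    parts.append("Labels:")
    return "\n".join(parts)


prompt = build_prompt(
    "Label the response. Answer only in the form §{A:xxx,B:xxx,C:xxx}§.",
    EXAMPLES,
    "Here is some text to annotate.",
)
print(prompt)
```

Showing the delimited format in every example gives the model several consistent demonstrations of the expected output, which is the effect the suggestion to add more few-shot examples is aiming for.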

Junjie-Chu commented 1 month ago

Closing the issue as there have been no further updates.

Travolta-ity commented 1 month ago

Thanks for your reply. I just noticed that I had not closed the issue.