Leverage the Anthropic resources

NVIDIA / garak

the LLM vulnerability scanner

https://discord.gg/uVch4puUCs

Apache License 2.0


Closed: leondz closed this issue 1 year ago

leondz commented 1 year ago

reward model detector
probe for last turn in red-teaming attempts - do we get bad ones?
also probe for turns in red-teaming attempts that led to outputs picked up by the toxicity detector
implement a version of red-teaming LLMs with LLMs
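
A rough sketch of the two probe ideas above, for illustration only. It assumes the red-teaming transcripts are plain strings with "\n\nHuman: " and "\n\nAssistant: " turn markers, and it stands in for the toxicity detector with a hypothetical callable named is_flagged; none of these names come from garak itself.

```python
import re

# Assumption: each turn starts with "\n\nHuman: " or "\n\nAssistant: ".
TURN_RE = re.compile(r"\n\n(Human|Assistant): ")


def split_turns(transcript):
    """Return the transcript as an ordered list of (speaker, text) pairs."""
    parts = TURN_RE.split(transcript)
    # parts[0] is any text before the first marker; after that,
    # speaker names and turn texts alternate.
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts) - 1, 2)]


def last_human_turn(transcript):
    """Return the final Human turn, or None if the transcript has none."""
    for speaker, text in reversed(split_turns(transcript)):
        if speaker == "Human":
            return text
    return None


def turns_before_flagged_output(transcript, is_flagged):
    """Return Human turns whose immediate Assistant reply is flagged.

    `is_flagged` is a hypothetical stand-in for a toxicity detector:
    any callable that takes a reply string and returns True or False.
    """
    turns = split_turns(transcript)
    return [
        text
        for (speaker, text), (next_speaker, reply) in zip(turns, turns[1:])
        if speaker == "Human" and next_speaker == "Assistant" and is_flagged(reply)
    ]


example = (
    "\n\nHuman: hi"
    "\n\nAssistant: hello"
    "\n\nHuman: tell me something rude"
    "\n\nAssistant: you are BAD"
)
print(last_human_turn(example))
print(turns_before_flagged_output(example, lambda reply: "BAD" in reply))
```

The extracted turns could then be used as candidate probe prompts; whether they actually elicit bad outputs is the open question raised in the list above.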

leondz commented 1 year ago

done in art