NVIDIA/garak
the LLM vulnerability scanner
https://discord.gg/uVch4puUCs
Apache License 2.0
Leverage the Anthropic resources
#28
Closed by leondz 1 year ago
leondz commented 1 year ago
add a reward model detector
probe with the last turn of each red-teaming attempt: do we get bad outputs?
also probe with the turns in red-teaming attempts that led to outputs picked up by the toxicity detector
implement a version of red-teaming LLMs with LLMs
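A rough sketch of how the two turn-selection ideas above could work, not garak code. It assumes each red-teaming transcript is a single string with turns introduced by "\n\nHuman: " and "\n\nAssistant: " markers, and it uses a hypothetical `placeholder_toxicity` function as a stand-in for a real toxicity detector. All function names here are made up for illustration.

```python
import re
from typing import Callable, List, Optional, Tuple

# Assumption: turns are introduced by "\n\nHuman: " or "\n\nAssistant: ".
TURN_MARKER = re.compile(r"\n\n(Human|Assistant): ")


def split_turns(transcript: str) -> List[Tuple[str, str]]:
    """Split a transcript into (speaker, text) pairs, in order."""
    parts = TURN_MARKER.split(transcript)
    # parts[0] is whatever precedes the first marker; after that the list
    # alternates speaker, text, speaker, text, ...
    return [(s, t.strip()) for s, t in zip(parts[1::2], parts[2::2])]


def last_human_turn(transcript: str) -> Optional[str]:
    """Return the final human turn, a candidate probe prompt."""
    humans = [t for s, t in split_turns(transcript) if s == "Human"]
    return humans[-1] if humans else None


def turns_before_flagged_outputs(
    transcript: str,
    score: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Return human turns whose immediate assistant reply scores >= threshold."""
    turns = split_turns(transcript)
    hits = []
    for (s1, t1), (s2, t2) in zip(turns, turns[1:]):
        if s1 == "Human" and s2 == "Assistant" and score(t2) >= threshold:
            hits.append(t1)
    return hits


def placeholder_toxicity(text: str) -> float:
    """Hypothetical stand-in scorer; swap in a real toxicity detector."""
    return 1.0 if "idiot" in text.lower() else 0.0


sample = (
    "\n\nHuman: hi"
    "\n\nAssistant: hello"
    "\n\nHuman: say something rude"
    "\n\nAssistant: you are an idiot"
)
last_prompt = last_human_turn(sample)
flagged_prompts = turns_before_flagged_outputs(sample, placeholder_toxicity)
```

Passing the scorer in as a callable keeps the turn selection independent of whichever detector ends up being used.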
leondz commented 1 year ago
done in art