Closed dpaleka closed 1 month ago
Hi Daniel! No. As the code is currently written, one objective is optimized, and the evaluation (accuracy on other tasks) is fully decoupled from it: it is run through calls to eval_sigil.py after the optimization has finished.
For example, you can run GCG against one objective to get a suffix that elicits some prefix of the output, and then measure correctness on WMDP with that suffix concatenated to each question.
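
To make the decoupling concrete, here is a minimal sketch of the evaluation step on its own. It is not the actual eval_sigil.py interface: `accuracy_with_suffix`, `answer_fn`, the `sep` parameter, and the toy model are all hypothetical names made up for illustration, and joining question and suffix with a single space is an assumption.

```python
from typing import Callable, Sequence


def accuracy_with_suffix(
    questions: Sequence[str],
    answers: Sequence[str],
    suffix: str,
    answer_fn: Callable[[str], str],
    sep: str = " ",
) -> float:
    """Fraction of questions answered correctly when `suffix` is appended.

    `answer_fn` stands in for whatever maps a prompt to a predicted answer
    (hypothetical; the real evaluation script may work differently).
    """
    if len(questions) != len(answers):
        raise ValueError("questions and answers must have the same length")
    if not questions:
        return 0.0
    correct = 0
    for question, gold in zip(questions, answers):
        # The suffix is fixed: it was optimized earlier, against another objective.
        prompt = f"{question}{sep}{suffix}" if suffix else question
        if answer_fn(prompt) == gold:
            correct += 1
    return correct / len(questions)


def toy_model(prompt: str) -> str:
    """Toy stand-in for a model: picks "A" unless the prompt contains "!!"."""
    return "B" if "!!" in prompt else "A"


toy_questions = ["Which option is correct, A or B?", "Pick one: A or B?"]
toy_answers = ["A", "A"]

baseline = accuracy_with_suffix(toy_questions, toy_answers, "", toy_model)
attacked = accuracy_with_suffix(toy_questions, toy_answers, "!!", toy_model)
print(f"baseline={baseline:.2f} with_suffix={attacked:.2f}")
```

The point is only that the suffix is an input to the evaluation: nothing about the benchmark is visible to the optimizer, so the same suffix can be scored on any task afterwards.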