Open zhongjian-zhang opened 1 month ago
Hello, if the defense method involves non-differentiable modules, such as large language models, do you have any suggestions for designing the attack?

The forward pass through the LLM should actually be differentiable, right? However, differentiating through the LLM may of course yield gradients that are too noisy to be usable. If that is the case, you need to come up with a differentiable surrogate function that replaces the LLM and is faithful to it in the vicinity of the specimen.
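A minimal sketch of the surrogate idea above, under the assumption that the non-differentiable module can be queried as a black-box scorer: fit a local linear surrogate to its outputs around the specimen and use the surrogate's gradient for an attack step. Here `black_box` is a hypothetical stand-in for the LLM-based defense, and the sampling radius, sample count, and step size are illustrative choices, not values from any paper.

```python
import numpy as np

def local_linear_surrogate_grad(f, x, n_samples=200, sigma=0.05, seed=0):
    """Estimate a gradient of a black-box scorer f near x by fitting a
    local linear surrogate (least squares) to sampled perturbations."""
    rng = np.random.default_rng(seed)
    deltas = rng.normal(scale=sigma, size=(n_samples, x.size))
    ys = np.array([f(x + d) for d in deltas])
    # Fit y ≈ f(x) + g·d, i.e. solve g = argmin ||deltas @ g - (ys - f(x))||
    g, *_ = np.linalg.lstsq(deltas, ys - f(x), rcond=None)
    return g

# Toy stand-in for a non-differentiable module: a quantized quadratic score
# whose step-wise output gives no usable gradient through autograd.
def black_box(x):
    return np.round(10 * np.sum(x**2)) / 10

x = np.array([0.5, -0.3])
g = local_linear_surrogate_grad(black_box, x)
x_adv = x - 0.1 * np.sign(g)  # one FGSM-style step to lower the score
```

The surrogate is only trusted in the vicinity of `x`, so in practice one would refit it after each attack step; for an actual LLM defense the scorer would be the model's (possibly discretized) output on the perturbed input.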