JingyangQiao / prompt-gradient-projection

Apache License 2.0

Questions about Res-Connection Based Gradient Projection Method in your journal work #3

Open PixelChen24 opened 2 months ago

PixelChen24 commented 2 months ago

Hi, I've read your latest journal work, Gradient Projection For Continual Parameter-Efficient Tuning, and I'm confused about one step of your derivation.

image

How do you get the equation $$ y_t = W_t^d x_t = W_{t+1}^d x_t $$ ? Why does $W_t^d x_t = W_{t+1}^d x_t$ hold?

I look forward to your response. Thank you very much!

JingyangQiao commented 1 month ago

Hi Pixel, a solution of Eq. (18) by inspection is as follows. We first need $W_t^d x_t = W_{t+1}^d x_t$. Once that holds, we next need $W_t^u y_t = W_{t+1}^u y_t$. Thus, $y_t = W_t^d x_t = W_{t+1}^d x_t$ is guaranteed by enforcing $W_t^d x_t = W_{t+1}^d x_t$, which is equivalent to enforcing $\Delta W^d x_t = 0$, one of the aims of our method.

PixelChen24 commented 1 month ago

> Hi Pixel, a solution of Eq. (18) by inspection is as follows. We first need $W_t^d x_t = W_{t+1}^d x_t$. Once that holds, we next need $W_t^u y_t = W_{t+1}^u y_t$. Thus, $y_t = W_t^d x_t = W_{t+1}^d x_t$ is guaranteed by enforcing $W_t^d x_t = W_{t+1}^d x_t$, which is equivalent to enforcing $\Delta W^d x_t = 0$, one of the aims of our method.

Why do we need $W_t^d x_t = W_{t+1}^d x_t$ to satisfy Eq. (18)? Is it a necessary condition or a sufficient condition for Eq. (18)?

PixelChen24 commented 1 month ago

I may be slow here, but could you give the reverse chain of reasoning, step by step? That is, if $W_t^d x_t = W_{t+1}^d x_t$ can be achieved, how does it lead to Eq. (18)?

JingyangQiao commented 1 month ago

Well, you can see that in Eq. (18), $W_t^u W_t^d x_t = W_{t+1}^u W_{t+1}^d x_t$, the terms on the left and right sides correspond one to one: $W_t^u$ corresponds to $W_{t+1}^u$, $W_t^d$ corresponds to $W_{t+1}^d$, and $x_t$ corresponds to $x_t$. So satisfying the equation is like a matching game. Obviously, $x_t$ equals $x_t$. Next, we want $W_t^d x_t = W_{t+1}^d x_t$, which gives the first equation in Eq. (22). Then, assuming $W_t^d x_t = W_{t+1}^d x_t$ already holds, we can define $y_t = W_t^d x_t = W_{t+1}^d x_t$ and replace both $W_t^d x_t$ and $W_{t+1}^d x_t$ with $y_t$. Eq. (18) can therefore be written as $W_t^u y_t = W_{t+1}^u y_t$. Solving this in the same way as $W_t^d x_t = W_{t+1}^d x_t$ gives the second equation in Eq. (22). In summary, if the two equations in Eq. (22) hold, then Eq. (18) holds.
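
Written out as a chain, the argument in this comment looks roughly like the following. This is a sketch reconstructed from the thread, not a quotation of the paper; the labels (i) and (ii) are introduced here for the two conditions and are assumed to correspond to the two equations in Eq. (22).

$$
\begin{aligned}
\text{(i)}\quad & W_{t+1}^d x_t = W_t^d x_t =: y_t, \qquad \text{(ii)}\quad W_{t+1}^u y_t = W_t^u y_t \\
W_{t+1}^u W_{t+1}^d x_t &= W_{t+1}^u y_t && \text{by (i)} \\
&= W_t^u y_t && \text{by (ii)} \\
&= W_t^u W_t^d x_t && \text{by the definition of } y_t
\end{aligned}
$$

So (i) and (ii) together are sufficient for Eq. (18). They do not appear to be necessary in general: for example, $W_{t+1}^d = 2 W_t^d$ and $W_{t+1}^u = \tfrac{1}{2} W_t^u$ leave the product $W^u W^d x_t$ unchanged, yet (i) fails whenever $W_t^d x_t \neq 0$.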

JingyangQiao commented 1 month ago

For how to enforce $W_t^d x_t = W_{t+1}^d x_t$, please refer to the approach in Sec. 3.2, Self-Attention Based Gradient Projection Method.
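
As a numerical illustration of the two conditions discussed in this thread, here is a minimal NumPy sketch. It assumes a single feature vector `x_t` and random matrices; the helper `project_out` and all variable names are hypothetical and are not taken from the repository or the paper. It only shows that updates orthogonal to `x_t` (for the down projection) and to `y_t` (for the up projection) leave the output $W^u W^d x_t$ unchanged.

```python
import numpy as np

def project_out(delta, v):
    """Remove from each row of `delta` its component along `v`,
    so that the returned matrix maps `v` to (numerically) zero."""
    v = v / np.linalg.norm(v)
    return delta - np.outer(delta @ v, v)

rng = np.random.default_rng(0)
d, r = 8, 3                        # feature dim and bottleneck dim (arbitrary choices)
x_t = rng.normal(size=d)           # one old-task input (assumption: a single vector)
W_d = rng.normal(size=(r, d))      # down projection at task t
W_u = rng.normal(size=(d, r))      # up projection at task t
y_t = W_d @ x_t

# Arbitrary updates, projected so that delta_d @ x_t = 0 and delta_u @ y_t = 0.
delta_d = project_out(rng.normal(size=(r, d)), x_t)
delta_u = project_out(rng.normal(size=(d, r)), y_t)

W_d_new = W_d + delta_d            # plays the role of W_{t+1}^d
W_u_new = W_u + delta_u            # plays the role of W_{t+1}^u

old_out = W_u @ W_d @ x_t
new_out = W_u_new @ W_d_new @ x_t
print(np.allclose(old_out, new_out))
```

The projection used here is the simplest possible one (orthogonal to a single vector). A real continual-learning setup would typically project against a subspace spanned by many old-task features, which this sketch does not attempt.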