Closed jainie-max closed 1 year ago
The fact that the model is locked only means that its parameters will not be updated during training. Gradients with respect to the activations still exist, because the inputs to the deeper layers require gradients.
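The distinction can be seen in a minimal PyTorch sketch (a generic illustration, not code from this repo): freezing a module's parameters stops weight updates, but gradients still propagate through it to any trainable layers earlier in the graph.

```python
import torch
import torch.nn as nn

# A trainable adapter placed before a frozen ("locked") backbone.
adapter = nn.Linear(8, 8)
frozen = nn.Linear(8, 4)

# Freezing only stops parameter updates: the frozen weights accumulate
# no gradient, but gradients still flow *through* the module.
for p in frozen.parameters():
    p.requires_grad_(False)

x = torch.randn(2, 8)
loss = frozen(adapter(x)).sum()
loss.backward()

print(adapter.weight.grad is not None)  # True: gradient reached the adapter
print(frozen.weight.grad is None)       # True: frozen params got no gradient
```

In a two-stage setup, by contrast, the backward pass simply never reaches the first stage (e.g. features are precomputed or detached), so no gradient flows at all.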
Got it, thanks!
Hi, great work! I have some questions about the CLIP-aware and CLIP-unaware settings in the paper. Since the CLIP model is locked, why can gradients pass through CLIP in the end-to-end training manner but are blocked in the two-stage training manner shown in Fig. 5 of the paper?