Closed Doris-UESTC closed 4 months ago
Hi, thanks for asking, and sorry for the confusion.
It is just a way to ensure we initialize the template with exactly `prompt_len` tokens, because `<start_of_text>` is always one token. Also, the input to the CLIP model is `<start_of_text>` + some ids + `<end_of_text>`. Therefore, we add a dummy `<start_of_text>` at the beginning and put the optimized prompts into `dummy_ids[1:prompt_len+1]`.
Let me know if you have further questions!
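To make the layout concrete, here is a minimal sketch of the template construction described above. The special-token ids (49406 / 49407) follow CLIP's BPE vocabulary; the helper name `build_dummy_ids` is illustrative, not the notebook's actual code:

```python
BOS_ID = 49406   # <start_of_text> in CLIP's BPE vocabulary
EOS_ID = 49407   # <end_of_text>
MAX_LEN = 77     # CLIP's fixed text-input length

def build_dummy_ids(optimized_ids, prompt_len):
    """Place optimized prompt tokens into a fixed-length template:
    [<start_of_text>, tok_1 ... tok_prompt_len, <end_of_text>]."""
    assert len(optimized_ids) == prompt_len
    assert prompt_len <= MAX_LEN - 2  # leave room for BOS and EOS
    # Start from a dummy template with placeholder ids ...
    dummy_ids = [BOS_ID] + [0] * prompt_len + [EOS_ID]
    # ... then overwrite positions 1..prompt_len with the optimized tokens,
    # skipping index 0 so <start_of_text> stays in place.
    dummy_ids[1:prompt_len + 1] = optimized_ids
    return dummy_ids
```

For example, `build_dummy_ids([101, 102, 103], 3)` yields a sequence that starts with `<start_of_text>`, carries the three optimized ids in slots 1–3, and ends with `<end_of_text>`.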
That will restrict the effective prompt length, you know.
Hi, I'm a bit confused by your question. Could you please elaborate a bit more?
Overall, the max input length of CLIP is 77 tokens. Two of those tokens must be `<start_of_text>` and `<end_of_text>`, so the maximum `prompt_len` can be set to 75 tokens.
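The arithmetic can be stated as a small check; the 77-token limit is CLIP's, and the names below are illustrative:

```python
MAX_CLIP_LEN = 77  # CLIP's fixed text-input length

def max_prompt_len():
    # Reserve one slot each for <start_of_text> and <end_of_text>.
    return MAX_CLIP_LEN - 2
```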
I saw the code in prompt_inversion_sd.ipynb and have a question: why is dummy_text set to `'' * prompt_len`, and then `dummy_ids[1:prompt_len+1]` replaced with `inputs_ids`? I'm confused. Why do this?