YuxinWenRick / hard-prompts-made-easy


question about '<start_of_text>' #24

Closed Doris-UESTC closed 4 months ago

Doris-UESTC commented 6 months ago

I was reading the code of prompt_inversion_sd.ipynb and have a question: why is dummy_text set to '<start_of_text>' repeated prompt_len times, and why is dummy_ids[1:prompt_len+1] then replaced with input_ids? I feel confused. Why do this?

YuxinWenRick commented 6 months ago

Hi, thanks for asking, and sorry for the confusion.

It is just a way to ensure we initialize the template with exactly prompt_len tokens, because <start_of_text> is always one token.

Also, the input to the CLIP model is <start_of_text> + some ids + <end_of_text>. Therefore, we want to add a dummy <start_of_text> at the beginning and put the optimized prompts into dummy_ids[1:prompt_len+1].
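
For anyone reading along, here is a minimal sketch of that layout. It uses the Hugging Face CLIPTokenizer as a stand-in (its special tokens are spelled <|startoftext|>/<|endoftext|>, and this is not the notebook's exact code), just to show why repeating the start token gives exactly prompt_len placeholder slots:

```python
# Minimal sketch (not the notebook's code): build a dummy template of length
# prompt_len and drop optimized token ids into positions 1..prompt_len.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
prompt_len = 5  # hypothetical number of tokens to optimize

# The start token is always a single token, so repeating it prompt_len times
# yields exactly prompt_len placeholder ids between <start_of_text> and <end_of_text>.
dummy_text = " ".join(["<|startoftext|>"] * prompt_len)
dummy_ids = tokenizer(dummy_text).input_ids
# dummy_ids layout: [<start_of_text>, slot_1, ..., slot_prompt_len, <end_of_text>]

# In the notebook the ids come from the optimized embeddings; here we just
# tokenize a sample phrase for illustration.
optimized_ids = tokenizer("a photo of a dog", add_special_tokens=False).input_ids[:prompt_len]
dummy_ids[1:1 + len(optimized_ids)] = optimized_ids
print(tokenizer.decode(dummy_ids))  # roughly: <|startoftext|>a photo of a dog <|endoftext|>
```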

Let me know if you have further questions!

Doris-UESTC commented 6 months ago

But that will restrict the effective prompt length: '<start_of_text>' is 7 tokens, which means our prompt_len must be <= 11. If we use '.' to replace it, we can set prompt_len larger. Is that true?

YuxinWenRick commented 6 months ago

Hi, I'm a bit confused by your question. Could you please elaborate a bit more?

Overall, the max input length of CLIP is 77 tokens. We need two of the tokens to be <start_of_text> and <end_of_text>. Therefore, the maximum length for prompt_len can be set to 75 tokens.
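
As a quick sanity check (again with the Hugging Face CLIPTokenizer as a stand-in, not the repo's own tokenizer), the 77-token context and the resulting 75-token budget look like this:

```python
# Sketch: CLIP's text encoder context is 77 tokens, two of which are reserved
# for <start_of_text> and <end_of_text>.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
print(tokenizer.model_max_length)             # 77
max_prompt_len = tokenizer.model_max_length - 2
print(max_prompt_len)                         # 75 usable slots for the optimized prompt
```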