Open Lincoln20030413 opened 3 weeks ago
I apologize for the confusion. What I mean is that the general prompt on its own cannot be constrained to control the background. So we use the general prompt as the query of the cross-attention and constrain it with the features from Depth Anything. The general prompt here is initially generated from learnable parameters.
Thanks. Does this mean that you initialize the general prompts, then generate the updated general prompts using the initialized ones as the query? I first thought this part did not need to be trained, but now it seems it does need training.
Yes, your understanding is correct. This part requires training the cross-attention.
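For anyone reading later, here is a minimal PyTorch sketch of the setup as described in this thread. It is not the repository's code: the class name `GeneralPromptGenerator`, the prompt count, the feature dimension, and the use of `nn.MultiheadAttention` are all assumptions made for illustration. It only shows that the initial prompts are learnable parameters used as the query, while the depth features serve as key and value.

```python
import torch
import torch.nn as nn


class GeneralPromptGenerator(nn.Module):
    """Hypothetical module: learnable prompts query the depth features."""

    def __init__(self, num_prompts=8, dim=256, num_heads=8):
        super().__init__()
        # Initial general prompts are learnable parameters (sizes are assumed).
        self.init_prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        # The cross-attention has its own weights, so it is trained as well.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, depth_feats):
        # depth_feats: (batch, num_tokens, dim), e.g. from a depth encoder.
        batch = depth_feats.shape[0]
        # Query = initialized prompts, shared across the batch.
        query = self.init_prompts.unsqueeze(0).expand(batch, -1, -1)
        # Key and value = depth features, which constrain the prompts.
        updated, _ = self.cross_attn(query, depth_feats, depth_feats)
        return updated


gen = GeneralPromptGenerator()
depth_feats = torch.randn(2, 196, 256)  # random stand-in for real features
prompts = gen(depth_feats)              # one set of updated prompts per image
prompts.sum().backward()                # gradients reach the initial prompts
```

The output has one updated prompt per initial prompt, so the "generated" prompts are the initialized ones after they have attended to the depth features. Both `init_prompts` and the attention weights receive gradients, which matches the point that this part has to be trained.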
Thanks for your patient replies!
Thanks for your earlier replies! I'm sorry, but I'm a little confused about this part of your paper:
What is the query in the cross-attention when generating the general prompts? The phrase "the general prompts form the queries" really confuses me. Aren't we trying to generate the prompts?