UX-Decoder / DINOv

[CVPR 2024] Official implementation of the paper "Visual In-Context Prompting"

'evaluate_demo_content_openset_multi_with_content_features' and 'evaluate_visual_prompt_refer_multi_with_content_features' #29

Open Sun-Jing-Kang opened 4 months ago

Sun-Jing-Kang commented 4 months ago

Thank you for your great work!

I have a few questions I'd like to ask you:

Recently, while reproducing your work on my own dataset, I noticed two functions for mask inference named 'evaluate_demo_content_openset_multi_with_content_features' and 'evaluate_visual_prompt_refer_multi_with_content_features'.

In the provided demo, the default function is 'evaluate_demo_content_openset_multi_with_content_features'; when I change it to 'evaluate_visual_prompt_refer_multi_with_content_features', the results are poor.
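For reference, I am switching between the two paths roughly like this (a hypothetical toggle, not the demo's actual code; the real argument lists in DINOv may differ, and `batched_inputs` is just a placeholder for whatever inputs the demo prepares):

```python
# Hypothetical toggle between the two inference paths; the real signatures in
# DINOv may differ, and `batched_inputs` stands in for the demo's prepared inputs.
USE_REFER_PATH = False  # set True to try the visual-prompt "refer" variant

def run_inference(model, batched_inputs, use_refer=USE_REFER_PATH):
    if use_refer:
        return model.evaluate_visual_prompt_refer_multi_with_content_features(batched_inputs)
    return model.evaluate_demo_content_openset_multi_with_content_features(batched_inputs)
```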

I found that in 'evaluate_demo_content_openset_multi_with_content_features', tgt comes from pretrained weights such as 'self.query_feat.weight' and 'self.query_embed.weight', while in 'evaluate_visual_prompt_refer_multi_with_content_features', the queries come from the prompt positions, similar to SAM. Is my understanding correct?
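To make my question concrete, here is a minimal sketch of the two query-initialization paths as I understand them (illustrative only, not the repository's actual code; `QueryInitSketch`, `num_queries`, `hidden_dim`, `prompt_features`, and `prompt_pos` are names I made up, assuming a Mask2Former/DETR-style decoder):

```python
import torch.nn as nn

class QueryInitSketch(nn.Module):
    def __init__(self, num_queries=100, hidden_dim=256):
        super().__init__()
        # Learned, dataset-trained content/positional query embeddings,
        # analogous to self.query_feat / self.query_embed mentioned above.
        self.query_feat = nn.Embedding(num_queries, hidden_dim)
        self.query_embed = nn.Embedding(num_queries, hidden_dim)

    def openset_queries(self, batch_size):
        # "openset" path: tgt starts from the pretrained learned embeddings,
        # broadcast to shape (num_queries, batch_size, hidden_dim).
        tgt = self.query_feat.weight.unsqueeze(1).repeat(1, batch_size, 1)
        query_pos = self.query_embed.weight.unsqueeze(1).repeat(1, batch_size, 1)
        return tgt, query_pos

    def visual_prompt_queries(self, prompt_features, prompt_pos):
        # "refer" path: tgt is derived from the visual prompt itself
        # (SAM-like), so it depends on the prompt rather than learned weights.
        return prompt_features, prompt_pos
```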

What is the difference between these two methods, how should I choose the appropriate one for mask retrieval, and do the provided pretrained weights tend to give better results only on objects that already appear in the training dataset?

Finally, how can I make the algorithm perform well on new objects without retraining the model?

Thank you for your patience; I look forward to your reply!

zzz123123123123 commented 4 weeks ago

Hello, I have the same issue. Is the 'evaluate_visual_prompt_refer_multi_with_content_features' function the referring segmentation method in the paper, while the default function is the general segmentation method that uses the pre-trained weights? Is my understanding correct?