caoyunkang opened this issue 1 year ago (status: Open)
Also, I tried using the topk_logits computed in transformer.py:
topk_logits = enc_outputs_class_unselected.max(-1)[0]
Unfortunately, the result still looks like nonsense. Ideally, it should produce something like a heatmap. Do you have any suggestions?
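For anyone trying to visualize these scores, here is a minimal sketch of turning flattened per-token logits into 2D maps. It assumes the logits are laid out as one flattened sequence covering several feature levels, and that the (H, W) shape of each level is known. The function name `logits_to_heatmaps` and the `spatial_shapes` argument are hypothetical names for illustration, not part of the repository, and the sketch uses NumPy rather than the model's own tensors. The sigmoid only squashes raw logits into (0, 1); it is harmless to skip if the values are already probabilities.

```python
import numpy as np


def logits_to_heatmaps(token_logits, spatial_shapes):
    """Split flattened per-token logits into one heatmap per feature level.

    token_logits:   array of shape (..., N), N = sum of H*W over all levels.
    spatial_shapes: list of (H, W) tuples, one per level (assumed known).
    Returns a list of arrays of shape (..., H, W) with values in (0, 1).
    """
    token_logits = np.asarray(token_logits, dtype=np.float64)
    expected = sum(h * w for h, w in spatial_shapes)
    if token_logits.shape[-1] != expected:
        raise ValueError(
            f"expected {expected} tokens, got {token_logits.shape[-1]}"
        )

    # Sigmoid maps raw logits into the (0, 1) range.
    probs = 1.0 / (1.0 + np.exp(-token_logits))

    heatmaps = []
    start = 0
    for h, w in spatial_shapes:
        end = start + h * w
        # Tokens of one level are assumed to be stored in row-major order.
        level = probs[..., start:end].reshape(*probs.shape[:-1], h, w)
        heatmaps.append(level)
        start = end
    return heatmaps
```

With the real model, `token_logits` would be something like `topk_logits` moved to the CPU and converted to NumPy, and `spatial_shapes` would come from wherever the multi-scale feature shapes are tracked. Both mappings are assumptions to check against the code.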
In the paper, they don't normalize the features. Maybe that's the reason? I actually opened the first issue asking about this. It is one thing worth considering. The topk_logits are already a heatmap with values in (0, 1); each query is basically matched to 256
Hi! Thanks for your awesome work. I am wondering whether it is possible to extract dense similarity scores between an input image and textual prompts. Specifically, I have tried to extract dense similarity according to the following pseudo-code, using the text features and image features after the Feature Enhancer. However, I found that the resulting similarity is nearly nonsense. I would like to check whether there are any other suggestions, as dense similarity is vital for several open-world tasks.
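To make the question concrete, here is a minimal sketch of one common way to compute a dense image-text similarity map: cosine similarity between every image token and every text token. It assumes the image features are available as an (H*W, C) array for a single feature level and the text features as a (T, C) array in the same embedding space. The function name `dense_similarity` and its arguments are hypothetical. The L2 normalization is a choice made in this sketch, not something the model is known to do, so the result may differ from the model's own unnormalized scores.

```python
import numpy as np


def dense_similarity(image_feats, text_feats, height, width, eps=1e-8):
    """Cosine similarity between each text token and each image location.

    image_feats: (H*W, C) image token features for one level (assumed layout).
    text_feats:  (T, C) text token features in the same embedding space.
    Returns an array of shape (T, H, W) with values in [-1, 1].
    """
    image_feats = np.asarray(image_feats, dtype=np.float64)
    text_feats = np.asarray(text_feats, dtype=np.float64)
    if image_feats.shape[0] != height * width:
        raise ValueError("image_feats must have height * width rows")
    if image_feats.shape[1] != text_feats.shape[1]:
        raise ValueError("image and text features must share the channel dim")

    # L2-normalize along the channel axis (an assumption of this sketch).
    img = image_feats / (np.linalg.norm(image_feats, axis=-1, keepdims=True) + eps)
    txt = text_feats / (np.linalg.norm(text_feats, axis=-1, keepdims=True) + eps)

    sim = txt @ img.T  # (T, H*W)
    return sim.reshape(text_feats.shape[0], height, width)
```

Each slice `sim[t]` can then be upsampled to the image size and plotted as a heatmap for text token `t`. If the maps still look like noise, comparing them with and without the normalization step is a quick way to test whether normalization matters here.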