Open shipengai opened 1 year ago
Hi. We currently have no plans to release this code, though we might release it after the training and evaluation code and more checkpoints are out. In the meantime, I can give a basic overview of our pipeline.
To obtain the text similarity, you can use:
import torch

test_categories = get_openseg_labels("coco_panoptic", prompt_engineered=False)
expression, positive_map_idx_token = create_queries_and_maps(test_categories, demo.predictor.tokenizer)
with torch.no_grad():
    text_features = demo.predictor.model.forward_text([expression], 'cuda')
# Average the hidden states over the token indices mapped to each category.
text_feature_words = []
for k, v in positive_map_idx_token.items():
    text_feature_words.append(text_features['hidden'][0, v, :].detach().cpu().mean(0))
text_feature_words = torch.stack(text_feature_words)
text_feature_words = torch.nn.functional.normalize(text_feature_words, dim=-1)
# Euclidean distance d between unit vectors, where d^2 = 2 - 2 * cos(A, B)
dist_text = torch.cdist(text_feature_words, text_feature_words)
dist_text = 0.5 * (2.0 - dist_text)  # maps d in [0, 2] to a similarity score in [0, 1]
You can then visualize dist_text, which has shape N_CLS X N_CLS.
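One detail worth checking when reading the snippet above: torch.cdist returns the plain (not squared) Euclidean distance d, so for unit vectors 0.5 * (2.0 - d) equals 1 - d/2, while the cosine similarity is 1 - d^2/2. Both decrease as d grows, so they rank class pairs identically, but the values differ. A minimal sketch with random stand-in features (the sizes and variable names below are made up for illustration and do not come from the model):

```python
import torch

torch.manual_seed(0)

# Random stand-ins for per-class text features (hypothetical sizes).
n_cls, dim = 5, 16
feats = torch.nn.functional.normalize(torch.randn(n_cls, dim), dim=-1)

dist = torch.cdist(feats, feats)       # Euclidean distance, in [0, 2] for unit vectors
sim_linear = 0.5 * (2.0 - dist)        # what the snippet above computes: 1 - d / 2
sim_cosine = feats @ feats.T           # cosine similarity of unit vectors
sim_from_dist = 1.0 - 0.5 * dist ** 2  # the same cosine, recovered from the distance

print(sim_linear.shape)
print(torch.allclose(sim_cosine, sim_from_dist, atol=1e-4))
```

If cosine similarity is what you want to plot, sim_cosine (or sim_from_dist) can be used in place of the linear score.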
The visual features are more involved: extracting them requires considerable hacking into the data loading and model inference process.
The first step is to sample N annotations for each class. Then, for each image, the following code extracts the feature map:
batch = mapper(batch)  # mapper is a DatasetMapper instance
samples = demo.predictor.model.preprocess_image([batch])
samples = nested_tensor_from_tensor_list(samples, size_divisibility=32)
with torch.no_grad():
    features, _ = demo.predictor.model.detr.detr.backbone(samples)
img_features, mask = features[-1].decompose()
img_features = img_features.cpu()  # 1 X C X H X W
Next, get the ground-truth mask and resize it to the size of the feature map:
import torch.nn.functional as F

msk = batch['pan_seg_gt'] == instance_id  # H X W
mask_up = F.interpolate(msk.float()[None, None], img_features.shape[-2:], mode='area')  # 1 X 1 X H X W
The final feature of this mask can be obtained through mask pooling:
mask_up = mask_up / mask_up.sum()  # assumes the mask is non-empty at feature resolution
out = torch.einsum('bchw,bdhw->bdc', img_features, mask_up)[0][0]  # final output of shape C
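To make the mask pooling step concrete, here is a small self-contained sketch on random data. The tensor sizes, the rectangular mask, and the empty-mask check are assumptions added for illustration; only the interpolate, normalize, and einsum steps mirror the snippet above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes: a 1 x C x H x W feature map and a full-resolution binary mask.
C, H, W = 4, 8, 8
img_features = torch.randn(1, C, H, W)
msk = torch.zeros(32, 32, dtype=torch.bool)
msk[4:20, 8:24] = True

# Downsample the mask to the feature-map size. With mode="area", each cell holds
# the fraction of its source pixels that lie inside the mask.
mask_up = F.interpolate(msk.float()[None, None], size=(H, W), mode="area")  # 1 x 1 x H x W

total = mask_up.sum()
if total == 0:
    # Guard added for this sketch: an empty mask would cause a division by zero.
    raise ValueError("mask is empty at feature resolution")
weights = mask_up / total

# Weighted sum over spatial positions gives one C-dimensional vector.
out = torch.einsum("bchw,bdhw->bdc", img_features, weights)[0][0]

# Equivalent formulation: explicit weighted average over H and W.
ref = (img_features[0] * weights[0]).sum(dim=(1, 2))
print(out.shape, torch.allclose(out, ref, atol=1e-6))
```

The einsum is simply a weighted average of the feature map, with weights proportional to how much of each cell the mask covers.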
Then save out for each selected annotation, average by class, and visualize.
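The averaging step could be sketched roughly as follows. This is not the authors' code: the saved list of (class id, feature) pairs is a hypothetical storage format, and using cosine similarity between class means is an assumption, since the reply does not say which similarity was used for the visual features.

```python
from collections import defaultdict

import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical saved results: one (class_id, pooled feature) pair per sampled annotation.
C = 16
saved = [(cls_id, torch.randn(C)) for cls_id in (0, 0, 1, 1, 1, 2)]

# Group the pooled features by class.
by_class = defaultdict(list)
for cls_id, feat in saved:
    by_class[cls_id].append(feat)

# Average within each class, then L2-normalize the class means.
class_ids = sorted(by_class)
class_feats = torch.stack([torch.stack(by_class[c]).mean(0) for c in class_ids])
class_feats = F.normalize(class_feats, dim=-1)

# N_CLS x N_CLS cosine similarity between class-mean features (assumed metric).
sim_visual = class_feats @ class_feats.T
print(sim_visual.shape)
```

The resulting matrix has the same N_CLS X N_CLS layout as dist_text, so the two can be plotted side by side.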
Let me know if you have more questions.
Thanks for your reply! I will try it.
Hello, is there code to calculate the mean similarity mentioned in this paper?