Open AlexanderKozhevin opened 2 years ago
Hi, thank you for your interest. We have instructions for setting up the 21K vocabulary here (under "optional"). For your convenience, you can also append the following code to the official Colab.
# Run inference on ImageNet-21K vocabulary
## Collect class names
from nltk.corpus import wordnet
import nltk
nltk.download('wordnet')
!wget https://storage.googleapis.com/bit_models/imagenet21k_wordnet_ids.txt
wnids = [x.strip() for x in open('imagenet21k_wordnet_ids.txt', 'r')]
in21k_class_names = []
for wnid in wnids:
    synset = wordnet.synset_from_pos_and_offset('n', int(wnid[1:]))
    synonyms = [x.name() for x in synset.lemmas()]
    in21k_class_names.append(synonyms[0])
print(in21k_class_names)
## Reset classifiers for 21K classes
metadata = MetadataCatalog.get("in21k")
metadata.thing_classes = in21k_class_names
num_classes = len(metadata.thing_classes)
prompt='a '
text_encoder = build_text_encoder(pretrain=True)
text_encoder.eval()
text_encoder = text_encoder.cuda()
classifier = []
batch_size = 1024
i = 0
while i < num_classes:
    print(i)
    batch_names = in21k_class_names[i: min(i + batch_size, num_classes)]
    texts = [prompt + x for x in batch_names]
    with torch.no_grad():
        emb = text_encoder(texts).detach().permute(1, 0).contiguous().cpu()
    classifier.append(emb)
    i += batch_size
classifier = torch.cat(classifier, dim=1)
reset_cls_test(predictor.model, classifier, num_classes)
## Run on image
outputs = predictor(im)
v = Visualizer(im[:, :, ::-1], metadata)
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2_imshow(out.get_image()[:, :, ::-1])
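The batching loop in the snippet above can be checked on its own, without a GPU or the model. The following is a minimal sketch: `iter_batches` and `fake_encode` are hypothetical names introduced here for illustration (they are not part of Detic), and `fake_encode` merely stands in for `text_encoder` so that only the slicing pattern is exercised.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fake_encode(texts):
    # Hypothetical stand-in for the text encoder: returns one value per prompt.
    return [len(t) for t in texts]


class_names = ["tench", "goldfish", "great_white_shark", "tiger_shark", "hammerhead"]
prompt = "a "

embeddings = []
for batch in iter_batches(class_names, batch_size=2):
    # Prepend the prompt to every class name, as the snippet above does.
    texts = [prompt + name for name in batch]
    embeddings.extend(fake_encode(texts))

# One result per class, in the original class order.
assert len(embeddings) == len(class_names)
```

Slicing past the end of a list is safe in Python, so the last, shorter batch needs no explicit `min(...)` bound.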
Fantastic work on the Detic detector, and thanks for the code for inference with 22K classes! This really is a "next-generation" detector in terms of the number of classes it can detect. Some comments and questions:
a) In the code above, I had to add the line nltk.download('omw-1.4') after the line that downloads 'wordnet'.
b) Which model does the official Colab demo [1] use, CenterNet2 or Swin Transformer? How can I switch from one model to the other?
[1] https://colab.research.google.com/drive/1QtTW9-ukX2HKZGvt0QvVGqjuqEykoZKI
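For readers adapting the class-name snippet: each line of the ID file is expected to have the form of a part-of-speech letter followed by a numeric offset (for example `n01440764`), which is what `int(wnid[1:])` relies on. Below is a minimal sketch of that parsing step with a basic sanity check; `parse_wnid` is a hypothetical helper name, not something provided by NLTK or Detic.

```python
def parse_wnid(wnid):
    """Split a WordNet ID such as 'n01440764' into (pos, offset)."""
    wnid = wnid.strip()
    # Expect one part-of-speech letter followed by decimal digits only.
    if len(wnid) < 2 or not wnid[1:].isdecimal():
        raise ValueError(f"not a WordNet ID: {wnid!r}")
    return wnid[0], int(wnid[1:])


pos, offset = parse_wnid("n01440764\n")
print(pos, offset)
```

The resulting pair corresponds to the two arguments passed to `wordnet.synset_from_pos_and_offset` in the code above.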
Sorry, I'm a bit late. I have now also written a Python script that uses the 21K labels. The results are okay, but could be better. Possibly a stupid question: do I have to train on this data myself first?
And/or am I using the wrong combination of model and config?
cfg = get_cfg()
add_centernet_config(cfg)
add_detic_config(cfg)
cfg.merge_from_file("configs/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.yaml")
cfg.MODEL.WEIGHTS = 'https://dl.fbaipublicfiles.com/detic/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth'
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.3
cfg.MODEL.ROI_BOX_HEAD.ZEROSHOT_WEIGHT_PATH = 'rand'
cfg.MODEL.ROI_HEADS.ONE_CLASS_PER_PROPOSAL = False
predictor = DefaultPredictor(cfg)
wnids = [x.strip() for x in open('imagenet21k_wordnet_ids.txt', 'r')]
in21k_class_names = []
for wnid in wnids:
    synset = wordnet.synset_from_pos_and_offset('n', int(wnid[1:]))
    synonyms = [x.name() for x in synset.lemmas()]
    in21k_class_names.append(synonyms[0])
print(in21k_class_names)
def instances_to_dict(instances):
    fields = instances.get_fields()
    instances_dict = {}
    for key, value in fields.items():
        if isinstance(value, detectron2.structures.Boxes):
            instances_dict[key] = value.tensor.tolist()
        elif hasattr(value, "tolist"):
            instances_dict[key] = value.tolist()
        elif hasattr(value, "cpu"):
            instances_dict[key] = value.cpu().numpy().tolist()
        else:
            instances_dict[key] = value
    return instances_dict
metadata = MetadataCatalog.get("in21k")
metadata.thing_classes = in21k_class_names
num_classes = len(metadata.thing_classes)
prompt='a '
text_encoder = build_text_encoder(pretrain=True)
text_encoder.eval()
text_encoder = text_encoder.cuda()
classifier = []
batch_size = 1024
i = 0
while i < num_classes:
    print(i)
    batch_names = in21k_class_names[i: min(i + batch_size, num_classes)]
    texts = [prompt + x for x in batch_names]
    with torch.no_grad():
        emb = text_encoder(texts).detach().permute(1, 0).contiguous().cpu()
    classifier.append(emb)
    i += batch_size
classifier = torch.cat(classifier, dim=1)
reset_cls_test(predictor.model, classifier, num_classes)
fileX = "/home/marc/Desktop/AI/Detic/BeeGee.jpg"
im = Image.open(fileX).convert('RGB')
original_image = np.array(im)
# Perform the slicing operation
im = original_image[:, :, ::-1]
outputs = predictor(im)
instances = outputs["instances"]
bounding_boxes = instances.pred_boxes.tensor.tolist()
label_map = {i: f"class_{i}" for i in range(cfg.MODEL.ROI_HEADS.NUM_CLASSES)}
labels = [label_map.get(i, "unknown") for i in instances.pred_classes.tolist()]
probs = instances.scores.tolist()
results = []
for i, pred_class in enumerate(instances.pred_classes.tolist()):
    label = in21k_class_names[pred_class]
    prob = probs[i]
    found = False
    for res in results:
        if res['label'] == label:
            found = True
            if res['prob'] < prob:
                res['prob'] = prob
                res['index'] = i
            break
    if not found:
        results.append({'label': label, 'prob': prob, 'index': i})
for res in results:
    print(f"label: {res['label']}, probability: {res['prob']:.4f}")
Okay, I think the main troublemaker was the setting
cfg.MODEL.ROI_HEADS.ONE_CLASS_PER_PROPOSAL = False
Although I know what it is for, I had been experimenting with it and forgot that I had set it to "False". :-) Now the quality is much better. But I would like to know whether I could improve my code.
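On the question of improving the script, one option is to replace the nested loop that keeps the highest-scoring detection per label with a dictionary keyed by label. This is a minimal sketch, assuming plain Python lists in place of `instances.pred_classes.tolist()` and `instances.scores.tolist()`; the function name `best_per_label` and the sample values are made up for illustration.

```python
def best_per_label(pred_classes, scores, class_names):
    """Return one entry per label, keeping the highest-scoring detection."""
    best = {}
    for index, (cls, prob) in enumerate(zip(pred_classes, scores)):
        label = class_names[cls]
        # Keep the entry only if the label is new or this score is higher.
        if label not in best or prob > best[label]["prob"]:
            best[label] = {"label": label, "prob": prob, "index": index}
    return list(best.values())


# Illustrative values only, not real model output.
class_names = ["cat", "dog", "bee"]
pred_classes = [1, 0, 1, 2]
scores = [0.40, 0.90, 0.75, 0.55]

for res in best_per_label(pred_classes, scores, class_names):
    print(f"label: {res['label']}, probability: {res['prob']:.4f}")
```

The dictionary lookup avoids rescanning the result list for every detection, and labels still come out in the order they were first seen.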
Hello @AlexanderKozhevin, could you please send the 'lvis-21k_clip_a+cname.npy' file?
In the config file I can see NUM_CLASSES: 22047 (https://github.com/facebookresearch/Detic/blob/main/configs/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.yaml), but it only uses 1023. How can I modify the config file to get predictions for all 22047 classes?