facebookresearch / Detic

Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".
Apache License 2.0

How to modify config to use 22047 classes instead of 1023? #15

Open AlexanderKozhevin opened 2 years ago

AlexanderKozhevin commented 2 years ago

In the config file I can see NUM_CLASSES: 22047 (https://github.com/facebookresearch/Detic/blob/main/configs/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.yaml), but the model only uses 1023 classes. How can I modify the config file to get predictions for all 22047 classes?

xingyizhou commented 2 years ago

Hi, thank you for your interest. We have instructions for setting up the 21K vocabulary here (under "optional"). For your convenience, you can also append the following code to the official Colab.

# Run inference on ImageNet-21K vocabulary
## Collect class names
from nltk.corpus import wordnet
import nltk
nltk.download('wordnet')

!wget https://storage.googleapis.com/bit_models/imagenet21k_wordnet_ids.txt
wnids = [x.strip() for x in open('imagenet21k_wordnet_ids.txt', 'r')]

in21k_class_names = []
for wnid in wnids:
    synset = wordnet.synset_from_pos_and_offset('n', int(wnid[1:]))
    synonyms = [x.name() for x in synset.lemmas()]
    in21k_class_names.append(synonyms[0])
print(in21k_class_names)

## Reset classifiers for 21K classes
metadata = MetadataCatalog.get("in21k")
metadata.thing_classes = in21k_class_names
num_classes = len(metadata.thing_classes)
prompt='a '

text_encoder = build_text_encoder(pretrain=True)
text_encoder.eval()
text_encoder = text_encoder.cuda()

classifier = []
batch_size = 1024
i = 0
while i < num_classes:
    print(i)
    batch_names = in21k_class_names[i: min(i + batch_size, num_classes)]
    texts = [prompt + x for x in batch_names]
    with torch.no_grad():
        emb = text_encoder(texts).detach().permute(1, 0).contiguous().cpu()
    classifier.append(emb)
    i += batch_size
classifier = torch.cat(classifier, dim=1)
reset_cls_test(predictor.model, classifier, num_classes)

## Run on image
outputs = predictor(im)
v = Visualizer(im[:, :, ::-1], metadata)
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2_imshow(out.get_image()[:, :, ::-1])
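
The batching step in the code above can be looked at in isolation. The sketch below is a minimal, dependency-free illustration: `fake_encode` is a hypothetical stand-in for the real text encoder (so it runs without a GPU), and `iter_prompt_batches` and `build_rows` are names made up for this example. It only shows how the class names are prefixed with the prompt and split into batches of at most `batch_size`. In the real code each batch becomes a tensor with one column per class name after the `permute(1, 0)`, and `torch.cat(..., dim=1)` joins the batches into one column per class.

```python
from typing import Callable, Iterator, List


def iter_prompt_batches(
    names: List[str], batch_size: int, prompt: str = "a "
) -> Iterator[List[str]]:
    """Yield prompted class names in consecutive batches of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(names), batch_size):
        # Slicing past the end is safe, so no explicit min() is needed.
        yield [prompt + name for name in names[start:start + batch_size]]


def fake_encode(texts: List[str]) -> List[List[float]]:
    """Hypothetical stand-in for the text encoder: one tiny 'embedding' per text."""
    return [[float(len(text)), 1.0] for text in texts]


def build_rows(
    names: List[str],
    batch_size: int,
    encode: Callable[[List[str]], List[List[float]]] = fake_encode,
) -> List[List[float]]:
    """Encode all names batch by batch and return one embedding row per class."""
    rows: List[List[float]] = []
    for batch in iter_prompt_batches(names, batch_size):
        rows.extend(encode(batch))
    return rows


names = ["tench", "goldfish", "great_white_shark", "tiger_shark", "hammerhead"]
batches = list(iter_prompt_batches(names, batch_size=2))
rows = build_rows(names, batch_size=2)
print([len(batch) for batch in batches])  # sizes of the individual batches
print(len(rows))                          # one row per class name
```

The last batch is simply shorter than the others, which is why the original loop can use a fixed `batch_size` of 1024 for a vocabulary whose size is not a multiple of it.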

hfassold commented 2 years ago

Fantastic work on the Detic detector, and thanks for the inference code for 22K classes! This really is a 'next-generation' detector in terms of the number of classes it detects. Some comments and questions:

a) In the code above, I had to add the line nltk.download('omw-1.4') after the line that downloads 'wordnet'.
b) Which model does the official Colab demo [1] use, CenterNet2 or Swin Transformer? How can I switch from one model to the other?

[1] https://colab.research.google.com/drive/1QtTW9-ukX2HKZGvt0QvVGqjuqEykoZKI

Marcophono2 commented 1 year ago

Sorry, I'm a bit late. I have now also written a Python script that uses the 21k labels. The results are okay but could be better. A basic question: do I have to train on this data myself first?

And/or am I using the wrong combination of model and config?

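# Imports the script relies on (not shown in the original post).
import sys

import numpy as np
import torch
from PIL import Image
from nltk.corpus import wordnet

import detectron2
from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog
from detectron2.engine import DefaultPredictor

# Assumes the script runs from the Detic repository root; adjust the path if your checkout differs.
sys.path.insert(0, 'third_party/CenterNet2/')
from centernet.config import add_centernet_config
from detic.config import add_detic_config
from detic.modeling.text.text_encoder import build_text_encoder
from detic.modeling.utils import reset_cls_test

cfg = get_cfg()
add_centernet_config(cfg)
add_detic_config(cfg)
cfg.merge_from_file("configs/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.yaml")
cfg.MODEL.WEIGHTS = 'https://dl.fbaipublicfiles.com/detic/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth'
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.3
cfg.MODEL.ROI_BOX_HEAD.ZEROSHOT_WEIGHT_PATH = 'rand'
cfg.MODEL.ROI_HEADS.ONE_CLASS_PER_PROPOSAL = False
predictor = DefaultPredictor(cfg)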

wnids = [x.strip() for x in open('imagenet21k_wordnet_ids.txt', 'r')]

in21k_class_names = []
for wnid in wnids:
    synset = wordnet.synset_from_pos_and_offset('n', int(wnid[1:]))
    synonyms = [x.name() for x in synset.lemmas()]
    in21k_class_names.append(synonyms[0])
print(in21k_class_names)

def instances_to_dict(instances):
    fields = instances.get_fields()
    instances_dict = {}

    for key, value in fields.items():
        if isinstance(value, detectron2.structures.Boxes):
            instances_dict[key] = value.tensor.tolist()
        elif hasattr(value, "tolist"):
            instances_dict[key] = value.tolist()
        elif hasattr(value, "cpu"):
            instances_dict[key] = value.cpu().numpy().tolist()
        else:
            instances_dict[key] = value

    return instances_dict

metadata = MetadataCatalog.get("in21k")
metadata.thing_classes = in21k_class_names
num_classes = len(metadata.thing_classes)
prompt='a '

text_encoder = build_text_encoder(pretrain=True)
text_encoder.eval()
text_encoder = text_encoder.cuda()

classifier = []
batch_size = 1024
i = 0
while i < num_classes:
    print(i)
    batch_names = in21k_class_names[i: min(i + batch_size, num_classes)]
    texts = [prompt + x for x in batch_names]
    with torch.no_grad():
        emb = text_encoder(texts).detach().permute(1, 0).contiguous().cpu()
    classifier.append(emb)
    i += batch_size
classifier = torch.cat(classifier, dim=1)
reset_cls_test(predictor.model, classifier, num_classes)

fileX = "/home/marc/Desktop/AI/Detic/BeeGee.jpg"

im = Image.open(fileX).convert('RGB')
original_image = np.array(im)

# Reverse the channel order (RGB -> BGR) before passing the image to the predictor
im = original_image[:, :, ::-1]

outputs = predictor(im)
instances = outputs["instances"]

bounding_boxes = instances.pred_boxes.tensor.tolist()
label_map = {i: f"class_{i}" for i in range(cfg.MODEL.ROI_HEADS.NUM_CLASSES)}

labels = [label_map.get(i, "unknown") for i in instances.pred_classes.tolist()]

probs = instances.scores.tolist()
results = []
for i, pred_class in enumerate(instances.pred_classes.tolist()):
    label = in21k_class_names[pred_class]
    prob = probs[i]
    found = False
    for res in results:
        if res['label'] == label:
            found = True
            if res['prob'] < prob:
                res['prob'] = prob
                res['index'] = i
            break
    if not found:
        results.append({'label': label, 'prob': prob, 'index': i})

for res in results:
    print(f"label: {res['label']}, probability: {res['prob']:.4f}")
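
The nested loop above keeps, for each label, the single highest-scoring detection. As a possible simplification, the same logic can be sketched with a dictionary keyed by label, which avoids rescanning the result list for every detection. `best_per_label` is a name made up for this example, and the sample labels and scores are invented; in the script above the inputs would be `[in21k_class_names[c] for c in instances.pred_classes.tolist()]` and `instances.scores.tolist()`.

```python
from typing import Dict, List


def best_per_label(labels: List[str], scores: List[float]) -> List[dict]:
    """Keep the highest-scoring detection for each label, in first-seen order."""
    best: Dict[str, dict] = {}
    for index, (label, prob) in enumerate(zip(labels, scores)):
        current = best.get(label)
        # Strict '>' matches the original loop: ties keep the earlier detection.
        if current is None or prob > current["prob"]:
            best[label] = {"label": label, "prob": prob, "index": index}
    # Dicts preserve insertion order, so labels come out in first-seen order.
    return list(best.values())


# Invented example data standing in for real predictions.
example_labels = ["cat", "dog", "cat"]
example_scores = [0.4, 0.9, 0.7]
results = best_per_label(example_labels, example_scores)
for res in results:
    print(f"label: {res['label']}, probability: {res['prob']:.4f}")
```

Each detection is handled with one dictionary lookup, so the cost grows linearly with the number of detections instead of quadratically.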

Marcophono2 commented 1 year ago

Okay, I think the main culprit was the setting

cfg.MODEL.ROI_HEADS.ONE_CLASS_PER_PROPOSAL = False

I know what this option is for, but I had been experimenting with it and forgot that I had left it set to "False". :-) Now the quality is much better. I would still like to know whether my code could be improved.
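
To illustrate why this flag can change the output so much, here is a small sketch of the idea, assuming the flag means "keep only the top-scoring class for each proposal" (as its name suggests). It is a plain-Python illustration with invented scores, not Detic's actual implementation, and `keep_top_class` and `detections` are names made up for this example.

```python
from typing import List, Tuple


def keep_top_class(scores_per_proposal: List[List[float]]) -> List[List[float]]:
    """Zero out every class score except each proposal's maximum."""
    masked = []
    for scores in scores_per_proposal:
        top = max(scores)
        masked.append([s if s == top else 0.0 for s in scores])
    return masked


def detections(
    scores_per_proposal: List[List[float]], thresh: float
) -> List[Tuple[int, int, float]]:
    """Return (proposal_index, class_index, score) triples above the threshold."""
    return [
        (p, c, s)
        for p, scores in enumerate(scores_per_proposal)
        for c, s in enumerate(scores)
        if s > thresh
    ]


# Invented scores: 2 proposals, 3 classes each.
scores = [[0.6, 0.5, 0.1], [0.2, 0.35, 0.4]]

multi = detections(scores, thresh=0.3)                   # every class above threshold
single = detections(keep_top_class(scores), thresh=0.3)  # top class per proposal only
print(len(multi), len(single))
```

With a vocabulary of about 21K often closely related class names, many classes can pass the threshold for the same box when every class is kept, which would be consistent with the noisier results described above.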

ysysys666 commented 3 months ago

Hello, could you please send the 'lvis-21k_clip_a+cname.npy' file? @AlexanderKozhevin