Sense-X / UniFormer

[ICLR2022] official implementation of UniFormer
Apache License 2.0

Basic image classifier usage of token label models #25

Open leondgarse opened 2 years ago

leondgarse commented 2 years ago

I hesitated to ask such a basic question, but what's the correct way to use the token-label models for basic image classification? I followed your instructions in uniformer_image, but the results don't look right:

# cd image_classification
import torch
import torch.nn.functional as F
import torchvision.transforms as T
# from models import uniformer as torch_uniformer
from token_labeling.tlt.models import uniformer as torch_uniformer

def inference(model, image):
    # Resize shorter side to 224, center crop 224x224, ImageNet mean/std normalization
    image_transform = T.Compose([
        T.Resize(224),
        T.CenterCrop(224),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    image = image_transform(image).unsqueeze(0)  # add batch dimension
    with torch.no_grad():
        prediction = model(image)
    return F.softmax(prediction, dim=1).flatten()  # class probabilities

model = torch_uniformer.uniformer_small()
weights = torch.load('uniformer_small_tl_224.pth', map_location='cpu')
model.load_state_dict(weights['model'] if "model" in weights else weights, strict=True)
model = model.eval()

# Run prediction
from skimage.data import chelsea
from PIL import Image
imm = Image.fromarray(chelsea())  # Chelsea the cat
out = inference(model, imm)
print(out.argsort()[-5:])  # top-5 class indices, ascending by probability
# tensor([224, 196, 223, 410, 599])

# Decode the labels (any method works; Keras is used here only for its ImageNet label lookup)
from tensorflow import keras
print(keras.applications.imagenet_utils.decode_predictions(out.numpy()[None], top=5))
# [[('n03530642', 'honeycomb', 0.55872005),
#   ('n02727426', 'apiary', 0.011748945),
#   ('n02104365', 'schipperke', 0.0044726683),
#   ('n02097047', 'miniature_schnauzer', 0.003748106),
#   ('n02105056', 'groenendael', 0.0033460185)]]
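A side note on reading these outputs: the index tensor `tensor([224, 196, 223, 410, 599])` lists the top-5 classes in ascending order of probability. Index 599 ('honeycomb') comes last there but is the top decoded label, which is what `argsort()[-5:]` returns; the uniformer_small_in1k output shows the same reversed ordering. A minimal NumPy sketch of top-k extraction in both orders, assuming `probs` is a 1-D array of class probabilities (the toy values are made up for illustration):

```python
import numpy as np

def top_k(probs, k=5):
    """Return the indices of the k largest probabilities in both orders.

    np.argsort sorts ascending, so the last k entries are the top-k in
    ascending order of score; reversing them gives descending order,
    which is the order a decoded label listing uses.
    """
    ascending = np.argsort(probs)[-k:]
    descending = ascending[::-1]
    return ascending, descending

# Toy 8-class probability vector (illustrative values, not model output).
probs = np.array([0.01, 0.30, 0.05, 0.02, 0.40, 0.10, 0.08, 0.04])
asc, desc = top_k(probs, k=3)
print(asc)   # [5 1 4]
print(desc)  # [4 1 5]
```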

For comparison, the correct output, obtained with the non-token-label uniformer_small, looks like this:

# Same code as above, changing only the import and the checkpoint:
from models import uniformer as torch_uniformer
weights = torch.load('uniformer_small_in1k.pth', map_location='cpu')
# tensor([284, 287, 281, 282, 285])
# [[('n02124075', 'Egyptian_cat', 0.7029501),
#   ('n02123159', 'tiger_cat', 0.08705652),
#   ('n02123045', 'tabby', 0.056305394),
#   ('n02127052', 'lynx', 0.0035495553),
#   ('n02123597', 'Siamese_cat', 0.0008160392)]]

Besides, in my ImageNet evaluation the non-token-label uniformer_small reaches top-1 0.82986 / top-5 0.96358, while the token-label one, evaluated the same way, gets top-1 0.00136 / top-5 0.00622. I think something is wrong with my usage.
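For context, random guessing over ImageNet's 1000 classes gives an expected top-1 accuracy of 1/1000 = 0.001 and top-5 accuracy of 5/1000 = 0.005, so the token-label numbers above are essentially at chance level. That suggests a systematic mismatch (weights, model definition, or output handling) rather than a small preprocessing difference. A minimal NumPy sketch of top-k accuracy, assuming `logits` has shape (N, C) and `labels` has shape (N,); the random-score check uses 100 classes only to keep the demo small:

```python
import numpy as np

def top_k_accuracy(logits, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    # Indices of the k largest logits per row (order within the k does not matter here).
    top = np.argsort(logits, axis=1)[:, -k:]
    hits = (top == labels[:, None]).any(axis=1)
    return hits.mean()

# Tiny worked example: 3 samples, 4 classes.
logits = np.array([
    [0.1, 2.0, 0.3, 0.5],   # top-1 = class 1, top-2 = {1, 3}
    [1.5, 0.2, 0.1, 1.0],   # top-1 = class 0, top-2 = {0, 3}
    [0.0, 0.1, 0.2, 3.0],   # top-1 = class 3, top-2 = {3, 2}
])
labels = np.array([1, 3, 0])
print(top_k_accuracy(logits, labels, k=1))  # 1/3: only sample 0 is correct
print(top_k_accuracy(logits, labels, k=2))  # 2/3: samples 0 and 1

# Chance level: with random scores over C classes, top-k accuracy is about k / C.
rng = np.random.default_rng(0)
rand_logits = rng.standard_normal((10000, 100))
rand_labels = rng.integers(0, 100, size=10000)
print(top_k_accuracy(rand_logits, rand_labels, k=1))  # close to 0.01
print(top_k_accuracy(rand_logits, rand_labels, k=5))  # close to 0.05
```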

Andy1621 commented 2 years ago

@leondgarse Don't hesitate to ask me questions! Actually, I ran into a similar problem when I tested the Token Labeling models with the same code I use for the models without Token Labeling.

However, when I used the testing code provided by the original author, which is in my repo, the accuracy was normal. I have some deadlines to meet these days, so I haven't had time to find the difference between the two. Maybe you can try to figure it out!

Andy1621 commented 2 years ago

I will spend some time checking it next week. If you have time before then, please try it and tell me your results!

leondgarse commented 2 years ago

I can't find the difference; reloading the model through timm gives the same result for me:

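from timm.models import create_model, load_checkpoint
# 'uniformer_small' must already be registered with timm (e.g. by importing the repo's model module)
model = create_model('uniformer_small', num_classes=1000, global_pool=None, img_size=224)
# strict=False skips missing/unexpected keys instead of raising an error
load_checkpoint(model, 'uniformer_small_tl_224.pth', use_ema=False, strict=False)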

Seems I'll have to wait for your results then; no hurry anyway. :)
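Because `strict=False` lets loading proceed past missing and unexpected keys, a checkpoint that only partly matches the model can load without an error. A quick way to rule this out is to compare the two key sets directly. A minimal pure-Python sketch; `compare_state_dict_keys` is a hypothetical helper, and stripping a `module.` prefix is an assumption about checkpoints saved from a `DistributedDataParallel`-wrapped model. The key names in the toy example are illustrative only:

```python
def compare_state_dict_keys(model_state, ckpt_state, prefix="module."):
    """Report keys present in only one of two state dicts.

    Hypothetical helper: strips a leading `prefix` (e.g. one added by
    DistributedDataParallel) from checkpoint keys before comparing.
    Returns (missing, unexpected) as sorted lists.
    """
    ckpt_keys = {k[len(prefix):] if k.startswith(prefix) else k for k in ckpt_state}
    model_keys = set(model_state)
    missing = sorted(model_keys - ckpt_keys)      # in the model, not in the checkpoint
    unexpected = sorted(ckpt_keys - model_keys)   # in the checkpoint, not in the model
    return missing, unexpected

# Toy example: plain dicts stand in for state dicts (values are irrelevant here).
model_state = {"patch_embed.proj.weight": 0, "head.weight": 0, "extra_layer.weight": 0}
ckpt_state = {"module.patch_embed.proj.weight": 0, "module.head.weight": 0, "module.norm.weight": 0}
missing, unexpected = compare_state_dict_keys(model_state, ckpt_state)
print(missing)     # ['extra_layer.weight']
print(unexpected)  # ['norm.weight']
```

With real models, pass `model.state_dict()` and the loaded checkpoint dict. PyTorch's `model.load_state_dict(..., strict=False)` also returns `missing_keys` and `unexpected_keys` directly. (The first snippet already loads with `strict=True`, which raises on any key mismatch, so this check matters mainly for the timm path.)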

Andy1621 commented 2 years ago

Thanks for trying. Yes, there may be some differences in the dataloader and the validation function; I will check next week. By the way, the Token Labeling pre-trained models work better for downstream tasks than those trained without it, so I think there may be some tricks in the provide
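One concrete dataloader difference worth ruling out: timm's `transforms_imagenet_eval` resizes the shorter side to `floor(img_size / crop_pct)` (256 for `img_size=224` and the default `crop_pct=0.875`) before center-cropping to 224, whereas the first snippet uses `T.Resize(224)` directly. A difference like this would be expected to shift accuracy only slightly and would not by itself explain chance-level results. A minimal pure-Python sketch of the resize/crop geometry; `eval_resize_crop` is a hypothetical helper mirroring torchvision's integer resize and rounding rules, and 451x300 is assumed to be the size of skimage's `chelsea()` image:

```python
import math

def eval_resize_crop(width, height, resize_to, crop=224):
    """Geometry of Resize(resize_to) followed by CenterCrop(crop).

    Resizes the shorter side to `resize_to` (longer side truncated to int),
    then centers a crop x crop box using round() for the offset.
    Returns the resized (width, height) and the crop box (left, top, right, bottom).
    """
    if width <= height:
        new_w, new_h = resize_to, int(resize_to * height / width)
    else:
        new_w, new_h = int(resize_to * width / height), resize_to
    left = int(round((new_w - crop) / 2.0))
    top = int(round((new_h - crop) / 2.0))
    return (new_w, new_h), (left, top, left + crop, top + crop)

img_size, crop_pct = 224, 0.875                # timm's default crop_pct
scale_size = math.floor(img_size / crop_pct)   # 256

w, h = 451, 300  # assumed size of skimage's chelsea() image
print(eval_resize_crop(w, h, resize_to=224))         # ((336, 224), (56, 0, 280, 224))
print(eval_resize_crop(w, h, resize_to=scale_size))  # ((384, 256), (80, 16, 304, 240))
```

The Resize(256) path crops a central region covering a smaller fraction of the image than the Resize(224) path, so the two pipelines feed slightly different pixels to the model.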

leondgarse commented 2 years ago

Have you ever tried this?