beichenzbc / Long-CLIP

[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"

Code #26

Closed SnailForce closed 1 month ago

SnailForce commented 1 month ago

```python
import torch
from PIL import Image
from model import longclip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = longclip.load("./checkpoints/longclip-B.pt", device=device)

image = preprocess(Image.open("./img/CLIP.png")).unsqueeze(0).to(device)
text = longclip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)
```

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("./img/CLIP.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)  # prints: [[0.9927937 0.00421068 0.00299572]]
```

The first snippet is the Long-CLIP code and the second is the CLIP code. The CLIP version runs without problems, but the Long-CLIP version raises an error:

```
Traceback (most recent call last):
  File "Long-CLIP/test.py", line 15, in <module>
    logits_per_image, logits_per_text = model(image, text)
  File "python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: CLIP.forward() missing 2 required positional arguments: 'text_short' and 'rank'
```

How should I fix this?

beichenzbc commented 1 month ago

To support DDP training, Long-CLIP modified the `model.forward` function. You can compute the similarity like this instead:

```python
logits_per_image = image_features @ text_features.T
probs = logits_per_image.softmax(dim=-1).cpu().numpy()
```
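The reply above avoids calling `model.forward` altogether. If you would rather keep the two-argument `model(image, text)` call from the CLIP example, one option is a thin wrapper module. This is a minimal sketch, assuming the Long-CLIP model exposes `encode_image`, `encode_text` and a `logit_scale` parameter the way OpenAI CLIP does; `TwoInputCLIP` and `DummyCLIP` are hypothetical names, and `DummyCLIP` uses random linear layers on float inputs purely so the block runs without a checkpoint. Unlike the maintainer's snippet, the wrapper follows OpenAI CLIP's forward pass (L2-normalize the features, then multiply by `logit_scale.exp()`), so its probabilities will not match the raw dot-product version.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoInputCLIP(nn.Module):
    """Hypothetical wrapper restoring the CLIP-style call model(image, text).

    Assumes the wrapped model has encode_image, encode_text and a
    logit_scale parameter storing the log of the inverse temperature.
    """

    def __init__(self, clip_model: nn.Module):
        super().__init__()
        self.clip_model = clip_model

    def forward(self, image: torch.Tensor, text: torch.Tensor):
        image_features = self.clip_model.encode_image(image)
        text_features = self.clip_model.encode_text(text)
        # L2-normalize so each dot product is a cosine similarity in [-1, 1]
        image_features = F.normalize(image_features, dim=-1)
        text_features = F.normalize(text_features, dim=-1)
        scale = self.clip_model.logit_scale.exp()
        logits_per_image = scale * image_features @ text_features.T
        return logits_per_image, logits_per_image.T


class DummyCLIP(nn.Module):
    """Stand-in with random linear "encoders", only for exercising the wrapper.

    Real CLIP text input is token ids; float vectors are used here for brevity.
    """

    def __init__(self, dim: int = 8):
        super().__init__()
        self.image_proj = nn.Linear(16, dim)
        self.text_proj = nn.Linear(4, dim)
        self.logit_scale = nn.Parameter(torch.log(torch.tensor(1 / 0.07)))

    def encode_image(self, image: torch.Tensor) -> torch.Tensor:
        return self.image_proj(image)

    def encode_text(self, text: torch.Tensor) -> torch.Tensor:
        return self.text_proj(text)


torch.manual_seed(0)
model = TwoInputCLIP(DummyCLIP())
image = torch.randn(1, 16)  # one "image"
text = torch.randn(3, 4)    # three "captions"

with torch.no_grad():
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

print("Label probs:", probs)
```

With a real checkpoint you would wrap the loaded model, e.g. `TwoInputCLIP(model)`, and pass the tensors from `preprocess` and `longclip.tokenize` unchanged.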

SnailForce commented 1 month ago


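I found the demo.py you wrote and it runs now. However, for the demo.png you tested, the README reports [0.982 0.01799], while I get [0.937 0.0628] on an RTX 3090. That seems like a large difference; do you know what could be causing it?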

beichenzbc commented 1 month ago

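Hi, we updated the model weights after writing the demo, which may account for the difference. You could try evaluating on a dataset such as COCO and check whether the final results match.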
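Retrieval results on datasets such as COCO are commonly reported as Recall@K. As a rough sketch of that metric (not the repository's own evaluation script), the hypothetical `recall_at_k` below takes an image-text similarity matrix, for example one built from `image_features @ text_features.T`, and assumes that row i's correct match is column i. A real COCO evaluation also has to map its roughly five captions per image back to the right image; that bookkeeping is omitted here, and the toy matrix is made up for illustration.

```python
import torch


def recall_at_k(similarity: torch.Tensor, k: int) -> float:
    """Fraction of queries (rows) whose correct item is among the top-k columns.

    Assumes a one-to-one pairing: the correct match for row i is column i.
    """
    topk = similarity.topk(k, dim=1).indices                  # (N, k) column indices
    targets = torch.arange(similarity.size(0)).unsqueeze(1)   # (N, 1) correct columns
    hits = (topk == targets).any(dim=1)                       # (N,) True if found in top k
    return hits.float().mean().item()


# Toy similarity matrix: rows are images, columns are captions.
sim = torch.tensor([
    [0.9, 0.1, 0.0],
    [0.2, 0.3, 0.8],
    [0.1, 0.7, 0.6],
])

i2t_r1 = recall_at_k(sim, 1)    # image -> text retrieval
i2t_r2 = recall_at_k(sim, 2)
t2i_r1 = recall_at_k(sim.T, 1)  # text -> image retrieval uses the transpose
print(f"I2T R@1={i2t_r1:.3f}  R@2={i2t_r2:.3f}  T2I R@1={t2i_r1:.3f}")
```

Comparing such numbers against the figures published for a checkpoint is a more robust consistency check than the probabilities on a single demo image.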