Hi, you may refer to this code:
from model import longclip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = longclip.load("./checkpoints/longclip-B.pt", device=device)

text = longclip.tokenize(["A man is crossing the street with a red car parked nearby.", "A man is driving a car in an urban scene."]).to(device)
image = preprocess(Image.open("./img/demo.png")).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize so the dot product below is a cosine similarity
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    logits_per_image = image_features @ text_features.T
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)
Sorry, I meant training, not zero-shot inference.
Got it. Then you may change the 'load_from_clip' function in train/train.py
to 'longclip.load', and rewrite the dataset loader.
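For concreteness, a minimal sketch of that change, assuming a simple (image_path, caption) annotation format; the ImageTextDataset class and the my_samples variable are placeholders for illustration, not code from the repo:

from model import longclip
import torch
from torch.utils.data import Dataset, DataLoader
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Resume from the released Long-CLIP weights instead of vanilla CLIP:
# replace the original load_from_clip(...) call with longclip.load(...)
model, preprocess = longclip.load("./checkpoints/longclip-B.pt", device=device)

# Hypothetical dataset; adapt to your own annotation format
class ImageTextDataset(Dataset):
    def __init__(self, samples, preprocess):
        self.samples = samples        # list of (image_path, caption) pairs
        self.preprocess = preprocess

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, caption = self.samples[idx]
        image = self.preprocess(Image.open(path).convert("RGB"))
        text = longclip.tokenize([caption]).squeeze(0)  # Long-CLIP's long-text tokenizer
        return image, text

# my_samples: your own list of (image_path, caption) pairs
loader = DataLoader(ImageTextDataset(my_samples, preprocess), batch_size=64, shuffle=True)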
Hi, I would like to continue your pre-training.
Is there a readme that describes how to load your pre-trained Long-CLIP model checkpoint? I'm looking for a checkpoint based on "ViT-L/14@336px".
Thanks!