Great work! I would like to know how the text features in your h5 file were extracted. You mention using CLIP ViT-B/32, but when I run the pretrained CLIP text encoder myself, the features I get differ from the ones stored in the h5 file. Could you share the exact extraction procedure?
This is how I extract the text features with ViT-B/32:
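import clip
model, preprocess = clip.load("ViT-B/32", device="cuda")
tokens = clip.tokenize(text, context_length=77).cuda()  # text: a string or a list of strings
cls = model.encode_text(tokens)  # pooled text features, shape (N, 512)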
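To narrow down where the difference comes from, it may help to compare the two feature vectors by cosine similarity rather than by raw values. If the stored features were L2-normalized or saved at a different precision (as far as I can tell, `clip.load` keeps the model in fp16 on CUDA), the raw numbers would differ while the direction stays nearly the same. The helper below is only a sketch with toy values; `cosine_similarity`, `mine`, and `stored` are hypothetical names, and it makes no claim about how the h5 file was actually produced.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    if len(a) != len(b):
        # A length mismatch would suggest the two features are not the same
        # kind of output (e.g. pooled vs. per-token features).
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real features (hypothetical values).
mine = [0.2, -0.4, 0.4]
stored = [0.1, -0.2, 0.2]  # same direction as `mine`, different scale
print(cosine_similarity(mine, stored))
```

With real data, something like `cls[0].float().cpu().tolist()` could be passed as the first argument and the corresponding row of the h5 dataset as the second. A similarity close to 1 would point to a scaling or precision difference, while a low value would suggest a different model, different input text, or a different output layer.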