Hi @ngthanhtin,
Please check out the fine-tuned CLIP models: ViT-B/16 and ViT-L/14. These models follow the exact same format as OpenAI's CLIP, so you can conduct an apples-to-apples comparison. You can easily use open_clip v1.3 to load the model weights.
self.clip_model, _, _ = open_clip.create_model_and_transforms('ViT-L-14', pretrained=ovsegclip_path)
Unfortunately, we don't have ViT-B/32 weights. You may want to use our open clip training to train your own CLIP ViT-B/32 with our mask-text pairs data.
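For reference, here is a minimal sketch of loading the fine-tuned ViT-L/14 weights with open_clip and scoring a masked image crop against text prompts. The checkpoint path, image path, and class prompts (`ovseg_clip_l14.pth`, `masked_crop.png`, the prompt strings) are placeholders for illustration, not names from the release:

```python
import torch
from PIL import Image
import open_clip

# Placeholder path to the downloaded fine-tuned ViT-L/14 checkpoint.
ovsegclip_path = "ovseg_clip_l14.pth"

# Same call as above; the third return value is the inference preprocessing transform.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained=ovsegclip_path
)
model.eval()

# Encode a masked image crop and a few candidate class prompts.
image = preprocess(Image.open("masked_crop.png")).unsqueeze(0)
text = open_clip.tokenize(["a photo of a dog", "a photo of a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # Cosine similarity between the crop and each prompt, as class probabilities.
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)
```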
Hi @all, thank you for your great work. Your work has two implications: 1) the segmentation model with mask prompting, and 2) the embedding from your new CLIP model (which takes a masked image and a mask prompt as inputs).
For now, I have only found the weights for the entire model (the segmentation model), so could you provide us with the new CLIP weights? I think they would be really helpful, because your work helps CLIP learn region-level image representations (as in RegionCLIP).
And do you plan to work on ViT-B/32? I have only found the ViT-B/16 and ViT-L/14 versions. Thank you very much.
Best, Tin