Closed shehanmunasinghe closed 1 year ago
> Could you please release the code for generating training data from COCO Captions and fine-tuning CLIP with the collected mask-category pairs?

Hi @shehanmunasinghe,

We don't plan to release this part. However, we believe it is not hard to implement. Some hints:

- You may want to use the `regions`, which stand for the generated proposals.
- For noun extraction, we use nltk.
- Make sure the input to CLIP is normalized, and the saved images should be uint8.
- Saved data should follow this format so it can be accepted by OpenCLIP training.
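As a rough sketch of the uint8 detail mentioned above (not the authors' code — the helper name, crop shape, and the assumption that crops start as floats in [0, 1] are all illustrative):

```python
import numpy as np

def to_uint8(region):
    """Convert a float image crop with values in [0, 1] to uint8.

    The crops written to disk should be plain 0-255 uint8 arrays;
    normalization is applied later, at CLIP's input, by the training
    pipeline's preprocessing, not baked into the saved images.
    """
    return np.clip(region * 255.0, 0.0, 255.0).round().astype(np.uint8)

crop = np.random.rand(224, 224, 3)  # a hypothetical region-proposal crop
print(to_uint8(crop).dtype)  # uint8
```

Noun extraction from the captions can then be done with nltk's tokenizer and POS tagger, keeping tokens whose tag starts with `NN`.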