facebookresearch / ov-seg

This is the official PyTorch implementation of the paper Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP.

Code for generating training data from COCO Captions #7

Closed shehanmunasinghe closed 1 year ago

shehanmunasinghe commented 1 year ago

Could you please release the code for generating training data from COCO Captions and fine-tuning CLIP with the collected mask-category pairs?

Jeff-LiangF commented 1 year ago

Hi @shehanmunasinghe,

We don't plan to release this part. However, we believe it is not hard to implement. Some hints: You may want to use the regions, which stand for the generated proposals. For noun extraction, we use nltk. Some details to pay attention to: make sure the input to CLIP is normalized, and the saved images should be uint8. Saved data should follow this format so that it can be accepted by open_clip training.
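For anyone attempting this, here is a minimal sketch of how the hints above might fit together. It is not the authors' code. The helper names (`nouns_from_tagged`, `extract_nouns`, `masked_crop_uint8`, `clip_normalize`) are hypothetical, and zero-filling the background of each crop is an assumption. The mean and std values are the standard OpenAI CLIP preprocessing constants. `extract_nouns` needs nltk plus its tokenizer and tagger data, so nltk is imported lazily.

```python
import numpy as np

# Normalization constants used by OpenAI CLIP preprocessing (RGB order).
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)


def nouns_from_tagged(tagged):
    """Keep lower-cased, de-duplicated nouns from (word, Penn Treebank tag) pairs.

    Every tag starting with "NN" is kept, so proper nouns (NNP) are included.
    """
    seen, nouns = set(), []
    for word, tag in tagged:
        word = word.lower()
        if tag.startswith("NN") and word not in seen:
            seen.add(word)
            nouns.append(word)
    return nouns


def extract_nouns(caption):
    """Tokenize and POS-tag a caption with nltk, then keep the nouns.

    Requires nltk and its tokenizer/tagger data (fetched via nltk.download).
    """
    import nltk  # lazy import so the other helpers work without nltk installed
    return nouns_from_tagged(nltk.pos_tag(nltk.word_tokenize(caption)))


def masked_crop_uint8(image, mask):
    """Blank out pixels outside `mask` and crop to the mask's bounding box.

    image: (H, W, 3) uint8 RGB array; mask: (H, W) boolean array.
    Zero-filling the background is an assumption, not a confirmed detail.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("empty mask")
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    crop = image[y0:y1, x0:x1].copy()
    crop[~mask[y0:y1, x0:x1]] = 0
    return crop.astype(np.uint8)  # this uint8 array is what gets saved to disk


def clip_normalize(crop_uint8):
    """Scale a uint8 RGB crop to [0, 1], then apply CLIP mean/std normalization."""
    x = crop_uint8.astype(np.float32) / 255.0
    return (x - CLIP_MEAN) / CLIP_STD
```

The split is deliberate: the crop written to disk stays uint8, and normalization is applied only to the tensor that is fed to CLIP. Resizing to the model's input resolution is omitted here. If the training pipeline already applies its own preprocessing transform, `clip_normalize` should not be applied a second time.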