Question about preprocess bbox by CLIP image encoder

Johnathan-Xie / ZSD-YOLO

GNU General Public License v3.0

Question about preprocess bbox by CLIP image encoder #4

Closed BossunWang closed 1 year ago

BossunWang commented 1 year ago

Hi expert, I would like to know the details of how lable.pt is generated. Is each bbox crop resized directly to the image size CLIP expects (e.g. 224x224), or is it padded to keep the aspect ratio and then resized?

Johnathan-Xie commented 1 year ago

Thank you for your interest in our work.

We resized each bbox crop directly to 224x224. If I remember correctly, padding the objects led to poor embeddings. Also, I believe the original CLIP was trained using center crop + resize.
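For readers comparing the two options from the question, here is a minimal sketch of the geometry involved. It is not the repository's preprocessing code: the function names and the 100x50 box are hypothetical, and it only computes sizes and offsets, leaving the actual resampling to whatever image library is used.

```python
def direct_resize(box_w, box_h, target=224):
    """Stretch the crop to target x target, ignoring its aspect ratio.

    Returns the (x, y) scale factors applied to the crop.
    """
    return target / box_w, target / box_h


def pad_then_resize(box_w, box_h, target=224):
    """Scale the crop to fit inside target x target while keeping its
    aspect ratio, then centre it on a padded square canvas.

    Returns (new_w, new_h, pad_left, pad_top) in output pixels.
    """
    side = max(box_w, box_h)
    new_w = round(box_w * target / side)
    new_h = round(box_h * target / side)
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


# A wide 100x50 box (hypothetical numbers, for illustration only).
scale_x, scale_y = direct_resize(100, 50)
layout = pad_then_resize(100, 50)
print(scale_x, scale_y)  # the two axes are scaled by different factors
print(layout)            # object size and padding offsets on the canvas
```

With direct resizing the object fills the whole 224x224 input but is distorted, since each axis gets its own scale factor. With padding the shape is preserved, but for an elongated box a large share of the input is padding rather than object.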

Also, I have now pushed a new commit that shows the data processing notebooks. See https://github.com/Johnathan-Xie/ZSD-YOLO/issues/3 for more information.