BossunWang closed this issue 1 year ago
Thank you for your interest in our work.
We resized each object crop directly to 224x224 because, if I remember correctly, padding the objects led to poor embeddings. Also, I believe the original CLIP was trained using center crop + resize.
Also, I have now pushed a new commit that shows the data processing notebooks. See https://github.com/Johnathan-Xie/ZSD-YOLO/issues/3 for more information.
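For readers comparing the two options discussed in this thread, here is a minimal sketch in plain Python. It is not the repository's actual code; the linked notebooks are the authoritative version. The function names (`direct_resize`, `pad_then_resize`), the list-of-rows image representation, the zero fill value, and nearest-neighbor interpolation are all assumptions made for illustration. A real pipeline would more likely use an image library and CLIP's own preprocessing.

```python
# Illustrative only: images are lists of rows, boxes are (x1, y1, x2, y2) in pixels.

def crop(image, box):
    """Cut the bounding box out of the image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def resize_nearest(image, out_w, out_h):
    """Nearest-neighbor resize (a stand-in for whatever interpolation is really used)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def pad_to_square(image, fill=0):
    """Center the image on a square canvas so the aspect ratio is preserved."""
    h, w = len(image), len(image[0])
    side = max(h, w)
    top, left = (side - h) // 2, (side - w) // 2
    canvas = [[fill] * side for _ in range(side)]
    for y in range(h):
        canvas[top + y][left:left + w] = image[y]
    return canvas


def direct_resize(image, box, size=224):
    """Option 1: stretch the crop straight to size x size (aspect ratio is distorted)."""
    return resize_nearest(crop(image, box), size, size)


def pad_then_resize(image, box, size=224, fill=0):
    """Option 2: pad the crop to a square first, then resize (aspect ratio is kept)."""
    return resize_nearest(pad_to_square(crop(image, box), fill), size, size)
```

With a wide box, `direct_resize` fills the whole 224x224 output with object pixels, while `pad_then_resize` leaves bands of fill pixels above and below the object. Those bands are the padding that the reply above reports as hurting the embeddings.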
Hi, I would like to know the details of how lable.pt is generated. Do you resize each bbox crop directly to the image size CLIP expects (e.g. 224x224), or pad it to keep the aspect ratio and then resize?