panzhang0212 opened this issue 2 years ago
Hi, Figure 2 in the paper shows the object cropped by its bbox, while the released code (`scaling_factor=1.0`) crops the object with a 3× enlarged bbox. Is there a gap between the paper and the code?

When training the prompt vectors, we apply random cropping for data augmentation, so the object crop fed into the image encoder is actually cut from a box smaller than the 3× bbox. Figure 2 just shows one specific case for demonstration.
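The crop pipeline discussed in this thread (enlarge the object bbox, then take a random crop inside it for augmentation) can be sketched roughly as follows. This is a minimal illustration, not the repository's code. The helper names `enlarge_box` and `random_sub_crop`, the `(x0, y0, x1, y1)` pixel box format, and the `min_scale` side-length range are all hypothetical assumptions. Only the 3× enlargement factor comes from the question above.

```python
import random
from typing import Tuple

# Hypothetical box format (assumption): (x0, y0, x1, y1) in pixel coordinates.
Box = Tuple[float, float, float, float]


def enlarge_box(box: Box, factor: float, img_w: int, img_h: int) -> Box:
    """Scale a box about its center by `factor`, then clamp it to the image bounds."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w, half_h = (x1 - x0) * factor / 2, (y1 - y0) * factor / 2
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(float(img_w), cx + half_w),
        min(float(img_h), cy + half_h),
    )


def random_sub_crop(box: Box, min_scale: float, rng: random.Random) -> Box:
    """Pick a random sub-box whose side lengths are a fraction in [min_scale, 1.0]
    of `box` (assumed augmentation; the real training transform may differ)."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    s = rng.uniform(min_scale, 1.0)  # same scale for both sides keeps the aspect ratio
    cw, ch = w * s, h * s
    nx0 = x0 + rng.uniform(0.0, w - cw)  # random offset keeps the crop inside `box`
    ny0 = y0 + rng.uniform(0.0, h - ch)
    return (nx0, ny0, nx0 + cw, ny0 + ch)


# Usage: a 100x100 object box in a 640x480 image, enlarged 3x, then randomly cropped.
rng = random.Random(0)
big = enlarge_box((100.0, 100.0, 200.0, 200.0), 3.0, 640, 480)  # (0.0, 0.0, 300.0, 300.0)
crop = random_sub_crop(big, 1 / 3, rng)
```

Under these assumptions, the crop that actually reaches the encoder is usually smaller than the 3× box, which matches the explanation that Figure 2 depicts only one possible case.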