jim-11 opened this issue 2 years ago
We conducted an ablation study, UMGF w/o Tar, described in the paper, which demonstrates the necessity of targeted visual guidance. As you said, some pictures may be of low quality because of the limited generalization of the one-stage grounding toolkit. For categories that the toolkit cannot detect, we use the original whole image instead of a crop. A more accurate toolkit may further improve performance on this task.
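A minimal sketch of the fallback described above, assuming an image is a plain 2-D grid of pixel values and the grounding toolkit returns at most one box per entity category (or none). The names `crop`, `visual_inputs`, and `CATEGORIES` are hypothetical and illustrative only; this is not the UMGF implementation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: an image is a list of pixel rows, and a box is
# (left, top, right, bottom) in pixel coordinates with right/bottom exclusive.
Image = List[List[int]]
Box = Tuple[int, int, int, int]

# Assumed entity categories (PER, LOC, ORG, MISC), matching the crops in this thread.
CATEGORIES = ("PER", "LOC", "ORG", "MISC")


def crop(image: Image, box: Box) -> Image:
    """Return the sub-grid of `image` covered by `box`."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def visual_inputs(image: Image, detections: Dict[str, Optional[Box]]) -> Dict[str, Image]:
    """Map each category to its cropped region, or to the whole image when the
    grounding toolkit produced no box for that category (the fallback)."""
    inputs: Dict[str, Image] = {}
    for category in CATEGORIES:
        box = detections.get(category)
        inputs[category] = crop(image, box) if box is not None else image
    return inputs


if __name__ == "__main__":
    # 3x4 toy image whose pixel values are 0..11 in row-major order.
    image = [[r * 4 + c for c in range(4)] for r in range(3)]
    out = visual_inputs(image, {"PER": (1, 0, 3, 2)})
    print(out["PER"])           # [[1, 2], [5, 6]]
    print(out["ORG"] is image)  # True: ORG was not detected, so the whole image is used
```

Under this sketch, an undetected category never yields an empty input; the trade-off is that its visual signal is untargeted, which is what the UMGF w/o Tar ablation compares against.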
I found that only the cropped pictures of the "PER" category are meaningful, while the pictures of other categories (e.g., ORG) usually do not contain complete objects. I wonder whether these low-quality pictures actually help the task.
Attached examples: crop_location_16_05_01_66, crop_miscellaneous_16_05_01_66, crop_organization_16_05_01_66, crop_person_16_05_01_66