TransformersWsz / UMGF

Multi-modal Graph Fusion for Named Entity Recognition with Targeted Visual Guidance

Questions about the cropped pictures? #7

Open jim-11 opened 2 years ago

jim-11 commented 2 years ago

I found that only the cropped pictures for the "PER" category are meaningful, while the pictures for other categories (e.g., ORG) are usually not complete objects. I wonder whether these low-quality pictures help with the task.

[Attached images: crop_location_16_05_01_66, crop_miscellaneous_16_05_01_66, crop_organization_16_05_01_66, crop_person_16_05_01_66]

TransformersWsz commented 2 years ago

We conducted an ablation study, UMGF w/o Tar, described in the paper, which demonstrates the necessity of targeted visual guidance. As you said, some of the cropped pictures are low quality; this is caused by the limited generalization of the one-stage grounding toolkit. For categories that the toolkit cannot detect, we use the original whole image instead. A more accurate toolkit may further improve performance on the task.
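
For readers who want to picture the fallback described above, here is a minimal sketch. It is not the code from this repository: the function names `crop` and `targeted_regions`, the box format, and the toy list-of-lists image are all assumptions made for illustration. The only parts taken from the thread are the four category names (from the attached file names) and the rule that an undetected category falls back to the whole image.

```python
from typing import Dict, List, Optional, Tuple

# Assumed box format: (left, top, right, bottom), with right/bottom exclusive.
Box = Tuple[int, int, int, int]
# Toy stand-in for a real image: a list of rows of pixel values.
Image = List[List[int]]

# Category names taken from the cropped file names in this thread.
CATEGORIES = ("person", "location", "organization", "miscellaneous")


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by `box` (hypothetical helper)."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def targeted_regions(
    image: Image, detections: Dict[str, Optional[Box]]
) -> Dict[str, Image]:
    """Pick one visual region per category.

    `detections` maps a category to the box returned by a grounding
    toolkit, or to None (or is missing the key) when nothing was found.
    Undetected categories fall back to the original whole image.
    """
    regions: Dict[str, Image] = {}
    for category in CATEGORIES:
        box = detections.get(category)
        regions[category] = crop(image, box) if box is not None else image
    return regions


# Usage: only "person" is detected, so the other three categories
# receive the whole image.
image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
regions = targeted_regions(image, {"person": (1, 0, 3, 2)})
```

With a real pipeline, the toy `Image` type would be replaced by whatever image object the grounding toolkit and the visual encoder expect; the fallback logic itself stays the same.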