Closed linhuixiao closed 1 year ago
Your work is very similar to that of Pseudo-Q (https://github.com/LeapLabTHU/Pseudo-Q). What are the differences between your work and Pseudo-Q in the following aspects:
- What is the essential difference between the method of constructing pseudo labels for remote sensing data described in your paper and that of Pseudo-Q?
- What is the difference between MLCF (Multi-level Cross-modal Fusion) mentioned in your paper and ML-CMA (Multi-level Cross-modal Attention) in Pseudo-Q?
Thanks
Regarding the difference in dataset construction: Pseudo-Q deals with the generation of pseudo labels, whereas RSVG constructs a remote sensing visual grounding dataset, which specifically involves generating textual descriptions of targets in remote sensing images. We design an automatic RS image-query generation method with manual assistance to construct real labels. Please refer to my paper (https://arxiv.org/abs/2210.12634) again for the specific method.
Although ML-CMA and MLCF both have "Multi-level Cross-modal" in their names, their starting points and algorithms are essentially very different.
Thanks a lot for your attention and kind reminder.
Although your work focuses on remote sensing data, the method you use is too similar to that of Pseudo-Q, and readers have reason to think that your ideas are borrowed from Pseudo-Q. I suggest that you cite Pseudo-Q in the final version of your paper and discuss the differences; otherwise it may be considered plagiarism. Kind regards.