Closed by msm8976 4 months ago
Thank you for your attention! The results for the unimodal methods on the multimodal dataset were obtained as follows: for each method, we train one network of that method's architecture per modality, each on its own modality's data only. At test time, the output features of the three networks are concatenated and the concatenated feature is used for retrieval. For example, for PCB we deploy three PCB networks, one per modality, and concatenate their outputs at test time.
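To illustrate the test-time step described above, here is a minimal sketch in plain Python. It is not the authors' code: the helper names (`l2_normalize`, `concat_features`, `rank_gallery`), the per-modality L2 normalization, the cosine-similarity ranking, and the toy feature values are all assumptions made for illustration. The three trained networks are replaced by hard-coded stand-in feature vectors.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def concat_features(per_modality):
    """Concatenate one sample's features, one vector per modality network.

    Assumption: each modality's vector is L2-normalized before concatenation
    so that no single modality dominates; the thread does not say whether
    the original experiments did this.
    """
    fused = []
    for vec in per_modality:
        fused.extend(l2_normalize(vec))
    return fused


def rank_gallery(query, gallery):
    """Return gallery indices sorted by descending cosine similarity (assumed metric)."""
    q = l2_normalize(query)
    sims = [sum(a * b for a, b in zip(q, l2_normalize(g))) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: -sims[i])


# Stand-in outputs of three independently trained networks (toy values).
gallery = [
    concat_features([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),
    concat_features([[0.0, 1.0], [1.0, 0.0], [1.0, -1.0]]),
]
query = concat_features([[0.9, 0.1], [0.1, 0.9], [1.0, 0.8]])

ranking = rank_gallery(query, gallery)
print(ranking)
```

In a real setup, each inner list would be the feature produced by the network trained on that modality, and the fused vector's length would be the sum of the three feature dimensions.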
Thank you for your quick and detailed explanation.
In the experimental section of your paper, results for single-modality methods on multimodal datasets are reported. Could you please clarify how these single-modality methods were implemented? Were the features of each modality extracted separately using a single backbone, did you use only the RGB modality, or was some other approach used? Looking forward to your response.