Open zhougr18 opened 1 year ago
Hi, thanks for your interest.
We haven't tested the text modality before. But in my view, this phenomenon is perhaps because the training dynamics of the text modality, which is a highly abstract encoding of human knowledge, are quite different from those of rawer modalities (e.g., audio and vision). So modifying the gradient of a modality (as OGM does) may not be very useful for the text modality.
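For context, the per-modality gradient modulation discussed above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual implementation: the function name, the `alpha` hyperparameter, and the exact form of the coefficient are assumptions here; the dominant modality's gradients are scaled down by a factor derived from the discrepancy between the two modalities' per-batch confidence scores.

```python
import math

def ogm_scale(score_strong, score_weak, alpha=0.5):
    """Illustrative gradient-scaling coefficient for the dominant modality.

    score_strong / score_weak: per-batch confidence scores of two modalities.
    alpha: modulation strength (hypothetical name, not the repo's API).
    """
    rho = score_strong / score_weak  # discrepancy ratio between modalities
    if rho > 1.0:
        # Dampen the dominant modality's gradient; the weaker one keeps scale 1.
        return 1.0 - math.tanh(alpha * (rho - 1.0))
    return 1.0

# The coefficient shrinks as one modality (e.g., text) dominates:
print(ogm_scale(3.0, 1.0))  # < 1: dominant modality's gradient is suppressed
print(ogm_scale(1.0, 1.0))  # == 1: balanced, no modulation applied
```

The intuition of the reply above is that for text, the "dominance" signal this coefficient reacts to may not reflect a fixable imbalance, so scaling the gradient does not rebalance training the way it does for audio and vision.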
Hello, I'm trying to apply the OGM-GE strategy to multimodal fusion networks with text, video, and audio modalities (e.g., MISA, MAG). However, when I use the SGD optimizer, the model trains with difficulty and finally achieves very low accuracy. When I switch to the Adam optimizer, the OGM-GE strategy doesn't seem to work, and training is still dominated by the text modality. Did these problems appear in your experiments? And how can I solve them? Looking forward to your reply.