Are the embeddings generated from different modalities comparable?

csuhan / OneLLM

[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language

Are the embeddings generated from different modalities comparable? #24

Open mengfanShi opened 1 month ago

mengfanShi commented 1 month ago

As with CLIP, are the embeddings generated by the Universal Encoder comparable across modalities? If so, we could perform search and matching based on the similarity of embeddings from different modalities. Could you provide the encoder part of the model separately for testing? The full 15GB model is too large for me at the moment.
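For context, the kind of CLIP-style cross-modal search described above can be sketched as follows. This is only an illustration under stated assumptions: the vectors are toy values standing in for embeddings, the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, and nothing here calls OneLLM code. It only gives meaningful results if the encoder places different modalities in a shared, comparable space.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, gallery):
    """Return (index, score) pairs for gallery items, best match first."""
    scores = [(i, cosine_similarity(query, item)) for i, item in enumerate(gallery)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumption): placeholders for an image embedding and
# three audio embeddings produced by some shared encoder.
image_embedding = [0.9, 0.1, 0.0]
audio_embeddings = [
    [0.0, 1.0, 0.0],
    [1.0, 0.2, 0.0],
    [0.0, 0.0, 1.0],
]

ranking = rank_by_similarity(image_embedding, audio_embeddings)
best_index = ranking[0][0]  # index of the closest audio embedding
```

Cosine similarity is used here because it ignores vector magnitude, which is the usual choice for CLIP-style retrieval; whether it is appropriate for this encoder's outputs is an open question in this thread.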

kxgong commented 1 month ago

Well, since we didn't train the model on exactly paired data, the comparability may not meet your expectations at this time.

Thanks for your attention.

Cece1031 commented 1 month ago

But I see you ran the test on Music-AVQA in the paper. Could you tell me how you use three modalities to generate answers? Thank you very much!