I tested ImageBind Huge with song lyrics (about 30 seconds of each song) and found that some different lyrics produce the same embedding.
For example, the two inputs below get the same embedding:
input 1: "Yo, I'll tell you what I want, what I really, really want So tell me what you want, what you really, really want I'll tell you what I want, what I really, really want So tell me what you want, what you really, really want I wanna, (ha) I wanna, (ha) I wanna, (ha) I wanna, (ha) I wanna really, really, really wanna zigazig ah If you want my future, forget my past If you wanna get with me, better make it fast Now don't go wasting my precious time Get your act together we could be just fine"
input 2: "Just like fire, burning up the way If I can light the world up for just one day Watch this madness, colorful charade No one can be just like me any way Just like magic, I'll be flying free I'ma disappear when they come for me I kick that ceiling, what you gonna say? No one can be just like me any way Just like fire, uh"
Output embedding (the same for both inputs): [[-0.5404723 1.5690608 2.6174846 ... 2.7306266 0.41771093 0.2987784 ]]
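NumPy elides the middle of the printed array (`...`), so only six values are visible. To confirm that the two embeddings are really identical rather than just sharing their first and last values, here is a minimal sketch that compares the full vectors. It assumes each embedding is available as a NumPy array of shape (1, D); `compare_embeddings` is a hypothetical helper, not part of ImageBind.

```python
from typing import Tuple

import numpy as np


def compare_embeddings(a: np.ndarray, b: np.ndarray) -> Tuple[float, float]:
    """Return (max absolute difference, cosine similarity) of two embeddings.

    Hypothetical helper: `a` and `b` are assumed to have the same shape,
    e.g. (1, D) as in the printed output.
    """
    # Flatten to 1-D float64 vectors so the comparison ignores the batch axis.
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    # Largest element-wise difference: 0.0 means the vectors are identical.
    max_abs_diff = float(np.max(np.abs(a - b)))
    # Cosine similarity: 1.0 means the vectors point in the same direction.
    cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max_abs_diff, cosine
```

If the embeddings are PyTorch tensors, they can be converted with `.cpu().numpy()` before calling the helper. A maximum difference of exactly 0.0 means the vectors are bit-for-bit identical over all D values, not only the ones shown in the printout.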
Is this a bug in the model, or is it because my input text is too long?
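To check the "too long" hypothesis, one option is to estimate how many tokens each lyric needs. The sketch below assumes that ImageBind's text preprocessing uses a CLIP-style BPE tokenizer with a 77-token context (the CLIP default); that figure is an assumption here, not something confirmed above. It computes a lower bound only: it splits lowercased ASCII text with an approximation of CLIP's pre-tokenization regex, and each resulting piece becomes at least one BPE token. The exact count would require the tokenizer's actual BPE vocabulary. `lower_bound_tokens` and `CONTEXT_LENGTH` are hypothetical names.

```python
import re

# Approximation of CLIP's pre-tokenization pattern, valid for ASCII text only.
# CLIP uses Unicode classes (\p{L}, \p{N}); after lowercasing plain ASCII
# these reduce to [a-z] and [0-9].
_PRETOKEN = re.compile(r"'s|'t|'re|'ve|'m|'ll|'d|[a-z]+|[0-9]|[^\sa-z0-9]+")

CONTEXT_LENGTH = 77  # assumed CLIP-style text context length


def lower_bound_tokens(text: str) -> int:
    """Lower bound on the BPE token count, including start/end tokens.

    Every pre-token yields at least one BPE token, so the real count
    can only be equal or larger (for ASCII input).
    """
    # Lowercase and collapse whitespace, then split into pre-tokens.
    pieces = _PRETOKEN.findall(" ".join(text.lower().split()))
    return len(pieces) + 2  # +2 for <|startoftext|> and <|endoftext|>
```

If `lower_bound_tokens(lyrics) > CONTEXT_LENGTH`, then any tokenizer with that context length must cut off part of the input before the text encoder sees it.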