facebookresearch / ImageBind

ImageBind One Embedding Space to Bind Them All

Same vector embedding output for different text inputs #87

Open raise-hanct opened 11 months ago

raise-hanct commented 11 months ago

I tested imagebind hug with song lyrics (about 30s) and found that some different lyrics produce the same embedding. For example, the two inputs below got the same embedding:

input 1: "Yo, I'll tell you what I want, what I really, really want So tell me what you want, what you really, really want I'll tell you what I want, what I really, really want So tell me what you want, what you really, really want I wanna, (ha) I wanna, (ha) I wanna, (ha) I wanna, (ha) I wanna really, really, really wanna zigazig ah If you want my future, forget my past If you wanna get with me, better make it fast Now don't go wasting my precious time Get your act together we could be just fine"

input 2: "Just like fire, burning up the way If I can light the world up for just one day Watch this madness, colorful charade No one can be just like me any way Just like magic, I'll be flying free I'ma disappear when they come for me I kick that ceiling, what you gonna say? No one can be just like me any way Just like fire, uh"

Output embedding: [[-0.5404723 1.5690608 2.6174846 ... 2.7306266 0.41771093 0.2987784 ]]

Is this a bug in the model? Or is it because my input sentences are too long?
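Since the printed array above is abbreviated with "...", only the first and last few values are visible, so it may help to compare the two vectors element by element before concluding they are identical. A minimal sketch, assuming each embedding has been converted to a flat, non-empty, non-zero list of floats; `compare_embeddings` and its `atol` parameter are hypothetical names for illustration, not part of ImageBind:

```python
import math

def compare_embeddings(a, b, atol=1e-6):
    """Return (identical, max_abs_diff, cosine_similarity) for two flat vectors.

    Hypothetical helper: assumes both inputs are non-empty sequences of floats
    with the same length and non-zero norm.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    # Largest element-wise difference; 0.0 means every value matches exactly.
    max_abs_diff = max(abs(x - y) for x, y in zip(a, b))
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = dot / (norm_a * norm_b)
    return max_abs_diff <= atol, max_abs_diff, cosine

# Toy vectors only, not real model output.
identical, max_diff, cosine = compare_embeddings([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(identical, max_diff)
```

If `max_abs_diff` is exactly zero across the full vector, the two inputs really do map to the same embedding rather than just looking alike in the abbreviated printout.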

tringo-fika commented 11 months ago

Same issue

vzapylikhin commented 11 months ago

Can you please share your code?

bakachan19 commented 8 months ago

I also noticed this happening for longer sequences: #82. It seems like truncation is not handled properly in the code.
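To illustrate what explicit truncation handling could look like, here is a rough sketch. It is not taken from the ImageBind code and is not confirmed as the fix for this issue. It assumes a CLIP-style layout in which each sequence starts with a start-of-text token and ends with an end-of-text token; the token IDs, the context length of 77, and the function name `truncate_with_eot` are all assumptions for illustration:

```python
def truncate_with_eot(token_ids, context_length, eot_token):
    """Clip a token sequence to context_length, keeping EOT as the final token.

    Hypothetical helper: assumes token_ids already begins with the
    start-of-text token and ends with eot_token.
    """
    if context_length < 2:
        raise ValueError("context_length must leave room for SOT and EOT")
    if len(token_ids) <= context_length:
        return list(token_ids)
    # Keep the first context_length - 1 tokens, then re-append EOT so the
    # end-of-text marker is not lost when the tail is cut off.
    return list(token_ids[: context_length - 1]) + [eot_token]

# Hypothetical IDs: 1 = start-of-text, 2 = end-of-text, 10..109 = text tokens.
SOT, EOT = 1, 2
long_sequence = [SOT] + list(range(10, 110)) + [EOT]
clipped = truncate_with_eot(long_sequence, context_length=77, eot_token=EOT)
print(len(long_sequence), len(clipped), clipped[0], clipped[-1])
```

Comparing the length before and after clipping also gives a quick way to tell whether a given input, such as the lyrics above, is long enough to be truncated at all.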