Closed abhimanyu891998 closed 7 months ago
Sorry for the late reply. We have been training a stronger audio model for the last few days and have now released the update.
It seems you are using a batch size of 150, which is too large for 8 GB of GPU memory. You could try feeding in 2-4 samples at a time. If you want to compute the similarity matrix over all 150 samples, feed the samples into the model in batches and stack their features at the end.
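The batching suggestion above could look roughly like the sketch below. It is plain Python and does not call the real model: `embed_in_batches`, `toy_embed`, and the file names are hypothetical stand-ins, and `toy_embed` only imitates a forward pass so the example runs on its own.

```python
from typing import Callable, List, Sequence

def embed_in_batches(
    items: Sequence[str],
    embed_fn: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 4,
) -> List[List[float]]:
    """Call embed_fn on small chunks of items and collect one feature per item."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    features: List[List[float]] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # Only `batch_size` samples are passed to the model at once.
        features.extend(embed_fn(batch))
    return features

# Hypothetical stand-in for the model's forward pass (assumption, not the real API).
seen_batch_sizes: List[int] = []

def toy_embed(batch: Sequence[str]) -> List[List[float]]:
    seen_batch_sizes.append(len(batch))
    return [[float(len(path)), 1.0] for path in batch]

# Hypothetical inputs: 150 clip paths, as in the question.
video_clips = [f"clip_{i}.mp4" for i in range(150)]
video_features = embed_in_batches(video_clips, toy_embed, batch_size=4)
```

With a PyTorch model, `embed_fn` would typically run the forward pass under `torch.no_grad()` and move each batch of features to the CPU, and the per-batch tensors would be joined with `torch.cat` before computing the similarity matrix. That way only one small batch occupies GPU memory at a time.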
Thank you! That helped. I will try out the new audio model too!
Hi, great work, and thanks for open-sourcing it. I was trying your model on 150 video clips and 150 audio clips, each 5 seconds long. Below is a screenshot of the code I am using. Here, the arrays `video_clips` and `audio_files` are each of size 150. During embedding generation, the GPU consumes more than 8 GB of memory and the embedding generation stops. I tried the exact same samples with ImageBind, and that seems to work fine during inference and embedding generation. Any idea if I am doing something wrong?