Open JCSTARS opened 5 days ago
Hi @JCSTARS, thank you for your valuable question.
In our experiments, we used a single RTX 2080 Ti GPU with 11GB of memory, which proved sufficient for most of the datasets in our study. For larger datasets, such as CIFAR-10, or more extensive ones like ImageNet, a Divide-and-Conquer strategy can be employed: cluster each block of data individually to generate temporary cluster centers, then run a final clustering over all the temporary centers.
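For illustration, here is a minimal sketch of that Divide-and-Conquer clustering, assuming scikit-learn's `KMeans`; the block size and cluster count are placeholder values, not settings from the paper:

```python
import numpy as np
from sklearn.cluster import KMeans

def divide_and_conquer_kmeans(features, n_clusters, block_size=50_000, seed=0):
    """Cluster a feature matrix too large to cluster in one pass.

    1) Split `features` into blocks and cluster each block separately,
       keeping only the temporary cluster centers.
    2) Run a final clustering over all temporary centers.
    """
    temp_centers = []
    for start in range(0, len(features), block_size):
        block = features[start:start + block_size]
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(block)
        temp_centers.append(km.cluster_centers_)
    temp_centers = np.concatenate(temp_centers, axis=0)

    # Final clustering over the temporary centers only.
    final_km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(temp_centers)

    # Assign every sample to its nearest final center, again blockwise to limit memory.
    labels = np.concatenate([
        final_km.predict(features[s:s + block_size])
        for s in range(0, len(features), block_size)
    ])
    return final_km.cluster_centers_, labels
```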
Thanks a lot for the answer, that's very helpful. The GPU I'm using is an RTX 3090 with 24GB, so I'll keep digging into the problem. There's just one other thing: based on your paper, pseudo labels are obtained by computing the cosine similarity between the word embeddings {wi} and the reference words {zk}, while in your code the pseudo labels are computed from the combined embeddings {vi} and the GPT embeddings {zk}, which is understandable. But your code also wraps the image encoder inside torch.no_grad, which confuses me a little, because the projection layer of the image encoder is supposed to be fine-tuned, right? Please correct me if I'm wrong. I would really appreciate it.
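To make sure I'm reading the code right, here is a rough sketch of my understanding; `backbone`, `proj`, `v`, and `z` are placeholder names, not your actual identifiers:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholders standing in for the real modules.
backbone = nn.Linear(512, 512)   # stands in for the frozen image encoder
proj = nn.Linear(512, 256)       # the trainable projection layer

def pseudo_labels(v, z):
    """Pseudo label = argmax of cosine similarity between {v_i} (N, D)
    and the GPT reference embeddings {z_k} (K, D)."""
    sim = F.normalize(v, dim=-1) @ F.normalize(z, dim=-1).T  # (N, K)
    return sim.argmax(dim=-1)

def encode(images):
    # The frozen backbone runs under no_grad, but the projection layer
    # sits outside the block, so gradients still flow into it and it
    # can be fine-tuned as usual.
    with torch.no_grad():
        feats = backbone(images)
    return proj(feats)
```

Is this the intended behavior, i.e. only the backbone is frozen by the no_grad block while the projection layer still trains?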
It seems that in Phase II of training, because of the massive number of images in the dataset, the CUDA out-of-memory issue is unavoidable. Do you have any suggestions or reference implementations for handling it? Thanks so much.
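Would something like a smaller micro-batch with gradient accumulation plus mixed precision be the right direction? A rough sketch of what I mean; `model`, `loader`, and `optimizer` are placeholders, and `model(images)` is assumed to return the loss:

```python
import torch

def train_epoch(model, loader, optimizer, accum_steps=4):
    scaler = torch.cuda.amp.GradScaler()
    optimizer.zero_grad(set_to_none=True)
    for step, (images, _) in enumerate(loader):
        images = images.cuda(non_blocking=True)
        with torch.cuda.amp.autocast():         # fp16 activations reduce memory use
            loss = model(images) / accum_steps  # scale loss for accumulation
        scaler.scale(loss).backward()
        # Effective batch size = micro-batch size * accum_steps,
        # while only one micro-batch is resident in memory at a time.
        if (step + 1) % accum_steps == 0:
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad(set_to_none=True)
```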